What's cooking in git.git (Apr 2017, #03; Tue, 18)

2017-04-18 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* df/dir-iter-remove-subtree (2017-04-17) 5 commits
 - remove_subtree(): reimplement using iterators
 - dir_iterator: refactor state machine model
 - dir_iterator: add helpers to dir_iterator_advance
 - remove_subtree(): test removing nested directories
 - dir_iterator: add tests for dir_iterator API

 Update the dir-iterator API and use it to reimplement
 remove_subtree().

 Waiting for response to review.
 GSoC microproject.


* jk/ls-files-recurse-submodules-fix (2017-04-18) 2 commits
 - ls-files: fix path used when recursing into submodules
 - ls-files: fix recurse-submodules with nested submodules

 "ls-files --recurse-submodules" did not quite work well in a
 project with nested submodules.

 Will merge to 'next'.


* jt/fetch-pack-error-reporting (2017-04-17) 1 commit
 - fetch-pack: show clearer error message upon ERR

 "git fetch-pack" was not prepared to accept ERR packet that the
 upload-pack can send with a human-readable error message.  It
 showed the packet contents with ERR prefix, so there was no data
 loss, but it was redundant to say "ERR" in an error message.

 Will merge to 'next'.
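For illustration, the ERR packet convention described above — upload-pack sends a pkt-line payload starting with "ERR " followed by a human-readable message — could be handled roughly like this (a minimal stand-alone sketch, not the actual fetch-pack code; the function name is invented):

```c
#include <string.h>

/* Sketch: given a pkt-line payload, return the server-sent error
 * message ("ERR <message>") without the redundant "ERR " prefix,
 * or NULL if the packet is not an error.  Illustrative only. */
const char *server_error_message(const char *pkt)
{
	const char *prefix = "ERR ";
	size_t plen = strlen(prefix);
	if (!strncmp(pkt, prefix, plen))
		return pkt + plen;
	return NULL;
}
```

A caller would then report the returned message directly instead of echoing the raw packet.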


* nd/conditional-config-in-early-config (2017-04-17) 3 commits
 - config: correct file reading order in read_early_config()
 - config: handle conditional include when $GIT_DIR is not set up
 - config: prepare to pass more info in git_config_with_options()

 The recently introduced conditional inclusion of configuration did
 not work well when early-config mechanism was involved.

 Will merge to 'next'.


* sb/checkout-recurse-submodules (2017-04-17) 1 commit
 - submodule: remove a superfluous second check for the "new" variable

 Code cleanup.

 Will merge to 'next'.


* xy/format-patch-base (2017-04-17) 1 commit
 - doc: trivial typo in git-format-patch.txt

 Doc cleanup.

 Will merge to 'next'.


* jk/snprintf-cleanups (2017-04-17) 1 commit
 - replace: plug a memory leak

 Hotfix for a topic that is already in 'master'.

 Will merge to 'next'.


* ab/clone-no-tags (2017-04-18) 1 commit
 - clone: add a --no-tags option to clone without tags

 "git clone" learned the "--no-tags" option not to fetch all tags
 initially, and also set up the tagopt not to follow any tags in
 subsequent fetches.


* ab/completion-push-delete-ref (2017-04-18) 1 commit
 - completion: expand "push --delete <remote> <ref>" for refs on that <remote>

 The completion script (in contrib/) learned to complete "git push
 --delete b<TAB>" to complete branch name to be deleted.


* bw/forking-and-threading (2017-04-18) 11 commits
 - run-command: block signals between fork and execve
 - run-command: add note about forking and threading
 - run-command: handle dup2 and close errors in child
 - run-command: eliminate calls to error handling functions in child
 - run-command: don't die in child when duping /dev/null
 - run-command: prepare child environment before forking
 - string-list: add string_list_remove function
 - run-command: use the async-signal-safe execv instead of execvp
 - run-command: prepare command before forking
 - t0061: run_command executes scripts without a #! line
 - t5550: use write_script to generate post-update hook


* sb/reset-recurse-submodules (2017-04-18) 4 commits
 - builtin/reset: add --recurse-submodules switch
 - submodule.c: submodule_move_head works with broken submodules
 - submodule.c: uninitialized submodules are ignored in recursive commands
 - entry.c: submodule recursing: respect force flag correctly

 "git reset" learned "--recurse-submodules" option.

--
[Stalled]

* sg/clone-refspec-from-command-line-config (2017-02-27) 1 commit
 - clone: respect configured fetch refspecs during initial fetch

 Expecting a reroll.
 cf. <20170227211217.73gydlxb2qu2s...@sigill.intra.peff.net>
 cf. 

Re: [PATCH v5 02/11] t0061: run_command executes scripts without a #! line

2017-04-18 Thread Johannes Sixt

On 19.04.2017 at 01:17, Brandon Williams wrote:

Add a test to 't0061-run-command.sh' to ensure that run_command can
continue to execute scripts which don't include a '#!' line.


Why is this necessary? I am pretty certain that our emulation layer on 
Windows can only run scripts with a shebang line.


-- Hannes



Re: [PATCH] clone: add a --no-tags option to clone without tags

2017-04-18 Thread Junio C Hamano
Junio C Hamano  writes:

> Ævar Arnfjörð Bjarmason   writes:
>
>> Add a --no-tags option to "git clone" to clone without tags. Currently
>> there's no easy way to clone a repository and end up with just a
>> "master" branch via --single-branch, or track all branches and no
>> tags. Now --no-tags can be added to "git clone" with or without
>> --single-branch to clone a repository without tags.
>
> Makes sense.
>
>> +--no-tags::
>> +Don't clone any tags, and set `remote.origin.tagOpt=--no-tags`
>> +in the config, ensuring that future `git pull` and `git fetch`
>> +operations won't fetch any tags.
>
> OK.  Not just we ignore tags during the initial cloning, we set
> things up so that we do not _follow_ tags in subsequent fetches.

I somewhat doubt the utility of this change.  "--single-branch"
already refrains from grabbing all the tags, and the tags it grabs
when "clone" runs and also in subsequent "fetch" are only the ones
relevant to that branch.  When a user is fetching say 'maint', it is
very likely that the user wants tags that are reachable from the tip
of 'maint' (if only to make the tip of that branch describable),
even though the user would not care about the tags on the other
branches that are ahead of 'maint'.

It is not that much code, and carrying it is not that much burden,
but I am reasonably sure that I won't use it myself.





Re: [PATCHv2 4/4] builtin/reset: add --recurse-submodules switch

2017-04-18 Thread Junio C Hamano
Stefan Beller  writes:

> git-reset is yet another working tree manipulator, which should
> be taught about submodules.
>
> One use case of "git-reset" is to reset to a known good state,
> and dropping commits that did not work as expected.
>
> In that case one of the expected outcomes from a hard reset
> would be to have broken submodules reset to a known good
> state as well.  A test for this was added in a prior patch.

When "git reset --hard" at the top-level superproject updates a
gitlink in the index to a commit that was different from what was
checked out in the working tree of the submodule, what should
happen?  Do we reset the tip of the current branch in the submodule
to point at the commit the index of the top-level records?  Do we
detach the HEAD in the submodule to point at the commit?  Something
else that is configurable?  Or do we just run "git reset --hard"
in each submodule (which may leave submodule's HEAD different from
what is recorded in the index of the superproject)?

"... to a known good state as well" does not help answering the above.



Re: [PATCH v2] xgethostname: handle long hostnames

2017-04-18 Thread Junio C Hamano
David Turner  writes:

> If the writer has the smaller HOST_NAME_MAX, this will work fine.  If the 
> reader
> has the smaller HOST_NAME_MAX, and the writer's actual value is too long,
> then there's no way the strcmp would succeed anyway.  So I don't think we need
> to worry about it.

Hmph, I have to agree with that reasoning, only because the value we
read into locking_host[] is not used for error reporting at all.  I
would have insisted on reading what is on the filesystem anyway if that
were not the case.

Thanks.


Re: [PATCH] completion: expand "push --delete <remote> <ref>" for refs on that <remote>

2017-04-18 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Change the completion of "push --delete <remote> <ref>" to complete
> refs on that <remote>, not all refs. Before this e.g. cloning git.git
> and doing "git push --delete origin p<TAB>" will complete nothing,
> whereas origin/p<TAB> will uselessly complete origin/pu.
>
> Now p<TAB> will complete as "pu". The completion of giving --delete
> later, e.g. "git push origin --delete p" remains unchanged, this
> is a bug, but is a general existing limitation of the bash completion,
> and not how git-push is documented, so I'm not fixing that case.
>
> I looked over t9902-completion.sh but couldn't quickly find out how to
> add a test for this, but all the existing tests pass, and all my
> manual testing of "git push --delete <remote> <ref>..." does the right
> thing now.
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---

This looks like a sensible thing to want to add.  Perhaps somebody
more familiar with the completion tests can help with t/ and also
give us general comments.  Until then, let me queue it as is on
'pu'.

Thanks.

>  contrib/completion/git-completion.bash | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index 1150164d5c..2e5b3ed776 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -701,7 +701,7 @@ __git_complete_revlist ()
>  __git_complete_remote_or_refspec ()
>  {
>   local cur_="$cur" cmd="${words[1]}"
> - local i c=2 remote="" pfx="" lhs=1 no_complete_refspec=0
> + local i c=2 remote="" pfx="" lhs=1 no_complete_refspec=0 delete=0
>   if [ "$cmd" = "remote" ]; then
>   ((c++))
>   fi
> @@ -709,6 +709,7 @@ __git_complete_remote_or_refspec ()
>   i="${words[c]}"
>   case "$i" in
>   --mirror) [ "$cmd" = "push" ] && no_complete_refspec=1 ;;
> + --delete) delete=1 ;;
>   --all)
>   case "$cmd" in
>   push) no_complete_refspec=1 ;;
> @@ -761,7 +762,9 @@ __git_complete_remote_or_refspec ()
>   fi
>   ;;
>   push)
> - if [ $lhs = 1 ]; then
> + if [ $delete = 1 ]; then
> + __git_complete_refs --remote="$remote" --pfx="$pfx" --cur="$cur_"
> + elif [ $lhs = 1 ]; then
>   __git_complete_refs --pfx="$pfx" --cur="$cur_"
>   else
>   __git_complete_refs --remote="$remote" --pfx="$pfx" --cur="$cur_"


Re: [PATCH v4 0/3] rebase: --signoff support

2017-04-18 Thread Junio C Hamano
Giuseppe Bilotta  writes:

> Some work about extending --signoff support to interactive rebases is
> underway in the `rebase-signoff-ext` branch, but there's a lot of
> corner cases to test and work-out, so I guess that'll be for some
> other time.

Yup, that is fine.

Will queue.  Thanks.


Re: [PATCH v1] diffcore-rename: speed up register_rename_src

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 10:56:08PM -0400, Jeff King wrote:

> > When adding many things, we often just append and then sort at the
> > end after we finished adding.  I wonder if recent "check the last
> > one and append" optimization beats that strategy.
> 
> The big question is whether we need to detect duplicates while we're
> appending to the list, which is hard on an unsorted list.  In this
> function, at least, we do detect when the path already exists and return
> the existing entry. I'm not sure under what circumstances we would see
> such a duplicate, though, as each filename should appear only once in
> the tree diff. I would think.
> 
> Doing:
> 
> diff --git a/diffcore-rename.c b/diffcore-rename.c
> index f7444c86b..56a493d97 100644
> --- a/diffcore-rename.c
> +++ b/diffcore-rename.c
> @@ -86,7 +86,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
>   struct diff_rename_src *src = &(rename_src[next]);
>   int cmp = strcmp(one->path, src->p->one->path);
>   if (!cmp)
> - return src;
> + die("BUG: duplicate rename src: %s", one->path);
>   if (cmp < 0) {
>   last = next;
>   continue;
> 
> passes the test suite, at least. :)

Maybe relevant: 4d6be03b9 (diffcore-rename: avoid processing duplicate
destinations, 2015-02-26). That's on the dst side, but possibly we
should do something similar on the src side.

BTW, I think the return value from register_rename_src() is
questionable. It points to a "struct diff_rename_src" that may be
reallocated by further calls to the function. Fortunately nobody
actually looks at it, let alone saves it, so there's no bug.

We may want to convert that return value to a void (if not just return
an int for "hey, there's a duplicate", like we do for add_rename_dst()).

Also, presumably that function could learn the same "check the last one"
trick that the src side does. Which leads me back to "surely we can
generalize this". I don't think bsearch() is quite what we want, because
its interface doesn't tell us where to put the item when it isn't found.
But I think we could make a general bsearch-like function that has
similar semantics to index_name_pos(), with its negative-integer return.

And then that would be a general lookup function, and we could easily
build a general "look up and add" function around that. And the "check
the last one" optimization would go in the latter.

-Peff
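The index_name_pos()-style interface Peff sketches in words — return the match index when found, otherwise a negative value encoding the insertion point — might look something like this (a hypothetical sketch over plain string arrays, not an actual git API):

```c
#include <string.h>

/* Sketch of a bsearch-like helper with index_name_pos()-style
 * semantics: returns the index of the match, or -pos - 1 where
 * pos is the insertion point that keeps the array sorted.
 * Hypothetical interface for illustration only. */
int lookup_pos(const char **array, int nr, const char *key)
{
	int lo = 0, hi = nr;
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;
		int cmp = strcmp(key, array[mid]);
		if (!cmp)
			return mid;	/* found */
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return -lo - 1;	/* not found; insert at lo */
}
```

A "look up and add" wrapper built on top of this could then carry the "check the last one" fast path in a single place.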


Re: [PATCH v3 1/2] use HOST_NAME_MAX to size buffers for gethostname(2)

2017-04-18 Thread Junio C Hamano
Jonathan Nieder  writes:

>> @@ -274,7 +278,7 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
>>   * running.
>>   */
>>  time(NULL) - st.st_mtime <= 12 * 3600 &&
>> -fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
>> +fscanf(fp, scan_fmt, &pid, locking_host) == 2 &&
>
> I hoped this could be simplified since HOST_NAME_MAX is a numeric literal,
> using the double-expansion trick:
>
> #define STR_(s) # s
> #define STR(s) STR_(s)
>
>   fscanf(fp, "%" SCNuMAX " %" STR(HOST_NAME_MAX) "c", &pid, locking_host);
>
> Unfortunately, I don't think there's anything stopping a platform from
> defining
>
>   #define HOST_NAME_MAX 0x100
>
> which would break that.

Yes, that was exactly why I went to the xstrfmt() route when I sent
mine yesterday ;-).

> So this run-time calculation appears to be necessary.
>
> Reviewed-by: Jonathan Nieder 

Thanks.  


Re: [PATCH v1] diffcore-rename: speed up register_rename_src

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 07:45:05PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Tue, Apr 18, 2017 at 07:44:21PM +, g...@jeffhostetler.com wrote:
> >
> >> From: Jeff Hostetler 
> >> 
> >> Teach register_rename_src() to see if new file pair
> >> can simply be appended to the rename_src[] array before
> >> performing the binary search to find the proper insertion
> >> point.
> >
> > I guess your perf results show some minor improvement. But I suspect
> > this is because your synthetic repo does not resemble the real world
> > very much. You're saving a few strcmps, but for each of those files
> you're potentially going to have to actually zlib inflate the object
> > contents and do similarity analysis.
> >
> > So "absurd number of files doing 100% exact renames" is the absolute
> > best case, and it saves a few percent.
> >
> > I dunno. It is not that much code _here_, but I'm not excited about the
> > prospect of sprinkling this same "check the last one" optimization all
> > over the code base. I wonder if there's some way to generalize it.
> 
> When adding many things, we often just append and then sort at the
> end after we finished adding.  I wonder if recent "check the last
> one and append" optimization beats that strategy.

The big question is whether we need to detect duplicates while we're
appending to the list, which is hard on an unsorted list.  In this
function, at least, we do detect when the path already exists and return
the existing entry. I'm not sure under what circumstances we would see
such a duplicate, though, as each filename should appear only once in
the tree diff. I would think.

Doing:

diff --git a/diffcore-rename.c b/diffcore-rename.c
index f7444c86b..56a493d97 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -86,7 +86,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
struct diff_rename_src *src = &(rename_src[next]);
int cmp = strcmp(one->path, src->p->one->path);
if (!cmp)
-   return src;
+   die("BUG: duplicate rename src: %s", one->path);
if (cmp < 0) {
last = next;
continue;

passes the test suite, at least. :)

-Peff


Re: [PATCH v3 2/2] xgethostname: handle long hostnames

2017-04-18 Thread Junio C Hamano
Jonathan Nieder  writes:

> Hi,
>
> David Turner wrote:
>
>> If the full hostname doesn't fit in the buffer supplied to
>> gethostname, POSIX does not specify whether the buffer will be
>> null-terminated, so to be safe, we should do it ourselves.  Introduce
>> new function, xgethostname, which ensures that there is always a \0
>> at the end of the buffer.
>
> I think we should detect the error instead of truncating the hostname.
> That (on top of your patch) would look like the following.
>
> Thoughts?
> Jonathan
>
> diff --git i/wrapper.c w/wrapper.c
> index d837417709..e218bd3bef 100644
> --- i/wrapper.c
> +++ w/wrapper.c
> @@ -660,11 +660,13 @@ int xgethostname(char *buf, size_t len)
>  {
>   /*
>* If the full hostname doesn't fit in buf, POSIX does not
> -  * specify whether the buffer will be null-terminated, so to
> -  * be safe, do it ourselves.
> +  * guarantee that an error will be returned. Check for ourselves
> +  * to be safe.
>*/
>   int ret = gethostname(buf, len);
> - if (!ret)
> - buf[len - 1] = 0;
> + if (!ret && !memchr(buf, 0, len)) {
> + errno = ENAMETOOLONG;
> + return -1;
> + }

Hmph.  "Does not specify if the buffer will be NUL-terminated"
would mean that it is OK for the platform gethostname() to stuff
sizeof(buf)-1 first bytes of the hostname in the buffer and then
truncate by placing '\0' at the end of the buf, and we would not
notice truncation with the above change on such a platform, no?
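Junio's point can be demonstrated concretely: on a platform that truncates and then NUL-terminates, the memchr() check passes even though truncation happened, and the only in-band hint left is a name that exactly fills the buffer. A stand-alone sketch (simulate_gethostname() is an invented stand-in for the real syscall, hard-coding the permitted truncate-and-terminate behavior):

```c
#include <string.h>

/* A POSIX-permitted behavior: copy what fits, then NUL-terminate. */
int simulate_gethostname(char *buf, size_t len, const char *real_name)
{
	strncpy(buf, real_name, len - 1);
	buf[len - 1] = '\0';
	return 0;
}

/* A buffer-filling name is the only remaining truncation hint;
 * a caller could treat it as "possibly truncated" and retry with
 * a larger buffer. */
int possibly_truncated(const char *buf, size_t len)
{
	return strlen(buf) == len - 1;
}
```

Note that memchr(buf, 0, len) still finds a NUL in the truncated case, which is exactly the hole being pointed out.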


Re: [PATCH] various: disallow --no-no-OPT for --no-opt options

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 07:40:37PM -0700, Junio C Hamano wrote:

> > It might even be possible to detect the existing line and
> > have parse-options automatically respect "--foo" when "--no-foo" is
> > present.  But that may run afoul of callers that add both "--foo" and
> > "--no-foo" manually.
> 
> True, but wouldn't that be something we would want to avoid anyway?
> That is, "git cmd [--OPT | --no-OPT | --no-no-OPT]" from the end
> user's point of view should be an error because it is unclear what
> difference there is between --OPT and --no-no-OPT.  And we should
> be able to add a rule to parse_options_check() to catch such an
> error.

I meant that if you have something like this in your options array:

  { 0, "foo", OPTION_INTEGER, , 1 },
  { 0, "no-foo", OPTION_INTEGER, , 2 },

that if we start treating "--no-foo" magically, it will
conflict with "--foo" (in this case that's maybe OK because --foo comes
first, but as a general rule it's dangerous to existing options arrays).

> Having said that, I am not sure if we want to go the route of
> "existing line that begins with 'no-' behaves magical".  For
> boolean, I suspect we may get away with such a magic without
> confusing ourselves too much, though.

Yeah, at which point we might as well ask callers to explicitly ask for
the behavior with OPT_NEGBOOL.

-Peff
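One possible reading of the OPT_NEGBOOL idea, sketched outside of parse-options (all names here are hypothetical): the negated spelling is the primary and sets the flag, the positive spelling clears it, and "--no-no-add" simply does not exist:

```c
#include <string.h>

/* Sketch of a primarily-negated boolean: "--no-<name>" sets the
 * flag, "--<name>" clears it, anything else is not this option.
 * Hypothetical helper, not actual parse-options code. */
int parse_negbool(const char *arg, const char *name, int *flag)
{
	if (!strncmp(arg, "--no-", 5) && !strcmp(arg + 5, name)) {
		*flag = 1;	/* negated spelling is the primary */
		return 0;
	}
	if (!strncmp(arg, "--", 2) && !strcmp(arg + 2, name)) {
		*flag = 0;	/* positive spelling unsets */
		return 0;
	}
	return -1;		/* not this option */
}
```

With this shape, "--no-no-add" falls through to the "not this option" case rather than being double-negated.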


Re: [PATCH v3 2/2] xgethostname: handle long hostnames

2017-04-18 Thread Junio C Hamano
David Turner  writes:

> If the full hostname doesn't fit in the buffer supplied to
> gethostname, POSIX does not specify whether the buffer will be
> null-terminated, so to be safe, we should do it ourselves.  Introduce

The name of the character whose ASCII value is '\0' is NUL, not
null (similarly for in-code comment).


Re: [PATCH v1] diffcore-rename: speed up register_rename_src

2017-04-18 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Apr 18, 2017 at 07:44:21PM +, g...@jeffhostetler.com wrote:
>
>> From: Jeff Hostetler 
>> 
>> Teach register_rename_src() to see if new file pair
>> can simply be appended to the rename_src[] array before
>> performing the binary search to find the proper insertion
>> point.
>
> I guess your perf results show some minor improvement. But I suspect
> this is because your synthetic repo does not resemble the real world
> very much. You're saving a few strcmps, but for each of those files
> you're potentially going to have to actually zlib inflate the object
> contents and do similarity analysis.
>
> So "absurd number of files doing 100% exact renames" is the absolute
> best case, and it saves a few percent.
>
> I dunno. It is not that much code _here_, but I'm not excited about the
> prospect of sprinkling this same "check the last one" optimization all
> over the code base. I wonder if there's some way to generalize it.

When adding many things, we often just append and then sort at the
end after we finished adding.  I wonder if recent "check the last
one and append" optimization beats that strategy.
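The "append now, sort once at the end" strategy Junio mentions can be sketched like this (plain string arrays for illustration, not git's rename_src[] code):

```c
#include <stdlib.h>
#include <string.h>

int cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* O(1) appends with no ordering maintained ... */
void append(const char **array, int *nr, const char *item)
{
	array[(*nr)++] = item;
}

/* ... followed by a single O(n log n) sort before any lookups
 * (or duplicate detection) happen. */
void finalize(const char **array, int nr)
{
	qsort(array, nr, sizeof(*array), cmp_str);
}
```

The trade-off against "check the last one and append" is that duplicates can only be detected after the final sort, not at insertion time.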


Re: [PATCH] various: disallow --no-no-OPT for --no-opt options

2017-04-18 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Apr 19, 2017 at 12:29:18AM +0200, René Scharfe wrote:
> ...
>> PARSE_OPT_NONEG should only be used for options where a negation doesn't
>> make sense, e.g. for the --stage option of checkout-index.
>
> I think we do strive to avoid "--no-no-foo", and instead have "--no-foo"
> and "--foo" to cover both sides.  So for example:
>
>> > -  OPT_BOOL(0, "no-add", &state->no_add,
>> > +  OPT_BOOL_NONEG(0, "no-add", &state->no_add,
>> >N_("ignore additions made by the patch")),
>
> This could be more like:
>
>   OPT_NEGBOOL(0, "add", >no_add, ...)
>
> where NEGBOOL would be smart enough to show "--no-add" in the help as
> the primary.

I very much appreciate this topic to avoid --no-no-OPT
nonsense, but just disabling --no-no-OPT without giving --OPT the
meaning the user who would have used --no-no-OPT wanted does not
sound like a good solution.  Your NEGBOOL looks like a better
approach.

> It might even be possible to detect the existing line and
> have parse-options automatically respect "--foo" when "--no-foo" is
> present.  But that may run afoul of callers that add both "--foo" and
> "--no-foo" manually.

True, but wouldn't that be something we would want to avoid anyway?
That is, "git cmd [--OPT | --no-OPT | --no-no-OPT]" from the end
user's point of view should be an error because it is unclear what
difference there is between --OPT and --no-no-OPT.  And we should
be able to add a rule to parse_options_check() to catch such an
error.

Having said that, I am not sure if we want to go the route of
"existing line that begins with 'no-' behaves magical".  For
boolean, I suspect we may be get away with such a magic without
confusing ourselves too much, though.

Thanks.


Re: no MERGE_HEAD after octopus merge failure

2017-04-18 Thread Junio C Hamano
Max Ivanov  writes:

> I am using git 2.12.0 and it leaves no MERGE_HEAD once octopus merge
> failed with conflicts. Is it intentional? Files have conflicts markers
> and once resolved `git commit` creates just a normal commit, which is
> very inconvenient and confusing.

I suspect you got these lines in the error message, which you didn't
read:

Automated merge did not work.
Should not be doing an octopus.

Octopus is designed to be done only for simple conflict-less merges,
because it makes later bisection inherently (read: not fault of the
tool, but a natural consequence of the shape of the resulting
history) less efficient.  It might have been better if we chose to
(1) refuse octopus merge if the working tree before "git merge"
starts is not clean, and (2) automatically run "git reset --hard"
when failing an octopus, issuing the above error message.  But we
didn't, so you'd need to do "git reset --hard" and merge the tips
one by one, not making an octopus.




Re: [PATCH v2] xgethostname: handle long hostnames

2017-04-18 Thread Junio C Hamano
René Scharfe  writes:

> How important is it to scan the whole file in one call?  We could split
> it up like this and use a strbuf to handle host names of any length.  We
> need to be permissive here to allow machines with different values for
> HOST_NAME_MAX to work with the same file on a network file system, so
> this would have to be the first patch, right?

Absolutely.  FWIW, I agree with Peff that we do not need to use
fscanf here; just reading a line into strbuf and picking pieces
would be sufficient.

> NB: That && cascade has enough meat for a whole function.

True, too.

>
> René
>
> ---
>  builtin/gc.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/builtin/gc.c b/builtin/gc.c
> index 1fca84c19d..d5e880028e 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -251,10 +251,9 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
>   fd = hold_lock_file_for_update(&lock, pidfile_path,
>  LOCK_DIE_ON_ERROR);
>   if (!force) {
> - static char locking_host[128];
> + static struct strbuf locking_host = STRBUF_INIT;
>   int should_exit;
>   fp = fopen(pidfile_path, "r");
> - memset(locking_host, 0, sizeof(locking_host));
>   should_exit =
>   fp != NULL &&
>   !fstat(fileno(fp), &st) &&
> @@ -268,9 +267,10 @@ static const char *lock_repo_for_gc(int force, pid_t* 
> ret_pid)
>* running.
>*/
>   time(NULL) - st.st_mtime <= 12 * 3600 &&
> - fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
> + fscanf(fp, "%"SCNuMAX" ", &pid) == 1 &&
> + !strbuf_getwholeline(&locking_host, fp, '\0') &&
>   /* be gentle to concurrent "gc" on remote hosts */
> - (strcmp(locking_host, my_host) || !kill(pid, 0) || errno == EPERM);
> + (strcmp(locking_host.buf, my_host) || !kill(pid, 0) || errno == EPERM);
>   if (fp != NULL)
>   fclose(fp);
>   if (should_exit) {
> @@ -278,7 +278,7 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
>   rollback_lock_file(&lock);
>   *ret_pid = pid;
>   free(pidfile_path);
> - return locking_host;
> + return locking_host.buf;
>   }
>   }


Re: [PATCH v2] xgethostname: handle long hostnames

2017-04-18 Thread Junio C Hamano
Jeff King  writes:

> I doubt that doing it in one call matters. It's not like stdio promises
> us any atomicity in the first place.
>
>> -fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
>> +fscanf(fp, "%"SCNuMAX" ", &pid) == 1 &&
>> +!strbuf_getwholeline(&locking_host, fp, '\0') &&
>
> I don't think there is anything wrong with using fscanf here, but it has
> enough pitfalls in general that I don't really like its use as a parser
> (and the general lack of it in Git's code base seems to agree).
>
> I wonder if this should just read a line (or the whole file) into a
> strbuf and parse it there. That would better match our usual style, I
> think.

Yeah, I think it would be a good change.
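The "read a line and pick pieces" approach could look roughly like this (a sketch in plain C string handling rather than git's strbuf API; the function name is invented):

```c
#include <inttypes.h>
#include <string.h>

/* Sketch: parse a "<pid> <host>" line by hand instead of fscanf().
 * Returns 0 on success, -1 on a malformed line.  Illustrative only;
 * the real code would read the line into a strbuf first. */
int parse_gc_pidfile_line(const char *line, uintmax_t *pid,
			  const char **host)
{
	char *end;
	*pid = strtoumax(line, &end, 10);
	if (end == line || *end != ' ')
		return -1;	/* no pid, or no separator */
	*host = end + 1;	/* rest of the line is the locking host */
	return 0;
}
```

This sidesteps both the fscanf() field-width pitfalls and the fixed-size hostname buffer.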


Re: [PATCH] various: disallow --no-no-OPT for --no-opt options

2017-04-18 Thread Jeff King
On Wed, Apr 19, 2017 at 12:29:18AM +0200, René Scharfe wrote:

> On 18.04.2017 at 19:09, Ævar Arnfjörð Bjarmason wrote:
> > Change various --no-OPT options which don't supply PARSE_OPT_NONEG to
> > make --no-no-OPT an error.
> > 
> > All of these worked before this change, e.g. doing cloning by doing
> > "git clone --no-no-checkout" is equivalent to just "git clone", but
> > this was never intended, and is inconsistent with other --no-OPT
> > options which do pass PARSE_OPT_NONEG.
> 
> First: Why does that bother you, i.e. what's the harm?
> 
> Setting PARSE_OPT_NONEG takes away the ability to toggle the affected
> option.  E.g. git clone would reject --checkout.  Currently users can
> specify --no- options as defaults in aliases and override them on the
> command line if needed, with the patch that won't be possible anymore.
> 
> PARSE_OPT_NONEG should only be used for options where a negation doesn't
> make sense, e.g. for the --stage option of checkout-index.

I think we do strive to avoid "--no-no-foo", and instead have "--no-foo"
and "--foo" to cover both sides.  So for example:

> > -   OPT_BOOL(0, "no-add", &state->no_add,
> > +   OPT_BOOL_NONEG(0, "no-add", &state->no_add,
> > N_("ignore additions made by the patch")),

This could be more like:

  OPT_NEGBOOL(0, "add", >no_add, ...)

where NEGBOOL would be smart enough to show "--no-add" in the help as
the primary. It might even be possible to detect the existing line and
have parse-options automatically respect "--foo" when "--no-foo" is
present.  But that may run afoul of callers that add both "--foo" and
"--no-foo" manually.

-Peff


Re: [PATCH] Documentation/git-checkout: make doc. of checkout clearer

2017-04-18 Thread Junio C Hamano
Christoph Michelbach  writes:

> On Mon, 2017-04-17 at 17:31 -0700, Junio C Hamano wrote:
>
>> Obviously, "grab all paths that match  out of , add
>> them to the index and copy them out to the working tree" will never
>> be able to _restore_ the lack of 'd', even it may match the
>>  being used to do this checkout, by removing it from the
>> current index and the working tree.
>
> Exactly. "grab all paths that match  out of , add them
> to the index and copy them out to the working tree" is a more accurate
> description of what the command does (but it might need some rewording
> ;-) ).

Of course it is accurate, as that is how I would write it, not
"restore", if I were doing the documentation.

Care to send in a patch to update the documentation?


Re: [PATCH] clone: add a --no-tags option to clone without tags

2017-04-18 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Add a --no-tags option to "git clone" to clone without tags. Currently
> there's no easy way to clone a repository and end up with just a
> "master" branch via --single-branch, or track all branches and no
> tags. Now --no-tags can be added to "git clone" with or without
> --single-branch to clone a repository without tags.

Makes sense.

> +--no-tags::
> + Don't clone any tags, and set `remote.origin.tagOpt=--no-tags`
> + in the config, ensuring that future `git pull` and `git fetch`
> + operations won't fetch any tags.

OK.  Not just we ignore tags during the initial cloning, we set
things up so that we do not _follow_ tags in subsequent fetches.

s/won't fetch/won't follow/ is probably needed, as we still allow
users to fetch tags by explicitly naming them on the command line.
The only thing we are doing is to refrain from auto-following.

As an end-user facing help, exact configuration name and value is
much less helpful than telling them the effect of the setting in the
words they understand, i.e. "make later fetches not to follow tags"
or something.  Hardcoded 'origin' in `remote.origin.tagOpt` is not
correct anyway, so I'd suggest redoing this part of the doc.

> @@ -120,6 +121,8 @@ static struct option builtin_clone_options[] = {
>   N_("deepen history of shallow clone, excluding rev")),
>   OPT_BOOL(0, "single-branch", _single_branch,
>   N_("clone only one branch, HEAD or --branch")),
> + OPT_BOOL_NONEG(0, "no-tags", &option_no_tags,
> +N_("don't clone any tags, and set remote.<remote>.tagOpt=--no-tags")),

Likewise.  As an end-user facing help, exact configuration name and
value is much less helpful than telling them the effect of the
setting in the words they understand, i.e. "make later fetches not
to follow tags" or something.

> + if (option_no_tags) {
> + strbuf_addf(&key, "remote.%s.tagOpt", option_origin);

Good to use option_origin.  

> + git_config_set(key.buf, "--no-tags");
> + strbuf_reset(&key);
> + }
> +

Thanks.


Re: [PATCH v3 2/2] xgethostname: handle long hostnames

2017-04-18 Thread Jonathan Nieder
Hi,

David Turner wrote:

> If the full hostname doesn't fit in the buffer supplied to
> gethostname, POSIX does not specify whether the buffer will be
> null-terminated, so to be safe, we should do it ourselves.  Introduce
> new function, xgethostname, which ensures that there is always a \0
> at the end of the buffer.

I think we should detect the error instead of truncating the hostname.
That (on top of your patch) would look like the following.

Thoughts?
Jonathan

diff --git i/wrapper.c w/wrapper.c
index d837417709..e218bd3bef 100644
--- i/wrapper.c
+++ w/wrapper.c
@@ -660,11 +660,13 @@ int xgethostname(char *buf, size_t len)
 {
/*
 * If the full hostname doesn't fit in buf, POSIX does not
-* specify whether the buffer will be null-terminated, so to
-* be safe, do it ourselves.
+* guarantee that an error will be returned. Check for ourselves
+* to be safe.
 */
int ret = gethostname(buf, len);
-   if (!ret)
-   buf[len - 1] = 0;
+   if (!ret && !memchr(buf, 0, len)) {
+   errno = ENAMETOOLONG;
+   return -1;
+   }
return ret;
 }


Re: [PATCH v1] diffcore-rename: speed up register_rename_src

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 07:44:21PM +, g...@jeffhostetler.com wrote:

> From: Jeff Hostetler 
> 
> Teach register_rename_src() to see if new file pair
> can simply be appended to the rename_src[] array before
> performing the binary search to find the proper insertion
> point.

I guess your perf results show some minor improvement. But I suspect
this is because your synthetic repo does not resemble the real world
very much. You're saving a few strcmps, but for each of those files
you're potentially going to have to actually zlib-inflate the object
contents and do similarity analysis.

So "absurd number of files doing 100% exact renames" is the absolute
best case, and it saves a few percent.

I dunno. It is not that much code _here_, but I'm not excited about the
prospect of sprinkling this same "check the last one" optimization all
over the code base. I wonder if there's some way to generalize it.

-Peff


Re: [PATCH v3 1/2] use HOST_NAME_MAX to size buffers for gethostname(2)

2017-04-18 Thread Jonathan Nieder
Hi,

David Turner wrote:

> From: René Scharfe 
>
> POSIX limits the length of host names to HOST_NAME_MAX.  Export the
> fallback definition from daemon.c and use this constant to make all
> buffers used with gethostname(2) big enough for any possible result
> and a terminating NUL.

Since some platforms do not define HOST_NAME_MAX and we provide a
fallback, this is not actually big enough for any possible result.
For example, the Hurd allows arbitrarily long hostnames.

Nevertheless this patch seems like the right thing to do.

> Inspired-by: David Turner 
> Signed-off-by: Rene Scharfe 
> Signed-off-by: David Turner 
> ---
>  builtin/gc.c   | 10 +++---
>  builtin/receive-pack.c |  2 +-
>  daemon.c   |  4 
>  fetch-pack.c   |  2 +-
>  git-compat-util.h  |  4 
>  ident.c|  2 +-
>  6 files changed, 14 insertions(+), 10 deletions(-)

Thanks for picking this up.

[...]
> +++ b/builtin/gc.c
[...]
> @@ -257,8 +257,12 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
>   fd = hold_lock_file_for_update(&lock, pidfile_path,
>  LOCK_DIE_ON_ERROR);
>   if (!force) {
> - static char locking_host[128];
> + static char locking_host[HOST_NAME_MAX + 1];
> + static char *scan_fmt;
>   int should_exit;
> +
> + if (!scan_fmt)
> + scan_fmt = xstrfmt("%s %%%dc", "%"SCNuMAX, HOST_NAME_MAX);
>   fp = fopen(pidfile_path, "r");
>   memset(locking_host, 0, sizeof(locking_host));
>   should_exit =
> @@ -274,7 +278,7 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
>* running.
>*/
>   time(NULL) - st.st_mtime <= 12 * 3600 &&
> - fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
> + fscanf(fp, scan_fmt, &pid, locking_host) == 2 &&

I hoped this could be simplified since HOST_NAME_MAX is a numeric literal,
using the double-expansion trick:

#define STR_(s) # s
#define STR(s) STR_(s)

fscanf(fp, "%" SCNuMAX " %" STR(HOST_NAME_MAX) "c",
    &pid, locking_host);

Unfortunately, I don't think there's anything stopping a platform from
defining

#define HOST_NAME_MAX 0x100

which would break that.

So this run-time calculation appears to be necessary.

Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH v11 2/5] p0006-read-tree-checkout: perf test to time read-tree

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 10:40:25PM +0100, Thomas Gummerer wrote:

> > +test_perf_default_repo
> 
> I like that it's possible to use a real world repository now instead
> of forcing the use of a synthetic repository :)
> 
> Is there a reason for this being test_perf_default_repo instead of
> test_perf_large_repo?  It seems like generating a large repo is what
> you are doing with repos/many-files.sh.

I'm actually of the opinion that the default/large repo thing should go
away. I think the original premise was that you could pick a
default/large pair and run the whole suite against them. But in reality,
I have always been confused about which one I should use when writing a
perf test, and what I should use when running the suite.

I think it would be more useful for the perf tests to just respect a
single repo parameter. Then you could run the whole suite against each
repo in turn. And we could get results for git.git, linux.git, some
synthetic cases, the gigantic Windows repo, etc.

-Peff


Re: [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones

2017-04-18 Thread Junio C Hamano
Ben Peart  writes:

> On 4/16/2017 11:31 PM, Junio C Hamano wrote:
>> Lars Schneider  writes:
>>
>>> However, I think it eases code maintainability in the long run if a 
>>> function is "as pure as
>>> possible" (IOW does rely on global state as less as possible).
>>
>> If the original relied on a global hashmap and this update kept the
>> code like so, I wouldn't mind the end result of this series
>> (i.e. rely on it being global).  But that is not the case.  It is
>> making the code worse by stopping passing the hashmap through the
>> callchain.
>
> ...  Since it was already a global, I didn't feel
> like this made it any worse.

The code before your series can easily lose the globals with the
attached patch, _exactly_ because it is prepared to be reusable by a
new caller that supplies its own hashmap by passing it through the
callchain.  The child-process machinery Lars made for his conversion
thing, which you are trying to split out to make it reusable, can be
used by somebody other than apply_multi_file_filter() if you do not
lose the hashmap; what the new caller needs to do is to supply its
own hashmap so that they do not interact with the set of processes
used by Lars's conversion machinery.

If we want to lose the global _after_ applying this patch 4/8, don't
we have to essentially _undo_ 4/8?  How can it not be seen as making
it worse?


 convert.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/convert.c b/convert.c
index 8d652bf27c..ff831e85b8 100644
--- a/convert.c
+++ b/convert.c
@@ -503,9 +503,6 @@ struct cmd2process {
struct child_process process;
 };
 
-static int cmd_process_map_initialized;
-static struct hashmap cmd_process_map;
-
 static int cmd2process_cmp(const struct cmd2process *e1,
   const struct cmd2process *e2,
   const void *unused)
@@ -682,6 +679,9 @@ static int apply_multi_file_filter(const char *path, const char *src, size_t len
struct strbuf filter_status = STRBUF_INIT;
const char *filter_type;
 
+static int cmd_process_map_initialized;
+static struct hashmap cmd_process_map;
+
if (!cmd_process_map_initialized) {
cmd_process_map_initialized = 1;
hashmap_init(&cmd_process_map, (hashmap_cmp_fn) cmd2process_cmp, 0);



Re: [PATCH 3/2] ls-files: only recurse on active submodules

2017-04-18 Thread Junio C Hamano
Brandon Williams  writes:

> Add in a check to see if a submodule is active before attempting to
> recurse.  This prevents 'ls-files' from trying to operate on a submodule
> which may not exist in the working directory.
>
> Signed-off-by: Brandon Williams 
> ---

Sounds like this is something whose effect we can demonstrate with a
simple test in the t/ directory?


Re: [PATCH v5 05/11] string-list: add string_list_remove function

2017-04-18 Thread Stefan Beller
On Tue, Apr 18, 2017 at 4:36 PM, Brandon Williams  wrote:
> On 04/18, Stefan Beller wrote:
>> On Tue, Apr 18, 2017 at 4:17 PM, Brandon Williams  wrote:
>>
>> >
>> > +void string_list_remove(struct string_list *list, const char *string,
>> > +   int free_util)
>> > +{
>> > +   int exact_match;
>> > +   int i = get_entry_index(list, string, &exact_match);
>> > +
>> > +   if (exact_match) {
>> > +   if (list->strdup_strings)
>> > +   free(list->items[i].string);
>> > +   if (free_util)
>> > +   free(list->items[i].util);
>> > +
>> > +   list->nr--;
>> > +   memmove(list->items + i, list->items + i + 1,
>> > +   (list->nr - i) * sizeof(struct string_list_item));
>> > +   }
>>
>> Looks correct. I briefly wondered if we'd have any value in returning
>> `exact_match`, as that may save the caller some code, as I imagine the
>> caller to be:
>>
>>   if (!string_list_has_string(&list, string))
>> die("BUG: ...");
>>   string_list_remove(&list, string, 0);
>>
>> which could be simplified if we had the exact_match returned, i.e.
>> the string_list_remove returns the implicit string_list_has_string.
>
> I don't really see the value in this, as the only caller doesn't need it
> right now.

yeah, I guess we can add such functionality later.

>
>>
>> >  /*
>> > + * Removes the given string from the sorted list.
>>
>> What happens when the string is not found?
>
> nothing.

yeah that is what I figured from reading the code. The question could
have been worded: Do we want to document what happens if the string
is not found? (A reader in the future may wonder if it is even allowed to
call this function without having consulted string_list_has_string).

>
>>
>> > + */
>> > +void string_list_remove(struct string_list *list, const char *string, int free_util);
>>
>> How much do we care about (eventual) consistency? ;)
>> i.e. mark it extern ?
>
> If I need to do another reroll I can mark it extern.

Thanks! This doesn't require a reroll on its own and all other patches
look sane to me.

Thanks,
Stefan


Re: [PATCH v5 05/11] string-list: add string_list_remove function

2017-04-18 Thread Brandon Williams
On 04/18, Stefan Beller wrote:
> On Tue, Apr 18, 2017 at 4:17 PM, Brandon Williams  wrote:
> 
> >
> > +void string_list_remove(struct string_list *list, const char *string,
> > +   int free_util)
> > +{
> > +   int exact_match;
> > +   int i = get_entry_index(list, string, &exact_match);
> > +
> > +   if (exact_match) {
> > +   if (list->strdup_strings)
> > +   free(list->items[i].string);
> > +   if (free_util)
> > +   free(list->items[i].util);
> > +
> > +   list->nr--;
> > +   memmove(list->items + i, list->items + i + 1,
> > +   (list->nr - i) * sizeof(struct string_list_item));
> > +   }
> 
> Looks correct. I briefly wondered if we'd have any value in returning
> `exact_match`, as that may save the caller some code, as I imagine the
> caller to be:
> 
>   if (!string_list_has_string(&list, string))
> die("BUG: ...");
>   string_list_remove(&list, string, 0);
> 
> which could be simplified if we had the exact_match returned, i.e.
> the string_list_remove returns the implicit string_list_has_string.

I don't really see the value in this, as the only caller doesn't need it
right now.

> 
> >  /*
> > + * Removes the given string from the sorted list.
> 
> What happens when the string is not found?

nothing.

> 
> > + */
> > +void string_list_remove(struct string_list *list, const char *string, int free_util);
> 
> How much do we care about (eventual) consistency? ;)
> i.e. mark it extern ?

If I need to do another reroll I can mark it extern.

-- 
Brandon Williams


Re: [PATCH v5 05/11] string-list: add string_list_remove function

2017-04-18 Thread Stefan Beller
On Tue, Apr 18, 2017 at 4:17 PM, Brandon Williams  wrote:

>
> +void string_list_remove(struct string_list *list, const char *string,
> +   int free_util)
> +{
> +   int exact_match;
> +   int i = get_entry_index(list, string, &exact_match);
> +
> +   if (exact_match) {
> +   if (list->strdup_strings)
> +   free(list->items[i].string);
> +   if (free_util)
> +   free(list->items[i].util);
> +
> +   list->nr--;
> +   memmove(list->items + i, list->items + i + 1,
> +   (list->nr - i) * sizeof(struct string_list_item));
> +   }

Looks correct. I briefly wondered if we'd have any value in returning
`exact_match`, as that may save the caller some code, as I imagine the
caller to be:

  if (!string_list_has_string(&list, string))
die("BUG: ...");
  string_list_remove(&list, string, 0);

which could be simplified if we had the exact_match returned, i.e.
the string_list_remove returns the implicit string_list_has_string.

>  /*
> + * Removes the given string from the sorted list.

What happens when the string is not found?

> + */
> +void string_list_remove(struct string_list *list, const char *string, int free_util);

How much do we care about (eventual) consistency? ;)
i.e. mark it extern ?

Thanks,
Stefan


Re: [PATCH] clone: add a --no-tags option to clone without tags

2017-04-18 Thread Brandon Williams
On 04/18, Ævar Arnfjörð Bjarmason wrote:
> Add a --no-tags option to "git clone" to clone without tags. Currently
> there's no easy way to clone a repository and end up with just a
> "master" branch via --single-branch, or track all branches and no
> tags. Now --no-tags can be added to "git clone" with or without
> --single-branch to clone a repository without tags.
> 
> Before this the only way of doing this was either by manually tweaking
> the config in a fresh repository:
> 
> git init git &&
> cat >git/.git/config <<EOF
> [remote "origin"]
> url = g...@github.com:git/git.git
> tagOpt = --no-tags
> fetch = +refs/heads/master:refs/remotes/origin/master
> [branch "master"]
> remote = origin
> merge = refs/heads/master
> EOF
> cd git &&
> git pull
> 
> Which requires hardcoding the "master" name, which may not be the same
> branch, or alternatively by setting tagOpt=--no-tags right after
> cloning & deleting any existing tags:
> 
> git clone --single-branch g...@github.com:git/git.git &&
> cd git &&
> git config remote.origin.tagOpt --no-tags &&
> git tag -l | xargs git tag -d
> 
> Which of course was also subtly buggy if --branch was pointed at a
> tag, leaving the user in a detached head:
> 
> git clone --single-branch --branch v2.12.0 g...@github.com:git/git.git &&
> cd git &&
> git config remote.origin.tagOpt --no-tags &&
> git tag -l | xargs git tag -d
> 
> Now all this complexity becomes the much simpler:
> 
> git clone --single-branch --no-tags g...@github.com:git/git.git
> 
> Or in the case of cloning a single tag "branch":
> 
> git clone --single-branch --branch v2.12.0 --no-tags g...@github.com:git/git.git
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---

Patch seems sane to me.

-- 
Brandon Williams


[PATCH v5 11/11] run-command: block signals between fork and execve

2017-04-18 Thread Brandon Williams
From: Eric Wong 

Signal handlers of the parent firing in the forked child may
have unintended side effects.  Rather than auditing every signal
handler we have and will ever have, block signals while forking
and restore default signal handlers in the child before execve.

Restoring default signal handlers is required because
execve does not unblock signals, it only restores default
signal handlers.  So we must unblock signals with sigprocmask
before execve, leaving a window in which signal handlers
we control can fire in the child.  Continue ignoring
ignored signals, but reset the rest to defaults.

Similarly, disable pthread cancellation to future-proof our code
in case we start using cancellation, as cancellation is
implemented with signals in glibc.

Signed-off-by: Eric Wong 
Signed-off-by: Brandon Williams 
---
 run-command.c | 68 +++
 1 file changed, 68 insertions(+)

diff --git a/run-command.c b/run-command.c
index df1edd963..1f3c38e43 100644
--- a/run-command.c
+++ b/run-command.c
@@ -215,6 +215,7 @@ enum child_errcode {
CHILD_ERR_CHDIR,
CHILD_ERR_DUP2,
CHILD_ERR_CLOSE,
+   CHILD_ERR_SIGPROCMASK,
CHILD_ERR_ENOENT,
CHILD_ERR_SILENT,
CHILD_ERR_ERRNO
@@ -303,6 +304,9 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
case CHILD_ERR_CLOSE:
error_errno("close() in child failed");
break;
+   case CHILD_ERR_SIGPROCMASK:
+   error_errno("sigprocmask failed restoring signals");
+   break;
case CHILD_ERR_ENOENT:
error_errno("cannot run %s", cmd->argv[0]);
break;
@@ -400,6 +404,53 @@ static char **prep_childenv(const char *const *deltaenv)
 }
 #endif
 
+struct atfork_state {
+#ifndef NO_PTHREADS
+   int cs;
+#endif
+   sigset_t old;
+};
+
+#ifndef NO_PTHREADS
+static void bug_die(int err, const char *msg)
+{
+   if (err) {
+   errno = err;
+   die_errno("BUG: %s", msg);
+   }
+}
+#endif
+
+static void atfork_prepare(struct atfork_state *as)
+{
+   sigset_t all;
+
+   if (sigfillset(&all))
+   die_errno("sigfillset");
+#ifdef NO_PTHREADS
+   if (sigprocmask(SIG_SETMASK, &all, &as->old))
+   die_errno("sigprocmask");
+#else
+   bug_die(pthread_sigmask(SIG_SETMASK, &all, &as->old),
+   "blocking all signals");
+   bug_die(pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &as->cs),
+   "disabling cancellation");
+#endif
+}
+
+static void atfork_parent(struct atfork_state *as)
+{
+#ifdef NO_PTHREADS
+   if (sigprocmask(SIG_SETMASK, &as->old, NULL))
+   die_errno("sigprocmask");
+#else
+   bug_die(pthread_setcancelstate(as->cs, NULL),
+   "re-enabling cancellation");
+   bug_die(pthread_sigmask(SIG_SETMASK, &as->old, NULL),
+   "restoring signal mask");
+#endif
+}
+
 static inline void set_cloexec(int fd)
 {
int flags = fcntl(fd, F_GETFD);
@@ -523,6 +574,7 @@ int start_command(struct child_process *cmd)
char **childenv;
struct argv_array argv = ARGV_ARRAY_INIT;
struct child_err cerr;
+   struct atfork_state as;
 
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
@@ -536,6 +588,7 @@ int start_command(struct child_process *cmd)
 
prepare_cmd(&argv, cmd);
childenv = prep_childenv(cmd->env);
+   atfork_prepare(&as);
 
/*
 * NOTE: In order to prevent deadlocking when using threads special
@@ -549,6 +602,7 @@ int start_command(struct child_process *cmd)
cmd->pid = fork();
failed_errno = errno;
if (!cmd->pid) {
+   int sig;
/*
 * Ensure the default die/error/warn routines do not get
 * called, they can take stdio locks and malloc.
@@ -597,6 +651,19 @@ int start_command(struct child_process *cmd)
child_die(CHILD_ERR_CHDIR);
 
/*
+* restore default signal handlers here, in case
+* we catch a signal right before execve below
+*/
+   for (sig = 1; sig < NSIG; sig++) {
+   /* ignored signals get reset to SIG_DFL on execve */
+   if (signal(sig, SIG_DFL) == SIG_IGN)
+   signal(sig, SIG_IGN);
+   }
+
+   if (sigprocmask(SIG_SETMASK, &as.old, NULL) != 0)
+   child_die(CHILD_ERR_SIGPROCMASK);
+
+   /*
 * Attempt to exec using the command and arguments starting at
 * argv.argv[1].  argv.argv[0] contains SHELL_PATH which will
 * be used in the event exec failed with ENOEXEC at which point
@@ -616,6 +683,7 @@ int start_command(struct child_process *cmd)

[PATCH v5 06/11] run-command: prepare child environment before forking

2017-04-18 Thread Brandon Williams
In order to avoid allocation between 'fork()' and 'exec()' prepare the
environment to be used in the child process prior to forking.

Switch to using 'execve()' so that the constructed child environment can
be used in the exec'd process.

Signed-off-by: Brandon Williams 
---
 run-command.c | 66 ++-
 1 file changed, 56 insertions(+), 10 deletions(-)

diff --git a/run-command.c b/run-command.c
index 1c7a3b611..15e2e74a7 100644
--- a/run-command.c
+++ b/run-command.c
@@ -267,6 +267,55 @@ static void prepare_cmd(struct argv_array *out, const struct child_process *cmd)
}
}
 }
+
+static char **prep_childenv(const char *const *deltaenv)
+{
+   extern char **environ;
+   char **childenv;
+   struct string_list env = STRING_LIST_INIT_DUP;
+   struct strbuf key = STRBUF_INIT;
+   const char *const *p;
+   int i;
+
+   /* Construct a sorted string list consisting of the current environ */
+   for (p = (const char *const *) environ; p && *p; p++) {
+   const char *equals = strchr(*p, '=');
+
+   if (equals) {
+   strbuf_reset(&key);
+   strbuf_add(&key, *p, equals - *p);
+   string_list_append(&env, key.buf)->util = (void *) *p;
+   } else {
+   string_list_append(&env, *p)->util = (void *) *p;
+   }
+   }
+   string_list_sort(&env);
+
+   /* Merge in 'deltaenv' with the current environ */
+   for (p = deltaenv; p && *p; p++) {
+   const char *equals = strchr(*p, '=');
+
+   if (equals) {
+   /* ('key=value'), insert or replace entry */
+   strbuf_reset(&key);
+   strbuf_add(&key, *p, equals - *p);
+   string_list_insert(&env, key.buf)->util = (void *) *p;
+   } else {
+   /* otherwise ('key') remove existing entry */
+   string_list_remove(&env, *p, 0);
+   }
+   }
+
+   /* Create an array of 'char *' to be used as the childenv */
+   childenv = xmalloc((env.nr + 1) * sizeof(char *));
+   for (i = 0; i < env.nr; i++)
+   childenv[i] = env.items[i].util;
+   childenv[env.nr] = NULL;
+
+   string_list_clear(&env, 0);
+   strbuf_release(&key);
+   return childenv;
+}
 #endif
 
 static inline void set_cloexec(int fd)
@@ -395,12 +444,14 @@ int start_command(struct child_process *cmd)
 #ifndef GIT_WINDOWS_NATIVE
 {
int notify_pipe[2];
+   char **childenv;
struct argv_array argv = ARGV_ARRAY_INIT;
 
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
 
prepare_cmd(&argv, cmd);
+   childenv = prep_childenv(cmd->env);
 
cmd->pid = fork();
failed_errno = errno;
@@ -456,14 +507,6 @@ int start_command(struct child_process *cmd)
if (cmd->dir && chdir(cmd->dir))
die_errno("exec '%s': cd to '%s' failed", cmd->argv[0],
cmd->dir);
-   if (cmd->env) {
-   for (; *cmd->env; cmd->env++) {
-   if (strchr(*cmd->env, '='))
-   putenv((char *)*cmd->env);
-   else
-   unsetenv(*cmd->env);
-   }
-   }
 
/*
 * Attempt to exec using the command and arguments starting at
@@ -471,9 +514,11 @@ int start_command(struct child_process *cmd)
 * be used in the event exec failed with ENOEXEC at which point
 * we will try to interpret the command using 'sh'.
 */
-   execv(argv.argv[1], (char *const *) argv.argv + 1);
+   execve(argv.argv[1], (char *const *) argv.argv + 1,
+  (char *const *) childenv);
if (errno == ENOEXEC)
-   execv(argv.argv[0], (char *const *) argv.argv);
+   execve(argv.argv[0], (char *const *) argv.argv,
+  (char *const *) childenv);
 
if (errno == ENOENT) {
if (!cmd->silent_exec_failure)
@@ -509,6 +554,7 @@ int start_command(struct child_process *cmd)
close(notify_pipe[0]);
 
argv_array_clear(&argv);
+   free(childenv);
 }
 #else
 {
-- 
2.12.2.816.g281164-goog



[PATCH v5 09/11] run-command: handle dup2 and close errors in child

2017-04-18 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---
 run-command.c | 58 ++
 1 file changed, 42 insertions(+), 16 deletions(-)

diff --git a/run-command.c b/run-command.c
index 1f15714b1..615b6e9c9 100644
--- a/run-command.c
+++ b/run-command.c
@@ -213,6 +213,8 @@ static int child_notifier = -1;
 
 enum child_errcode {
CHILD_ERR_CHDIR,
+   CHILD_ERR_DUP2,
+   CHILD_ERR_CLOSE,
CHILD_ERR_ENOENT,
CHILD_ERR_SILENT,
CHILD_ERR_ERRNO
@@ -235,6 +237,24 @@ static void child_die(enum child_errcode err)
_exit(1);
 }
 
+static void child_dup2(int fd, int to)
+{
+   if (dup2(fd, to) < 0)
+   child_die(CHILD_ERR_DUP2);
+}
+
+static void child_close(int fd)
+{
+   if (close(fd))
+   child_die(CHILD_ERR_CLOSE);
+}
+
+static void child_close_pair(int fd[2])
+{
+   child_close(fd[0]);
+   child_close(fd[1]);
+}
+
 /*
  * parent will make it look like the child spewed a fatal error and died;
  * this is needed to prevent changes to t0061.
@@ -277,6 +297,12 @@ static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
error_errno("exec '%s': cd to '%s' failed",
cmd->argv[0], cmd->dir);
break;
+   case CHILD_ERR_DUP2:
+   error_errno("dup2() in child failed");
+   break;
+   case CHILD_ERR_CLOSE:
+   error_errno("close() in child failed");
+   break;
case CHILD_ERR_ENOENT:
error_errno("cannot run %s", cmd->argv[0]);
break;
@@ -527,35 +553,35 @@ int start_command(struct child_process *cmd)
child_notifier = notify_pipe[1];
 
if (cmd->no_stdin)
-   dup2(null_fd, 0);
+   child_dup2(null_fd, 0);
else if (need_in) {
-   dup2(fdin[0], 0);
-   close_pair(fdin);
+   child_dup2(fdin[0], 0);
+   child_close_pair(fdin);
} else if (cmd->in) {
-   dup2(cmd->in, 0);
-   close(cmd->in);
+   child_dup2(cmd->in, 0);
+   child_close(cmd->in);
}
 
if (cmd->no_stderr)
-   dup2(null_fd, 2);
+   child_dup2(null_fd, 2);
else if (need_err) {
-   dup2(fderr[1], 2);
-   close_pair(fderr);
+   child_dup2(fderr[1], 2);
+   child_close_pair(fderr);
} else if (cmd->err > 1) {
-   dup2(cmd->err, 2);
-   close(cmd->err);
+   child_dup2(cmd->err, 2);
+   child_close(cmd->err);
}
 
if (cmd->no_stdout)
-   dup2(null_fd, 1);
+   child_dup2(null_fd, 1);
else if (cmd->stdout_to_stderr)
-   dup2(2, 1);
+   child_dup2(2, 1);
else if (need_out) {
-   dup2(fdout[1], 1);
-   close_pair(fdout);
+   child_dup2(fdout[1], 1);
+   child_close_pair(fdout);
} else if (cmd->out > 1) {
-   dup2(cmd->out, 1);
-   close(cmd->out);
+   child_dup2(cmd->out, 1);
+   child_close(cmd->out);
}
 
if (cmd->dir && chdir(cmd->dir))
-- 
2.12.2.816.g281164-goog



[PATCH v5 04/11] run-command: use the async-signal-safe execv instead of execvp

2017-04-18 Thread Brandon Williams
Convert the function used to exec from 'execvp()' to 'execv()' as the (p)
variant of exec isn't async-signal-safe and has the potential to call malloc
during the path resolution it performs.  Instead we simply do the path
resolution ourselves during the preparation stage prior to forking.  There
also doesn't exist a portable (p) variant which takes in an environment to
use in the exec'd process.  This allows easy migration to using 'execve()' in a
future patch.

Also, as noted in [1], in the event of an ENOEXEC the (p) variants of
exec will attempt to execute the command by interpreting it with the
'sh' utility.  To maintain this functionality, if 'execv()' fails with
ENOEXEC, start_command will atempt to execute the command by
interpreting it with 'sh'.

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html

Signed-off-by: Brandon Williams 
---
 run-command.c | 30 +-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index d8d143795..1c7a3b611 100644
--- a/run-command.c
+++ b/run-command.c
@@ -238,6 +238,12 @@ static void prepare_cmd(struct argv_array *out, const struct child_process *cmd)
if (!cmd->argv[0])
die("BUG: command is empty");
 
+   /*
+* Add SHELL_PATH so in the event exec fails with ENOEXEC we can
+* attempt to interpret the command with 'sh'.
+*/
+   argv_array_push(out, SHELL_PATH);
+
if (cmd->git_cmd) {
argv_array_push(out, "git");
argv_array_pushv(out, cmd->argv);
@@ -246,6 +252,20 @@ static void prepare_cmd(struct argv_array *out, const struct child_process *cmd)
} else {
argv_array_pushv(out, cmd->argv);
}
+
+   /*
+* If there are no '/' characters in the command then perform a path
+* lookup and use the resolved path as the command to exec.  If there
+* are '/' characters or if the command wasn't found in the path,
+* have exec attempt to invoke the command directly.
+*/
+   if (!strchr(out->argv[1], '/')) {
+   char *program = locate_in_PATH(out->argv[1]);
+   if (program) {
+   free((char *)out->argv[1]);
+   out->argv[1] = program;
+   }
+   }
 }
 #endif
 
@@ -445,7 +465,15 @@ int start_command(struct child_process *cmd)
}
}
 
-   sane_execvp(argv.argv[0], (char *const *) argv.argv);
+   /*
+* Attempt to exec using the command and arguments starting at
+* argv.argv[1].  argv.argv[0] contains SHELL_PATH which will
+* be used in the event exec failed with ENOEXEC at which point
+* we will try to interpret the command using 'sh'.
+*/
+   execv(argv.argv[1], (char *const *) argv.argv + 1);
+   if (errno == ENOEXEC)
+   execv(argv.argv[0], (char *const *) argv.argv);
 
if (errno == ENOENT) {
if (!cmd->silent_exec_failure)
-- 
2.12.2.816.g281164-goog



[PATCH v5 08/11] run-command: eliminate calls to error handling functions in child

2017-04-18 Thread Brandon Williams
All of our standard error handling paths have the potential to
call malloc or take stdio locks; so we must avoid them inside
the forked child.

Instead, the child only writes an 8 byte struct atomically to
the parent through the notification pipe to propagate an error.
All user-visible error reporting happens from the parent;
even avoiding functions like atexit(3) and exit(3).

Helped-by: Eric Wong 
Signed-off-by: Brandon Williams 
---
 run-command.c | 121 ++
 1 file changed, 89 insertions(+), 32 deletions(-)

diff --git a/run-command.c b/run-command.c
index b3a35dd82..1f15714b1 100644
--- a/run-command.c
+++ b/run-command.c
@@ -211,14 +211,82 @@ static const char **prepare_shell_cmd(struct argv_array *out, const char **argv)
 #ifndef GIT_WINDOWS_NATIVE
 static int child_notifier = -1;
 
-static void notify_parent(void)
+enum child_errcode {
+   CHILD_ERR_CHDIR,
+   CHILD_ERR_ENOENT,
+   CHILD_ERR_SILENT,
+   CHILD_ERR_ERRNO
+};
+
+struct child_err {
+   enum child_errcode err;
+   int syserr; /* errno */
+};
+
+static void child_die(enum child_errcode err)
 {
-   /*
-* execvp failed.  If possible, we'd like to let start_command
-* know, so failures like ENOENT can be handled right away; but
-* otherwise, finish_command will still report the error.
-*/
-   xwrite(child_notifier, "", 1);
+   struct child_err buf;
+
+   buf.err = err;
+   buf.syserr = errno;
+
+   /* write(2) on buf smaller than PIPE_BUF (min 512) is atomic: */
+   xwrite(child_notifier, &buf, sizeof(buf));
+   _exit(1);
+}
+
+/*
+ * parent will make it look like the child spewed a fatal error and died;
+ * this is needed to prevent changes to t0061.
+ */
+static void fake_fatal(const char *err, va_list params)
+{
+   vreportf("fatal: ", err, params);
+}
+
+static void child_error_fn(const char *err, va_list params)
+{
+   const char msg[] = "error() should not be called in child\n";
+   xwrite(2, msg, sizeof(msg) - 1);
+}
+
+static void child_warn_fn(const char *err, va_list params)
+{
+   const char msg[] = "warn() should not be called in child\n";
+   xwrite(2, msg, sizeof(msg) - 1);
+}
+
+static void NORETURN child_die_fn(const char *err, va_list params)
+{
+   const char msg[] = "die() should not be called in child\n";
+   xwrite(2, msg, sizeof(msg) - 1);
+   _exit(2);
+}
+
+/* this runs in the parent process */
+static void child_err_spew(struct child_process *cmd, struct child_err *cerr)
+{
+   static void (*old_errfn)(const char *err, va_list params);
+
+   old_errfn = get_error_routine();
+   set_error_routine(fake_fatal);
+   errno = cerr->syserr;
+
+   switch (cerr->err) {
+   case CHILD_ERR_CHDIR:
+   error_errno("exec '%s': cd to '%s' failed",
+   cmd->argv[0], cmd->dir);
+   break;
+   case CHILD_ERR_ENOENT:
+   error_errno("cannot run %s", cmd->argv[0]);
+   break;
+   case CHILD_ERR_SILENT:
+   break;
+   case CHILD_ERR_ERRNO:
+   error_errno("cannot exec '%s'", cmd->argv[0]);
+   break;
+   }
+   set_error_routine(old_errfn);
 }
 
static void prepare_cmd(struct argv_array *out, const struct child_process *cmd)
@@ -341,13 +409,6 @@ static int wait_or_whine(pid_t pid, const char *argv0, int in_signal)
code += 128;
} else if (WIFEXITED(status)) {
code = WEXITSTATUS(status);
-   /*
-* Convert special exit code when execvp failed.
-*/
-   if (code == 127) {
-   code = -1;
-   failed_errno = ENOENT;
-   }
} else {
error("waitpid is confused (%s)", argv0);
}
@@ -435,6 +496,7 @@ int start_command(struct child_process *cmd)
int null_fd = -1;
char **childenv;
struct argv_array argv = ARGV_ARRAY_INIT;
+   struct child_err cerr;
 
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
@@ -453,20 +515,16 @@ int start_command(struct child_process *cmd)
failed_errno = errno;
if (!cmd->pid) {
/*
-* Redirect the channel to write syscall error messages to
-* before redirecting the process's stderr so that all die()
-* in subsequent call paths use the parent's stderr.
+* Ensure the default die/error/warn routines do not get
+* called, they can take stdio locks and malloc.
 */
-   if (cmd->no_stderr || need_err) {
-   int child_err = dup(2);
-   set_cloexec(child_err);
-   set_error_handle(fdopen(child_err, "w"));
-   }
+   

[PATCH v5 10/11] run-command: add note about forking and threading

2017-04-18 Thread Brandon Williams
All non-Async-Signal-Safe functions (e.g. malloc and die) were removed
between 'fork' and 'exec' in start_command in order to avoid potential
deadlocking when forking while multiple threads are running.  This
deadlocking is possible when a thread (other than the one forking) had
acquired a lock and did not release it before the fork.
This leaves the lock in a locked state in the resulting process with no
hope of it ever being released.

Add a note describing this potential pitfall before the call to 'fork()'
so people working in this section of the code know to only use
Async-Signal-Safe functions in the child process.

Signed-off-by: Brandon Williams 
---
 run-command.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/run-command.c b/run-command.c
index 615b6e9c9..df1edd963 100644
--- a/run-command.c
+++ b/run-command.c
@@ -537,6 +537,15 @@ int start_command(struct child_process *cmd)
prepare_cmd(, cmd);
childenv = prep_childenv(cmd->env);
 
+   /*
+* NOTE: In order to prevent deadlocking when using threads special
+* care should be taken with the function calls made in between the
+* fork() and exec() calls.  No calls should be made to functions which
+* require acquiring a lock (e.g. malloc) as the lock could have been
+* held by another thread at the time of forking, causing the lock to
+* never be released in the child process.  This means only
+* Async-Signal-Safe functions are permitted in the child.
+*/
cmd->pid = fork();
failed_errno = errno;
if (!cmd->pid) {
-- 
2.12.2.816.g281164-goog



[PATCH v5 07/11] run-command: don't die in child when duping /dev/null

2017-04-18 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---
 run-command.c | 28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/run-command.c b/run-command.c
index 15e2e74a7..b3a35dd82 100644
--- a/run-command.c
+++ b/run-command.c
@@ -117,18 +117,6 @@ static inline void close_pair(int fd[2])
close(fd[1]);
 }
 
-#ifndef GIT_WINDOWS_NATIVE
-static inline void dup_devnull(int to)
-{
-   int fd = open("/dev/null", O_RDWR);
-   if (fd < 0)
-   die_errno(_("open /dev/null failed"));
-   if (dup2(fd, to) < 0)
-   die_errno(_("dup2(%d,%d) failed"), fd, to);
-   close(fd);
-}
-#endif
-
 static char *locate_in_PATH(const char *file)
 {
const char *p = getenv("PATH");
@@ -444,12 +432,20 @@ int start_command(struct child_process *cmd)
 #ifndef GIT_WINDOWS_NATIVE
 {
int notify_pipe[2];
+   int null_fd = -1;
char **childenv;
struct argv_array argv = ARGV_ARRAY_INIT;
 
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
 
+   if (cmd->no_stdin || cmd->no_stdout || cmd->no_stderr) {
+   null_fd = open("/dev/null", O_RDWR | O_CLOEXEC);
+   if (null_fd < 0)
+   die_errno(_("open /dev/null failed"));
+   set_cloexec(null_fd);
+   }
+
prepare_cmd(, cmd);
childenv = prep_childenv(cmd->env);
 
@@ -473,7 +469,7 @@ int start_command(struct child_process *cmd)
atexit(notify_parent);
 
if (cmd->no_stdin)
-   dup_devnull(0);
+   dup2(null_fd, 0);
else if (need_in) {
dup2(fdin[0], 0);
close_pair(fdin);
@@ -483,7 +479,7 @@ int start_command(struct child_process *cmd)
}
 
if (cmd->no_stderr)
-   dup_devnull(2);
+   dup2(null_fd, 2);
else if (need_err) {
dup2(fderr[1], 2);
close_pair(fderr);
@@ -493,7 +489,7 @@ int start_command(struct child_process *cmd)
}
 
if (cmd->no_stdout)
-   dup_devnull(1);
+   dup2(null_fd, 1);
else if (cmd->stdout_to_stderr)
dup2(2, 1);
else if (need_out) {
@@ -553,6 +549,8 @@ int start_command(struct child_process *cmd)
}
close(notify_pipe[0]);
 
+   if (null_fd >= 0)
+   close(null_fd);
argv_array_clear();
free(childenv);
 }
-- 
2.12.2.816.g281164-goog



[PATCH v5 00/11] forking and threading

2017-04-18 Thread Brandon Williams
v5 addresses the issues with creating the child's environment.  Instead of
doing vanilla C string manipulation, I used a struct string_list and struct
strbuf.  The code should be much easier to read and understand now.  I also
needed to add to the string_list API a function that removes a
string_list_item from a string_list.

Brandon Williams (10):
  t5550: use write_script to generate post-update hook
  t0061: run_command executes scripts without a #! line
  run-command: prepare command before forking
  run-command: use the async-signal-safe execv instead of execvp
  string-list: add string_list_remove function
  run-command: prepare child environment before forking
  run-command: don't die in child when duping /dev/null
  run-command: eliminate calls to error handling functions in child
  run-command: handle dup2 and close errors in child
  run-command: add note about forking and threading

Eric Wong (1):
  run-command: block signals between fork and execve

 run-command.c  | 404 +++--
 string-list.c  |  18 ++
 string-list.h  |   5 +
 t/t0061-run-command.sh |  11 ++
 t/t5550-http-fetch-dumb.sh |   5 +-
 5 files changed, 358 insertions(+), 85 deletions(-)

-- 
2.12.2.816.g281164-goog



[PATCH v5 02/11] t0061: run_command executes scripts without a #! line

2017-04-18 Thread Brandon Williams
Add a test to 't0061-run-command.sh' to ensure that run_command can
continue to execute scripts which don't include a '#!' line.

Signed-off-by: Brandon Williams 
---
 t/t0061-run-command.sh | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 12228b4aa..1a7490e29 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -26,6 +26,17 @@ test_expect_success 'run_command can run a command' '
test_cmp empty err
 '
 
+test_expect_success 'run_command can run a script without a #! line' '
+   cat >hello <<-\EOF &&
+   cat hello-script
+   EOF
+   chmod +x hello &&
+   test-run-command run-command ./hello >actual 2>err &&
+
+   test_cmp hello-script actual &&
+   test_cmp empty err
+'
+
 test_expect_success POSIXPERM 'run_command reports EACCES' '
cat hello-script >hello.sh &&
chmod -x hello.sh &&
-- 
2.12.2.816.g281164-goog



[PATCH v5 01/11] t5550: use write_script to generate post-update hook

2017-04-18 Thread Brandon Williams
The post-update hook created in t5550-http-fetch-dumb.sh is missing the
"#!/bin/sh" line, which can cause portability issues.  Instead,
create the hook using the 'write_script' function, which includes the
proper "#!" line.

Signed-off-by: Brandon Williams 
---
 t/t5550-http-fetch-dumb.sh | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/t5550-http-fetch-dumb.sh b/t/t5550-http-fetch-dumb.sh
index 87308cdce..8552184e7 100755
--- a/t/t5550-http-fetch-dumb.sh
+++ b/t/t5550-http-fetch-dumb.sh
@@ -20,8 +20,9 @@ test_expect_success 'create http-accessible bare repository 
with loose objects'
(cd "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
 git config core.bare true &&
 mkdir -p hooks &&
-echo "exec git update-server-info" >hooks/post-update &&
-chmod +x hooks/post-update &&
+write_script "hooks/post-update" <<-\EOF &&
+exec git update-server-info
+   EOF
 hooks/post-update
) &&
git remote add public "$HTTPD_DOCUMENT_ROOT_PATH/repo.git" &&
-- 
2.12.2.816.g281164-goog



[PATCH v5 05/11] string-list: add string_list_remove function

2017-04-18 Thread Brandon Williams
Teach string-list to be able to remove a string from a sorted
'struct string_list'.

Signed-off-by: Brandon Williams 
---
 string-list.c | 18 ++
 string-list.h |  5 +
 2 files changed, 23 insertions(+)

diff --git a/string-list.c b/string-list.c
index 45016ad86..8f7b69ada 100644
--- a/string-list.c
+++ b/string-list.c
@@ -67,6 +67,24 @@ struct string_list_item *string_list_insert(struct 
string_list *list, const char
return list->items + index;
 }
 
+void string_list_remove(struct string_list *list, const char *string,
+   int free_util)
+{
+   int exact_match;
+   int i = get_entry_index(list, string, _match);
+
+   if (exact_match) {
+   if (list->strdup_strings)
+   free(list->items[i].string);
+   if (free_util)
+   free(list->items[i].util);
+
+   list->nr--;
+   memmove(list->items + i, list->items + i + 1,
+   (list->nr - i) * sizeof(struct string_list_item));
+   }
+}
+
 int string_list_has_string(const struct string_list *list, const char *string)
 {
int exact_match;
diff --git a/string-list.h b/string-list.h
index d3809a141..18520dbc8 100644
--- a/string-list.h
+++ b/string-list.h
@@ -63,6 +63,11 @@ int string_list_find_insert_index(const struct string_list 
*list, const char *st
 struct string_list_item *string_list_insert(struct string_list *list, const 
char *string);
 
 /*
+ * Removes the given string from the sorted list.
+ */
+void string_list_remove(struct string_list *list, const char *string, int 
free_util);
+
+/*
  * Checks if the given string is part of a sorted list. If it is part of the 
list,
  * return the coresponding string_list_item, NULL otherwise.
  */
-- 
2.12.2.816.g281164-goog



[PATCH v5 03/11] run-command: prepare command before forking

2017-04-18 Thread Brandon Williams
According to [1], only async-signal-safe operations may be called between
fork and exec.  Using malloc to build the argv array isn't async-signal-safe.

In order to avoid allocation between 'fork()' and 'exec()' prepare the
argv array used in the exec call prior to forking the process.

[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/fork.html

Signed-off-by: Brandon Williams 
---
 run-command.c | 46 ++
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/run-command.c b/run-command.c
index 574b81d3e..d8d143795 100644
--- a/run-command.c
+++ b/run-command.c
@@ -221,18 +221,6 @@ static const char **prepare_shell_cmd(struct argv_array 
*out, const char **argv)
 }
 
 #ifndef GIT_WINDOWS_NATIVE
-static int execv_shell_cmd(const char **argv)
-{
-   struct argv_array nargv = ARGV_ARRAY_INIT;
-   prepare_shell_cmd(, argv);
-   trace_argv_printf(nargv.argv, "trace: exec:");
-   sane_execvp(nargv.argv[0], (char **)nargv.argv);
-   argv_array_clear();
-   return -1;
-}
-#endif
-
-#ifndef GIT_WINDOWS_NATIVE
 static int child_notifier = -1;
 
 static void notify_parent(void)
@@ -244,6 +232,21 @@ static void notify_parent(void)
 */
xwrite(child_notifier, "", 1);
 }
+
+static void prepare_cmd(struct argv_array *out, const struct child_process 
*cmd)
+{
+   if (!cmd->argv[0])
+   die("BUG: command is empty");
+
+   if (cmd->git_cmd) {
+   argv_array_push(out, "git");
+   argv_array_pushv(out, cmd->argv);
+   } else if (cmd->use_shell) {
+   prepare_shell_cmd(out, cmd->argv);
+   } else {
+   argv_array_pushv(out, cmd->argv);
+   }
+}
 #endif
 
 static inline void set_cloexec(int fd)
@@ -372,9 +375,13 @@ int start_command(struct child_process *cmd)
 #ifndef GIT_WINDOWS_NATIVE
 {
int notify_pipe[2];
+   struct argv_array argv = ARGV_ARRAY_INIT;
+
if (pipe(notify_pipe))
notify_pipe[0] = notify_pipe[1] = -1;
 
+   prepare_cmd(, cmd);
+
cmd->pid = fork();
failed_errno = errno;
if (!cmd->pid) {
@@ -437,12 +444,9 @@ int start_command(struct child_process *cmd)
unsetenv(*cmd->env);
}
}
-   if (cmd->git_cmd)
-   execv_git_cmd(cmd->argv);
-   else if (cmd->use_shell)
-   execv_shell_cmd(cmd->argv);
-   else
-   sane_execvp(cmd->argv[0], (char *const*) cmd->argv);
+
+   sane_execvp(argv.argv[0], (char *const *) argv.argv);
+
if (errno == ENOENT) {
if (!cmd->silent_exec_failure)
error("cannot run %s: %s", cmd->argv[0],
@@ -458,7 +462,7 @@ int start_command(struct child_process *cmd)
mark_child_for_cleanup(cmd->pid, cmd);
 
/*
-* Wait for child's execvp. If the execvp succeeds (or if fork()
+* Wait for child's exec. If the exec succeeds (or if fork()
 * failed), EOF is seen immediately by the parent. Otherwise, the
 * child process sends a single byte.
 * Note that use of this infrastructure is completely advisory,
@@ -467,7 +471,7 @@ int start_command(struct child_process *cmd)
close(notify_pipe[1]);
if (read(notify_pipe[0], _pipe[1], 1) == 1) {
/*
-* At this point we know that fork() succeeded, but execvp()
+* At this point we know that fork() succeeded, but exec()
 * failed. Errors have been reported to our stderr.
 */
wait_or_whine(cmd->pid, cmd->argv[0], 0);
@@ -475,6 +479,8 @@ int start_command(struct child_process *cmd)
cmd->pid = -1;
}
close(notify_pipe[0]);
+
+   argv_array_clear();
 }
 #else
 {
-- 
2.12.2.816.g281164-goog



Re: [PATCH] various: disallow --no-no-OPT for --no-opt options

2017-04-18 Thread René Scharfe

On 18.04.2017 at 19:09, Ævar Arnfjörð Bjarmason wrote:

Change various --no-OPT options which don't supply PARSE_OPT_NONEG to
make --no-no-OPT an error.

All of these worked before this change, e.g. cloning with
"git clone --no-no-checkout" is equivalent to just "git clone", but
this was never intended, and is inconsistent with other --no-OPT
options which do pass PARSE_OPT_NONEG.


First: Why does that bother you, i.e. what's the harm?

Setting PARSE_OPT_NONEG takes away the ability to toggle the affected
option.  E.g. git clone would reject --checkout.  Currently users can
specify --no- options as defaults in aliases and override them on the
command line if needed; with the patch that won't be possible anymore.

PARSE_OPT_NONEG should only be used for options where a negation doesn't
make sense, e.g. for the --stage option of checkout-index.

Also: https://public-inbox.org/git/4f4d3545.6060...@lsrfire.ath.cx/


Signed-off-by: Ævar Arnfjörð Bjarmason 
---

For this one I'm not bothering to track down & CC every single person
who originally added these --no-OPT options. Clearly just a trivial
bug we can fix.

  apply.c   | 2 +-
  builtin/bisect--helper.c  | 2 +-
  builtin/check-ignore.c| 2 +-
  builtin/checkout-index.c  | 2 +-
  builtin/clone.c   | 4 ++--
  builtin/commit.c  | 4 ++--
  builtin/fast-export.c | 2 +-
  builtin/grep.c| 2 +-
  builtin/hash-object.c | 2 +-
  builtin/log.c | 4 ++--
  builtin/push.c| 2 +-
  builtin/read-tree.c   | 2 +-
  builtin/revert.c  | 2 +-
  builtin/show-branch.c | 2 +-
  builtin/update-ref.c  | 2 +-
  parse-options.h   | 7 +++
  t/helper/test-parse-options.c | 1 +
  t/t0040-parse-options.sh  | 3 +++
  18 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/apply.c b/apply.c
index e6dbab26ad..47a91d0762 100644
--- a/apply.c
+++ b/apply.c
@@ -4917,7 +4917,7 @@ int apply_parse_options(int argc, const char **argv,
{ OPTION_CALLBACK, 'p', NULL, state, N_("num"),
N_("remove  leading slashes from traditional diff 
paths"),
0, apply_option_parse_p },
-   OPT_BOOL(0, "no-add", >no_add,
+   OPT_BOOL_NONEG(0, "no-add", >no_add,
N_("ignore additions made by the patch")),
OPT_BOOL(0, "stat", >diffstat,
N_("instead of applying the patch, output diffstat for the 
input")),
diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 3324229025..5fe9093947 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -15,7 +15,7 @@ int cmd_bisect__helper(int argc, const char **argv, const 
char *prefix)
struct option options[] = {
OPT_BOOL(0, "next-all", _all,
 N_("perform 'git bisect next'")),
-   OPT_BOOL(0, "no-checkout", _checkout,
+   OPT_BOOL_NONEG(0, "no-checkout", _checkout,
 N_("update BISECT_HEAD instead of checking out the current 
commit")),
OPT_END()
};
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 1d73d3ca3d..cb6467946e 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -24,7 +24,7 @@ static const struct option check_ignore_options[] = {
 N_("terminate input and output records by a NUL character")),
OPT_BOOL('n', "non-matching", _non_matching,
 N_("show non-matching input paths")),
-   OPT_BOOL(0, "no-index", _index,
+   OPT_BOOL_NONEG(0, "no-index", _index,
 N_("ignore index when checking")),
OPT_END()
  };
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 07631d0c9c..cf84e31ce8 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -161,7 +161,7 @@ int cmd_checkout_index(int argc, const char **argv, const 
char *prefix)
OPT__FORCE(, N_("force overwrite of existing files")),
OPT__QUIET(,
N_("no warning for existing files and files not in 
index")),
-   OPT_BOOL('n', "no-create", _new,
+   OPT_BOOL_NONEG('n', "no-create", _new,
N_("don't checkout new files")),
OPT_BOOL('u', "index", _opt,
 N_("update stat information in the index file")),
diff --git a/builtin/clone.c b/builtin/clone.c
index de85b85254..32c5843563 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -76,7 +76,7 @@ static struct option builtin_clone_options[] = {
OPT__VERBOSITY(_verbosity),
OPT_BOOL(0, "progress", _progress,
 N_("force progress reporting")),
-   OPT_BOOL('n', "no-checkout", _no_checkout,
+   OPT_BOOL_NONEG('n', "no-checkout", _no_checkout,
 

[PATCH v3 2/2] xgethostname: handle long hostnames

2017-04-18 Thread David Turner
If the full hostname doesn't fit in the buffer supplied to
gethostname, POSIX does not specify whether the buffer will be
null-terminated, so to be safe, we should do it ourselves.  Introduce a
new function, xgethostname, which ensures that there is always a \0
at the end of the buffer.

Signed-off-by: David Turner 
---
 builtin/gc.c   |  2 +-
 builtin/receive-pack.c |  2 +-
 fetch-pack.c   |  2 +-
 git-compat-util.h  |  2 ++
 ident.c|  2 +-
 wrapper.c  | 13 +
 6 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 4c4a36e2b5..33a1edabbc 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -250,7 +250,7 @@ static const char *lock_repo_for_gc(int force, pid_t* 
ret_pid)
/* already locked */
return NULL;
 
-   if (gethostname(my_host, sizeof(my_host)))
+   if (xgethostname(my_host, sizeof(my_host)))
xsnprintf(my_host, sizeof(my_host), "unknown");
 
pidfile_path = git_pathdup("gc.pid");
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 2612efad3d..0ca423a711 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1700,7 +1700,7 @@ static const char *unpack(int err_fd, struct shallow_info 
*si)
argv_array_pushl(, "index-pack",
 "--stdin", hdr_arg, NULL);
 
-   if (gethostname(hostname, sizeof(hostname)))
+   if (xgethostname(hostname, sizeof(hostname)))
xsnprintf(hostname, sizeof(hostname), "localhost");
argv_array_pushf(,
 "--keep=receive-pack %"PRIuMAX" on %s",
diff --git a/fetch-pack.c b/fetch-pack.c
index 055f568775..15d59a0440 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -803,7 +803,7 @@ static int get_pack(struct fetch_pack_args *args,
argv_array_push(, "--fix-thin");
if (args->lock_pack || unpack_limit) {
char hostname[HOST_NAME_MAX + 1];
-   if (gethostname(hostname, sizeof(hostname)))
+   if (xgethostname(hostname, sizeof(hostname)))
xsnprintf(hostname, sizeof(hostname), 
"localhost");
argv_array_pushf(,
"--keep=fetch-pack %"PRIuMAX " on %s",
diff --git a/git-compat-util.h b/git-compat-util.h
index 46f3abe401..bd04564a69 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -888,6 +888,8 @@ extern int xsnprintf(char *dst, size_t max, const char 
*fmt, ...);
 #define HOST_NAME_MAX 256
 #endif
 
+extern int xgethostname(char *buf, size_t len);
+
 /* in ctype.c, for kwset users */
 extern const unsigned char tolower_trans_tbl[256];
 
diff --git a/ident.c b/ident.c
index 556851cf94..bea871c8e0 100644
--- a/ident.c
+++ b/ident.c
@@ -122,7 +122,7 @@ static void add_domainname(struct strbuf *out, int 
*is_bogus)
 {
char buf[HOST_NAME_MAX + 1];
 
-   if (gethostname(buf, sizeof(buf))) {
+   if (xgethostname(buf, sizeof(buf))) {
warning_errno("cannot get host name");
strbuf_addstr(out, "(none)");
*is_bogus = 1;
diff --git a/wrapper.c b/wrapper.c
index 0542fc7582..d837417709 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -655,3 +655,16 @@ void sleep_millisec(int millisec)
 {
poll(NULL, 0, millisec);
 }
+
+int xgethostname(char *buf, size_t len)
+{
+   /*
+* If the full hostname doesn't fit in buf, POSIX does not
+* specify whether the buffer will be null-terminated, so to
+* be safe, do it ourselves.
+*/
+   int ret = gethostname(buf, len);
+   if (!ret)
+   buf[len - 1] = 0;
+   return ret;
+}
-- 
2.11.GIT



[PATCH v3 1/2] use HOST_NAME_MAX to size buffers for gethostname(2)

2017-04-18 Thread David Turner
From: René Scharfe 

POSIX limits the length of host names to HOST_NAME_MAX.  Export the
fallback definition from daemon.c and use this constant to make all
buffers used with gethostname(2) big enough for any possible result
and a terminating NUL.

Inspired-by: David Turner 
Signed-off-by: Rene Scharfe 
Signed-off-by: David Turner 
---
 builtin/gc.c   | 10 +++---
 builtin/receive-pack.c |  2 +-
 daemon.c   |  4 
 fetch-pack.c   |  2 +-
 git-compat-util.h  |  4 
 ident.c|  2 +-
 6 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index c2c61a57bb..4c4a36e2b5 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -238,7 +238,7 @@ static int need_to_gc(void)
 static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 {
static struct lock_file lock;
-   char my_host[128];
+   char my_host[HOST_NAME_MAX + 1];
struct strbuf sb = STRBUF_INIT;
struct stat st;
uintmax_t pid;
@@ -257,8 +257,12 @@ static const char *lock_repo_for_gc(int force, pid_t* 
ret_pid)
fd = hold_lock_file_for_update(, pidfile_path,
   LOCK_DIE_ON_ERROR);
if (!force) {
-   static char locking_host[128];
+   static char locking_host[HOST_NAME_MAX + 1];
+   static char *scan_fmt;
int should_exit;
+
+   if (!scan_fmt)
+   scan_fmt = xstrfmt("%s %%%dc", "%"SCNuMAX, 
HOST_NAME_MAX);
fp = fopen(pidfile_path, "r");
memset(locking_host, 0, sizeof(locking_host));
should_exit =
@@ -274,7 +278,7 @@ static const char *lock_repo_for_gc(int force, pid_t* 
ret_pid)
 * running.
 */
time(NULL) - st.st_mtime <= 12 * 3600 &&
-   fscanf(fp, "%"SCNuMAX" %127c", , locking_host) == 2 
&&
+   fscanf(fp, scan_fmt, , locking_host) == 2 &&
/* be gentle to concurrent "gc" on remote hosts */
(strcmp(locking_host, my_host) || !kill(pid, 0) || 
errno == EPERM);
if (fp != NULL)
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index aca9c33d8d..2612efad3d 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -1695,7 +1695,7 @@ static const char *unpack(int err_fd, struct shallow_info 
*si)
if (status)
return "unpack-objects abnormal exit";
} else {
-   char hostname[256];
+   char hostname[HOST_NAME_MAX + 1];
 
argv_array_pushl(, "index-pack",
 "--stdin", hdr_arg, NULL);
diff --git a/daemon.c b/daemon.c
index 473e6b6b63..1503e1ed6f 100644
--- a/daemon.c
+++ b/daemon.c
@@ -4,10 +4,6 @@
 #include "strbuf.h"
 #include "string-list.h"
 
-#ifndef HOST_NAME_MAX
-#define HOST_NAME_MAX 256
-#endif
-
 #ifdef NO_INITGROUPS
 #define initgroups(x, y) (0) /* nothing */
 #endif
diff --git a/fetch-pack.c b/fetch-pack.c
index d07d85ce30..055f568775 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -802,7 +802,7 @@ static int get_pack(struct fetch_pack_args *args,
if (args->use_thin_pack)
argv_array_push(, "--fix-thin");
if (args->lock_pack || unpack_limit) {
-   char hostname[256];
+   char hostname[HOST_NAME_MAX + 1];
if (gethostname(hostname, sizeof(hostname)))
xsnprintf(hostname, sizeof(hostname), 
"localhost");
argv_array_pushf(,
diff --git a/git-compat-util.h b/git-compat-util.h
index 8a4a3f85e7..46f3abe401 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -884,6 +884,10 @@ static inline size_t xsize_t(off_t len)
 __attribute__((format (printf, 3, 4)))
 extern int xsnprintf(char *dst, size_t max, const char *fmt, ...);
 
+#ifndef HOST_NAME_MAX
+#define HOST_NAME_MAX 256
+#endif
+
 /* in ctype.c, for kwset users */
 extern const unsigned char tolower_trans_tbl[256];
 
diff --git a/ident.c b/ident.c
index c0364fe3a1..556851cf94 100644
--- a/ident.c
+++ b/ident.c
@@ -120,7 +120,7 @@ static int canonical_name(const char *host, struct strbuf 
*out)
 
 static void add_domainname(struct strbuf *out, int *is_bogus)
 {
-   char buf[1024];
+   char buf[HOST_NAME_MAX + 1];
 
if (gethostname(buf, sizeof(buf))) {
warning_errno("cannot get host name");
-- 
2.11.GIT



[PATCH v3 0/2] gethostbyname fixes

2017-04-18 Thread David Turner
This version includes Junio's fixup to René's patch, and then my patch
rebased on top of René's.  I thought it was easier to just send both
in one series than to have Junio do a bunch of conflict resolution.
I think this still needs Junio's signoff on the first patch, since
I've added his code.

David Turner (1):
  xgethostname: handle long hostnames

René Scharfe (1):
  use HOST_NAME_MAX to size buffers for gethostname(2)

 builtin/gc.c   | 12 
 builtin/receive-pack.c |  4 ++--
 daemon.c   |  4 
 fetch-pack.c   |  4 ++--
 git-compat-util.h  |  6 ++
 ident.c|  4 ++--
 wrapper.c  | 13 +
 7 files changed, 33 insertions(+), 14 deletions(-)

-- 
2.11.GIT



Re: [PATCH v11 2/5] p0006-read-tree-checkout: perf test to time read-tree

2017-04-18 Thread Thomas Gummerer
On 04/17, g...@jeffhostetler.com wrote:
> From: Jeff Hostetler 
> 
> Created t/perf/repos/many-files.sh to generate large, but
> artificial repositories.
> 
> Created t/perf/p0006-read-tree-checkout.sh to measure
> performance on various read-tree, checkout, and update-index
> operations.  This test can run using either artificial repos
> described above or normal repos.
> 
> Signed-off-by: Jeff Hostetler 
> ---
>  t/perf/p0006-read-tree-checkout.sh |  67 ++
>  t/perf/repos/.gitignore|   1 +
>  t/perf/repos/many-files.sh | 110 
> +
>  3 files changed, 178 insertions(+)
>  create mode 100755 t/perf/p0006-read-tree-checkout.sh
>  create mode 100644 t/perf/repos/.gitignore
>  create mode 100755 t/perf/repos/many-files.sh
> 
> diff --git a/t/perf/p0006-read-tree-checkout.sh 
> b/t/perf/p0006-read-tree-checkout.sh
> new file mode 100755
> index 000..78cc23f
> --- /dev/null
> +++ b/t/perf/p0006-read-tree-checkout.sh
> @@ -0,0 +1,67 @@
> +#!/bin/sh
> +#
> +# This test measures the performance of various read-tree
> +# and checkout operations.  It is primarily interested in
> +# the algorithmic costs of index operations and recursive
> +# tree traversal -- and NOT disk I/O on thousands of files.
> +
> +test_description="Tests performance of read-tree"
> +
> +. ./perf-lib.sh
> +
> +test_perf_default_repo

I like that it's possible to use a real world repository now instead
of forcing the use of a synthetic repository :)

Is there a reason for this being test_perf_default_repo instead of
test_perf_large_repo?  It seems like generating a large repo is what
you are doing with repos/many-files.sh.

> +
> +# If the test repo was generated by ./repos/many-files.sh
> +# then we know something about the data shape and branches,
> +# so we can isolate testing to the ballast-related commits
> +# and setup sparse-checkout so we don't have to populate
> +# the ballast files and directories.
> +#
> +# Otherwise, we make some general assumptions about the
> +# repo and consider the entire history of the current
> +# branch to be the ballast.
> +
> +test_expect_success "setup repo" '
> + if git rev-parse --verify refs/heads/p0006-ballast^{commit}
> + then
> + echo Assuming synthetic repo from many-files.sh
> + git branch br_base master
> + git branch br_ballast p0006-ballast^
> + git branch br_ballast_alias   p0006-ballast^
> + git branch br_ballast_plus_1  p0006-ballast
> + git config --local core.sparsecheckout 1
> + cat >.git/info/sparse-checkout <<-EOF
> + /*
> + !ballast/*
> + EOF
> + else
> + echo Assuming non-synthetic repo...
> + git branch br_base $(git rev-list HEAD | tail -n 1)
> + git branch br_ballast HEAD^ || error "no ancestor 
> commit from current head"
> + git branch br_ballast_alias   HEAD^
> + git branch br_ballast_plus_1  HEAD
> + fi &&
> + git checkout -q br_ballast &&
> + nr_files=$(git ls-files | wc -l)
> +'
> +
> +test_perf "read-tree br_base br_ballast ($nr_files)" '
> + git read-tree -m br_base br_ballast -n
> +'
> +
> +test_perf "switch between br_base br_ballast ($nr_files)" '
> + git checkout -q br_base &&
> + git checkout -q br_ballast
> +'
> +
> +test_perf "switch between br_ballast br_ballast_plus_1 ($nr_files)" '
> + git checkout -q br_ballast_plus_1 &&
> + git checkout -q br_ballast
> +'
> +
> +test_perf "switch between aliases ($nr_files)" '
> + git checkout -q br_ballast_alias &&
> + git checkout -q br_ballast
> +'
> +
> +test_done
> diff --git a/t/perf/repos/.gitignore b/t/perf/repos/.gitignore
> new file mode 100644
> index 000..72e3dc3
> --- /dev/null
> +++ b/t/perf/repos/.gitignore
> @@ -0,0 +1 @@
> +gen-*/
> diff --git a/t/perf/repos/many-files.sh b/t/perf/repos/many-files.sh
> new file mode 100755
> index 000..5a1d25e
> --- /dev/null
> +++ b/t/perf/repos/many-files.sh
> @@ -0,0 +1,110 @@
> +#!/bin/sh
> +## Generate test data repository using the given parameters.
> +## When omitted, we create "gen-many-files-d-w-f.git".
> +##
> +## Usage: [-r repo] [-d depth] [-w width] [-f files]
> +##
> +## -r repo: path to the new repo to be generated
> +## -d depth: the depth of sub-directories
> +## -w width: the number of sub-directories at each level
> +## -f files: the number of files created in each directory
> +##
> +## Note that all files will have the same SHA-1 and each
> +## directory at a level will have the same SHA-1, so we
> +## will potentially have a large index, but not a large
> +## ODB.
> +##
> +## Ballast will be created under "ballast/".

I think comments should start only with a single '#' in the git
source, as you already have it in p0006.

[...]


[PATCHv2 3/4] submodule.c: submodule_move_head works with broken submodules

2017-04-18 Thread Stefan Beller
Early in submodule_move_head, just after the check whether the submodule
is initialized, we need to check whether the submodule is populated
correctly.

If the submodule is initialized but doesn't look like it is populated,
this is a red flag and can indicate multiple sorts of failures:
(1) The submodule may be recorded at an object name that is missing.
(2) The submodule '.git' file link may be broken, no longer pointing
at a repository.

In the non-forced mode we want to complain to the user in both cases;
in the forced mode we ignore the old state and just move the
submodule into its new state with a fixed '.git' file link.

Signed-off-by: Stefan Beller 
---
 submodule.c   | 28 
 t/lib-submodule-update.sh | 23 ---
 2 files changed, 44 insertions(+), 7 deletions(-)

diff --git a/submodule.c b/submodule.c
index ccf8932731..20ed5b5681 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1332,10 +1332,24 @@ int submodule_move_head(const char *path,
int ret = 0;
struct child_process cp = CHILD_PROCESS_INIT;
const struct submodule *sub;
+   int *error_code_ptr, error_code;
 
if (!is_submodule_initialized(path))
return 0;
 
+   if (flags & SUBMODULE_MOVE_HEAD_FORCE)
+   /*
+* Pass non NULL pointer to is_submodule_populated_gently
+* to prevent die()-ing. We'll use connect_work_tree_and_git_dir
+* to fixup the submodule in the force case later.
+*/
+   error_code_ptr = _code;
+   else
+   error_code_ptr = NULL;
+
+   if (old && !is_submodule_populated_gently(path, error_code_ptr))
+   return 0;
+
sub = submodule_from_path(null_sha1, path);
 
if (!sub)
@@ -1353,15 +1367,21 @@ int submodule_move_head(const char *path,
absorb_git_dir_into_superproject("", path,
ABSORB_GITDIR_RECURSE_SUBMODULES);
} else {
-   struct strbuf sb = STRBUF_INIT;
-   strbuf_addf(, "%s/modules/%s",
+   char *gitdir = xstrfmt("%s/modules/%s",
get_git_common_dir(), sub->name);
-   connect_work_tree_and_git_dir(path, sb.buf);
-   strbuf_release();
+   connect_work_tree_and_git_dir(path, gitdir);
+   free(gitdir);
 
/* make sure the index is clean as well */
submodule_reset_index(path);
}
+
+   if (old && (flags & SUBMODULE_MOVE_HEAD_FORCE)) {
+   char *gitdir = xstrfmt("%s/modules/%s",
+   get_git_common_dir(), sub->name);
+   connect_work_tree_and_git_dir(path, gitdir);
+   free(gitdir);
+   }
}
 
prepare_submodule_repo_env_no_git_dir(_array);
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 22dd9e060c..f0b1b18206 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -1213,14 +1213,31 @@ test_submodule_forced_switch_recursing () {
)
'
# Updating a submodule from an invalid sha1 updates
-   test_expect_success "$command: modified submodule does not update submodule work tree from invalid commit" '
+   test_expect_success "$command: modified submodule does update submodule work tree from invalid commit" '
prolog &&
reset_work_tree_to_interested invalid_sub1 &&
(
cd submodule_update &&
git branch -t valid_sub1 origin/valid_sub1 &&
-   test_must_fail $command valid_sub1 &&
-   test_superproject_content origin/invalid_sub1
+   $command valid_sub1 &&
+   test_superproject_content origin/valid_sub1 &&
+   test_submodule_content sub1 origin/valid_sub1
+   )
+   '
+
+   # Old versions of Git were buggy writing the .git link file
+   # (e.g. before f8eaa0ba98b and then moving the superproject repo
+   # whose submodules contained absolute paths)
+   test_expect_success "$command: updating submodules fixes .git links" '
+   prolog &&
+   reset_work_tree_to_interested add_sub1 &&
+   (
+   cd submodule_update &&
+   git branch -t modify_sub1 origin/modify_sub1 &&
+   echo "gitdir: bogus/path" >sub1/.git &&
+   $command modify_sub1 &&
+   test_superproject_content origin/modify_sub1 &&
+   test_submodule_content sub1 origin/modify_sub1
)
'
 }
-- 

[PATCHv2 2/4] submodule.c: uninitialized submodules are ignored in recursive commands

2017-04-18 Thread Stefan Beller
This was an oversight when working on the working tree modifying commands
recursing into submodules.

To test for uninitialized submodules, introduce another submodule
"uninitialized_sub". Adding it via `submodule add` will activate the
submodule in the preparation area (in create_lib_submodule_repo we
setup all the things in submodule_update_repo), but the later tests
will use a new testing repo that clones the preparation repo
in which the new submodule is not initialized.

By adding it to the branch "add_sub1", which is the starting point of
all other branches, we have wide coverage.

Signed-off-by: Stefan Beller 
---
 submodule.c   | 3 +++
 t/lib-submodule-update.sh | 1 +
 2 files changed, 4 insertions(+)

diff --git a/submodule.c b/submodule.c
index 7c3c4b17fb..ccf8932731 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1333,6 +1333,9 @@ int submodule_move_head(const char *path,
struct child_process cp = CHILD_PROCESS_INIT;
const struct submodule *sub;
 
+   if (!is_submodule_initialized(path))
+   return 0;
+
sub = submodule_from_path(null_sha1, path);
 
if (!sub)
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index fb4f7b014e..22dd9e060c 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -73,6 +73,7 @@ create_lib_submodule_repo () {
 
git checkout -b "add_sub1" &&
git submodule add ../submodule_update_sub1 sub1 &&
+   git submodule add ../submodule_update_sub1 uninitialized_sub &&
git config -f .gitmodules submodule.sub1.ignore all &&
git config submodule.sub1.ignore all &&
git add .gitmodules &&
-- 
2.12.2.642.g1b8cc69eee.dirty



[PATCHv2 4/4] builtin/reset: add --recurse-submodules switch

2017-04-18 Thread Stefan Beller
git-reset is yet another working tree manipulator, which should
be taught about submodules.

One use case of "git-reset" is to return to a known good state,
dropping commits that did not work as expected.
In that case one of the expected outcomes from a hard reset
would be to have broken submodules reset to a known good
state as well.  A test for this was added in a prior patch.

Signed-off-by: Stefan Beller 
---
 builtin/reset.c| 30 ++
 t/t7112-reset-submodule.sh |  8 
 2 files changed, 38 insertions(+)

diff --git a/builtin/reset.c b/builtin/reset.c
index fc3b906c47..5ce27fcaed 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -21,6 +21,27 @@
 #include "parse-options.h"
 #include "unpack-trees.h"
 #include "cache-tree.h"
+#include "submodule.h"
+#include "submodule-config.h"
+
+static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
+
+static int option_parse_recurse_submodules(const struct option *opt,
+  const char *arg, int unset)
+{
+   if (unset) {
+   recurse_submodules = RECURSE_SUBMODULES_OFF;
+   return 0;
+   }
+   if (arg)
+   recurse_submodules =
+   parse_update_recurse_submodules_arg(opt->long_name,
+   arg);
+   else
+   recurse_submodules = RECURSE_SUBMODULES_ON;
+
+   return 0;
+}
 
 static const char * const git_reset_usage[] = {
N_("git reset [--mixed | --soft | --hard | --merge | --keep] [-q] [<commit>]"),
@@ -283,6 +304,9 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
N_("reset HEAD, index and working tree"), MERGE),
OPT_SET_INT(0, "keep", &reset_type,
N_("reset HEAD but keep local changes"), KEEP),
+   { OPTION_CALLBACK, 0, "recurse-submodules", &recurse_submodules,
+   "reset", "control recursive updating of submodules",
+   PARSE_OPT_OPTARG, option_parse_recurse_submodules },
OPT_BOOL('p', "patch", &patch_mode, N_("select hunks interactively")),
OPT_BOOL('N', "intent-to-add", &intent_to_add,
N_("record only the fact that removed paths will be added later")),
@@ -295,6 +319,12 @@ int cmd_reset(int argc, const char **argv, const char *prefix)
PARSE_OPT_KEEP_DASHDASH);
parse_args(&pathspec, argv, prefix, patch_mode, &rev);
 
+   if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT) {
+   gitmodules_config();
+   git_config(submodule_config, NULL);
+   set_config_update_recurse_submodules(RECURSE_SUBMODULES_ON);
+   }
+
unborn = !strcmp(rev, "HEAD") && get_sha1("HEAD", oid.hash);
if (unborn) {
/* reset on unborn branch: treat as reset to empty tree */
diff --git a/t/t7112-reset-submodule.sh b/t/t7112-reset-submodule.sh
index 2eda6adeb1..f86ccdf215 100755
--- a/t/t7112-reset-submodule.sh
+++ b/t/t7112-reset-submodule.sh
@@ -5,6 +5,14 @@ test_description='reset can handle submodules'
 . ./test-lib.sh
 . "$TEST_DIRECTORY"/lib-submodule-update.sh
 
+KNOWN_FAILURE_SUBMODULE_RECURSIVE_NESTED=1
+KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS=1
+KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED=1
+
+test_submodule_switch_recursing "git reset --recurse-submodules --keep"
+
+test_submodule_forced_switch_recursing "git reset --hard --recurse-submodules"
+
 test_submodule_switch "git reset --keep"
 
 test_submodule_switch "git reset --merge"
-- 
2.12.2.642.g1b8cc69eee.dirty



[PATCHv2 1/4] entry.c: submodule recursing: respect force flag correctly

2017-04-18 Thread Stefan Beller
In case of a non-forced worktree update, the submodule movement is tested
in a dry run first, such that it doesn't matter if the actual update is
done via the force flag. However for correctness, we want to give the
flag as specified by the user. All callers have been inspected and updated
if needed.

Signed-off-by: Stefan Beller 
---
 entry.c| 8 
 unpack-trees.c | 7 ++-
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/entry.c b/entry.c
index d2b512da90..d6b263f78e 100644
--- a/entry.c
+++ b/entry.c
@@ -208,7 +208,8 @@ static int write_entry(struct cache_entry *ce,
sub = submodule_from_ce(ce);
if (sub)
return submodule_move_head(ce->name,
-   NULL, oid_to_hex(&ce->oid), SUBMODULE_MOVE_HEAD_FORCE);
+   NULL, oid_to_hex(&ce->oid),
+   state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
break;
default:
return error("unknown file mode for %s in index", path);
@@ -282,12 +283,11 @@ int checkout_entry(struct cache_entry *ce,
unlink_or_warn(ce->name);
 
return submodule_move_head(ce->name,
-   NULL, oid_to_hex(&ce->oid),
-   SUBMODULE_MOVE_HEAD_FORCE);
+   NULL, oid_to_hex(&ce->oid), 0);
} else
return submodule_move_head(ce->name,
"HEAD", oid_to_hex(&ce->oid),
-   SUBMODULE_MOVE_HEAD_FORCE);
+   state->force ? SUBMODULE_MOVE_HEAD_FORCE : 0);
}
 
if (!changed)
diff --git a/unpack-trees.c b/unpack-trees.c
index 6b7356dab2..4b3f9518e5 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -252,14 +252,18 @@ static int check_submodule_move_head(const struct cache_entry *ce,
 const char *new_id,
 struct unpack_trees_options *o)
 {
+   unsigned flags = SUBMODULE_MOVE_HEAD_DRY_RUN;
const struct submodule *sub = submodule_from_ce(ce);
if (!sub)
return 0;
 
+   if (o->reset)
+   flags |= SUBMODULE_MOVE_HEAD_FORCE;
+
switch (sub->update_strategy.type) {
case SM_UPDATE_UNSPECIFIED:
case SM_UPDATE_CHECKOUT:
-   if (submodule_move_head(ce->name, old_id, new_id, SUBMODULE_MOVE_HEAD_DRY_RUN))
+   if (submodule_move_head(ce->name, old_id, new_id, flags))
return o->gently ? -1 :
add_rejected_path(o, ERROR_WOULD_LOSE_SUBMODULE, ce->name);
return 0;
@@ -308,6 +312,7 @@ static void unlink_entry(const struct cache_entry *ce)
case SM_UPDATE_CHECKOUT:
case SM_UPDATE_REBASE:
case SM_UPDATE_MERGE:
+   /* state.force is set at the caller. */
submodule_move_head(ce->name, "HEAD", NULL,
SUBMODULE_MOVE_HEAD_FORCE);
break;
-- 
2.12.2.642.g1b8cc69eee.dirty



[PATCHv2 0/4] recursive submodules: git-reset!

2017-04-18 Thread Stefan Beller
v2:
* improved commit message to be proper English (Thanks, Philip!)
* clarified why the patch 2 is so short (i.e. it doesn't matter if the submodule
  is initialized in the preparation repo, we care about the actual testing repo!
  Thanks, Brandon)
* reworded patch 1 (Thanks Jonathan)

Thanks,
Stefan

v1: https://public-inbox.org/git/20170411234923.1860-1-sbel...@google.com/

Now that the BIG one has landed, e394fa01d6 (Merge branch
'sb/checkout-recurse-submodules', 2017-03-28), you would expect that
teaching the remaining working tree manipulating commands to recurse
into submodules is easy?

It turns out it is. See the last patch for how we teach git-reset to
recurse into submodules.

However when thinking more about what git-reset is expected to do,
I added tests and some fixes for them (patch 2+3).

patch 1 is a correctness thing, required for patch 3.

Thanks,
Stefan

Stefan Beller (4):
  entry.c: submodule recursing: respect force flag correctly
  submodule.c: uninitialized submodules are ignored in recursive
commands
  submodule.c: submodule_move_head works with broken submodules
  builtin/reset: add --recurse-submodules switch

 builtin/reset.c| 30 ++
 entry.c|  8 
 submodule.c| 31 +++
 t/lib-submodule-update.sh  | 24 +---
 t/t7112-reset-submodule.sh |  8 
 unpack-trees.c |  7 ++-
 6 files changed, 96 insertions(+), 12 deletions(-)




Re: [PATCH] clone: add a --no-tags option to clone without tags

2017-04-18 Thread Ævar Arnfjörð Bjarmason
On Tue, Apr 18, 2017 at 9:15 PM, Ævar Arnfjörð Bjarmason
 wrote:

> N_("clone only one branch, HEAD or --branch")),
> +   OPT_BOOL_NONEG(0, "no-tags", &option_no_tags,
> +  N_("don't clone any tags, and set remote.<remote>.tagOpt=--no-tags")),
> OPT_BOOL(0, "shallow-submodules", &option_shallow_submodules,

I forgot to note that this is on top of my earlier patch which adds
OPT_BOOL_NONEG, see <20170418170914.9701-1-ava...@gmail.com>, but
otherwise applies on top of master, and will work just fine by
amending that to say OPT_BOOL (although --no-no-tags will then exist,
as noted in the other patch).


Re: [PATCH v4 05/10] run-command: prepare child environment before forking

2017-04-18 Thread Brandon Williams
On 04/18, Eric Wong wrote:
> > +static int env_isequal(const char *e1, const char *e2)
> > +{
> > +   for (;;) {
> > +   char c1 = *e1++;
> > +   char c2 = *e2++;
> > +   c1 = (c1 == '=') ? '\0' : tolower(c1);
> > +   c2 = (c2 == '=') ? '\0' : tolower(c2);
> 
> Dealing with C strings scares me so maybe I'm misreading;
> but: why is this comparison case-insensitive?

Well, I was pulling inspiration from the stuff in mingw.c... looks like I
probably shouldn't have done so, as you're correct: they should be
case-sensitive.  Jonathan pointed out that doing this env stuff in
vanilla C may not be a good idea... and I kinda forgot about that because
it worked (it passed tests).  Let me re-write this section of code to
make it correct, and saner.

-- 
Brandon Williams


[PATCH v1] diffcore-rename: speed up register_rename_src

2017-04-18 Thread git
From: Jeff Hostetler 

Teach register_rename_src() to see if new file pair
can simply be appended to the rename_src[] array before
performing the binary search to find the proper insertion
point.

This is a performance optimization.  This routine is called
during run_diff_files in status and the caller is iterating
over the sorted index, so we should expect to be able to
append in the normal case.  The existing insert logic is
preserved so we don't have to assume that, but simply take
advantage of it if possible.

Using t/perf/p0005-status.sh (from a parallel patch series),
we get the following improvement on a 4.2M file repo:

Test                                            HEAD~1               HEAD
0005.2: read-tree status br_ballast (4194305)   59.14(31.85+18.79)   55.48(28.52+20.71) -6.2%

On a 1M file repo:
Test                                        HEAD~1            HEAD
0005.2: read-tree status br_ballast (101)   8.20(4.82+3.35)   7.91(4.57+3.27) -3.5%

On a smaller repo, like linux.git (58K files), results are masked
by normal I/O variance.

Test                                          HEAD~1            HEAD
0005.2: read-tree status br_ballast (57994)   0.43(0.30+0.13)   0.42(0.31+0.12) -2.3%
0005.2: read-tree status br_ballast (57994)   0.42(0.32+0.09)   0.43(0.34+0.10) +2.4%
0005.2: read-tree status br_ballast (57994)   0.44(0.33+0.10)   0.42(0.26+0.16) -4.5%

Signed-off-by: Jeff Hostetler 
---
 diffcore-rename.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/diffcore-rename.c b/diffcore-rename.c
index f7444c8..543a409 100644
--- a/diffcore-rename.c
+++ b/diffcore-rename.c
@@ -81,6 +81,18 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
 
first = 0;
last = rename_src_nr;
+
+   if (last > 0) {
+   struct diff_rename_src *src = &(rename_src[last-1]);
+   int cmp = strcmp(one->path, src->p->one->path);
+   if (!cmp)
+   return src;
+   if (cmp > 0) {
+   first = last;
+   goto append_it;
+   }
+   }
+
while (last > first) {
int next = (last + first) >> 1;
struct diff_rename_src *src = &(rename_src[next]);
@@ -94,6 +106,7 @@ static struct diff_rename_src *register_rename_src(struct diff_filepair *p)
first = next+1;
}
 
+append_it:
/* insert to make it at "first" */
ALLOC_GROW(rename_src, rename_src_nr + 1, rename_src_alloc);
rename_src_nr++;
-- 
2.9.3



[PATCH v1] diffcore-rename speedup

2017-04-18 Thread git
From: Jeff Hostetler 

Here is another micro-optimization for very large repositories.
Speed up register_rename_src() in diffcore-rename.c

Jeff Hostetler (1):
  diffcore-rename: speed up register_rename_src

 diffcore-rename.c | 13 +
 1 file changed, 13 insertions(+)

-- 
2.9.3



Re: [PATCH v1] travis-ci: add static analysis build job to run coccicheck

2017-04-18 Thread Stefan Beller
On Sun, Apr 16, 2017 at 6:31 AM, Sebastian Schuberth
 wrote:
> On 2017-04-11 09:26, Lars Schneider wrote:
>
>> Add a dedicated build job for static analysis. As a starter we only run
>> coccicheck but in the future we could run Clang Static Analyzer or
>> similar tools, too.
>
>
> Just FYI, some time ago someone (I don't recall who) signed up Git with
> Coverity's free scan service for OSS projects:
>
> https://scan.coverity.com/projects/git?tab=overview
>
> Maybe it makes sense to at least link to this page, too?
>

yeah, 'someone' is a good description. I accidentally took it over when
it looked stale. Now I have a cron job to upload the pu branch there
each night. (Which is one of the downsides of that service: Git is so
large, measured by lines of code, that we're allowed only one scan
per day. I wonder if that could be circumvented by each contributor
having their own fork, but let's not go that way.)


Re: Fwd: Failed shallow submodule clone for fixed SHA1 falls back to checking out HEAD

2017-04-18 Thread Sebastian Schuberth

On 2017-04-18 21:12, Stefan Beller wrote:


> > Wouldn't it make more sense to unshallow the submodule clone in this case
> > and checkout the configured commit
> > 2aa4cce7d7fd46164030f2b4d244c4b89e77f722 afterwards?
>
> If I remember correctly the conclusion of the discussion at the time
> was that people are more interested in "less data" rather than
> "correct data" because otherwise you would not have asked for shallow.

I do believe this is an awkward choice. What good is less data if it is
the wrong data?


--
Sebastian Schuberth



[PATCH] clone: add a --no-tags option to clone without tags

2017-04-18 Thread Ævar Arnfjörð Bjarmason
Add a --no-tags option to "git clone" to clone without tags. Currently
there's no easy way to clone a repository and end up with just a
"master" branch via --single-branch, or track all branches and no
tags. Now --no-tags can be added to "git clone" with or without
--single-branch to clone a repository without tags.

Before this the only way of doing this was either by manually tweaking
the config in a fresh repository:

git init git &&
cat >git/.git/config <
---

On Fri, Apr 14, 2017 at 11:28 PM, Ævar Arnfjörð Bjarmason  
wrote:
> As far as I can tell the only way to clone a given upstream repo,
> which has an unknown main branch name without any tags is:
>
> git clone --single-branch <url> <dir> &&
> cd <dir> &&
> git tag -d $(git tag -l) &&
> git config remote.origin.tagOpt --no-tags
>
> Is there really nothing like:
>
> git clone --single-branch --no-tags <url> <dir>
>
> ?
>
> I suppose this can be done with the usual 'git init`, set the config &
> then fetch dance, but in that case what part of 'git remote' or
> friends exposes finding the remote "main" ref as --single-branch does?

Here's a patch which implements this. It works, and I think it makes
sense for inclusion. It would be logical to follow this up with a
--no-tags-submodules similar to --no-shallow-submodules, but I ran out
of time.

 Documentation/git-clone.txt | 12 +++-
 builtin/clone.c | 13 ++--
 t/t5612-clone-refspec.sh| 73 +++--
 3 files changed, 93 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-clone.txt b/Documentation/git-clone.txt
index 30052cce49..ef78e6dcc6 100644
--- a/Documentation/git-clone.txt
+++ b/Documentation/git-clone.txt
@@ -13,7 +13,7 @@ SYNOPSIS
  [-l] [-s] [--no-hardlinks] [-q] [-n] [--bare] [--mirror]
 [-o <name>] [-b <name>] [-u <upload-pack>] [--reference <repository>]
 [--dissociate] [--separate-git-dir <git dir>]
- [--depth <depth>] [--[no-]single-branch]
+ [--depth <depth>] [--[no-]single-branch] [--no-tags]
 [--recurse-submodules] [--[no-]shallow-submodules]
 [--jobs <n>] [--] <repository> [<directory>]
 
@@ -215,6 +215,16 @@ objects from the source repository into a pack in the cloned repository.
branch when `--single-branch` clone was made, no remote-tracking
branch is created.
 
+--no-tags::
+   Don't clone any tags, and set `remote.origin.tagOpt=--no-tags`
+   in the config, ensuring that future `git pull` and `git fetch`
+   operations won't fetch any tags.
++
+Can be used in conjunction with `--single-branch` to clone & maintain
+a branch with no references other than a single cloned branch. This is
+useful e.g. to maintain minimal clones of the default branch of some
+repository for search indexing.
+
 --recurse-submodules[=

Fwd: Failed shallow submodule clone for fixed SHA1 falls back to checking out HEAD

2017-04-18 Thread Stefan Beller
On Tue, Apr 18, 2017 at 6:04 AM, Sebastian Schuberth
 wrote:
> Hi,
>
> this is using "git version 2.12.2.windows.2" on Windows / "git version 
> 2.12.2-639-g584f897" on Linux.
>
> I have configured my superproject with .gitmodules saying
>
> ---8<---
> [submodule "src/funTest/resources/projects/external/jgnash"]
>path = src/funTest/resources/projects/external/jgnash
>url = https://github.com/ccavanaugh/jgnash.git
>shallow = true
> ---8<---
>
> and configured the submodule to checkout commit 
> 2aa4cce7d7fd46164030f2b4d244c4b89e77f722 [1]. When doing a fresh clone of my 
> superproject via "git clone --recursive" I get
>
> ---8<---
> error: Server does not allow request for unadvertised object 
> 2aa4cce7d7fd46164030f2b4d244c4b89e77f722
> Fetched in submodule path 'src/funTest/resources/projects/external/jgnash', 
> but it did not contain 2aa4cce7d7fd46164030f2b4d244c4b89e77f722. Direct 
> fetching of that commit failed.
> ---8<---
>
> So far so good, it simply seems that GitHub does not support 
> allowReachableSHA1InWant [2].

Maybe the error message should say so: "Bug your host admin to enable
allowReachableSHA1InWant".

> The interesting bit is that my submodule checkout still ends up being 
> shallow, but poiting to HEAD:
>
> ---8<---
> $ cd src/funTest/resources/projects/external/jgnash
> $ git log
> commit 12036fffb6c620515edd96416363fd1749b5d003 (grafted, HEAD -> master, 
> origin/master, origin/HEAD)
> Author: Craig Cavanaugh 
> Date:   Tue Apr 18 05:33:06 2017 -0400
>
> Fix typos
> ---8<---
>
> Wouldn't it make more sense to unshallow the submodule clone in this case and 
> checkout the configured commit 2aa4cce7d7fd46164030f2b4d244c4b89e77f722 
> afterwards?

If I remember correctly the conclusion of the discussion at the time
was that people are more interested in "less data" rather than
"correct data" because otherwise you would not have asked for shallow.

> At least I'd be getting what I asked for in that case, even if at the cost of 
> additional network traffic. Because if I read [3] correctly, the command line 
> option belonging to "submodule..shallow" is called 
> "--[no-]recommend-shallow", i.e. it's only a recommendation, to falling back 
> to a full clone should be fine.

Heh, good point. Currently that option is more like a
"--[no-]follow-shallow-in-gitmodules", without the second thought
that "recommend" would allude to.

Thanks,
Stefan

>
> [1] 
> https://github.com/ccavanaugh/jgnash/commit/2aa4cce7d7fd46164030f2b4d244c4b89e77f722
> [2] 
> https://git-scm.com/docs/git-config#git-config-uploadpackallowReachableSHA1InWant
> [3] https://git-scm.com/docs/git-submodule
>
> Regards,
> Sebastian
>


Re: [PATCH v9 0/5] [GSoC] remove_subtree(): reimplement using iterators

2017-04-18 Thread Stefan Beller
On Mon, Apr 17, 2017 at 9:09 PM, Junio C Hamano  wrote:
> Daniel Ferreira  writes:
>
>> I think this is the closest to a final version we've ever gotten. I
>> followed all of Michael and Stefan's suggestions on top of v8, and with
>> Michael's endorsement made dir_iterator_begin() return NULL and set
>> errno appropriately in case of an error.
>>
>> On second thought, maybe the extra code complexity required from
>> dir_iterator_begin()'s callers might be actually an advantage as
>> dir_iterator grows to tackle more complex dir traversing challenges on
>> Git. After all, we might want some special behavior depending on what
>> the given `path` is instead of always considering it valid and later
>> behaving as if it was an empty directory.
>>
>> Thanks again for the reviews.
>
> I had a bit of trouble with phrasing here and there, but other than
> that the series was a pleasant read overall.
>
> Will queue, anticipating "Yeah, this is good as the final version"
> comments from reviewers.

yeah, it is.

Thanks,
Stefan


Re: [PATCH v2 2/2] ls-files: fix path used when recursing into submodules

2017-04-18 Thread Stefan Beller
On Mon, Apr 17, 2017 at 7:03 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>
>> From: Jacob Keller 
>>
>> Don't assume that the current working directory is the root of the
>> repository. Correctly generate the path for the recursing child
>> processes by building it from the work_tree() root instead. Otherwise if
>> we run ls-files using --git-dir or --work-tree it will not work
>> correctly as it attempts to change directory into a potentially invalid
>> location. Best case, it doesn't exist and we produce an error. Worst
>> case we cd into the wrong location and unknown behavior occurs.
>>
>> Add a new test which highlights this possibility.
>>
>> Signed-off-by: Jacob Keller 
>> ---
>> I'm not sure that I'm convinced by this method of solving the problem as
>> I suspect it has some corner cases (what about when run inside a
>> subdirectory? It seems to work for me but I'm not sure...) Additionally,
>> it felt weird that there's no helper function for creating a toplevel
>> relative path.
>
> Is this a similar issue as discussed in a nearby thread e.g.
>
>   

yes, it's the same broader issue.

>
> I do think it is a bug that sometimes we do not go to the root of
> the working tree when we know the repository is not a bare one.

ok, that would have been my understanding as well.

That nearby thread however highlights how it may be a user breaking if
we just fix the paths. The patch here doesn't cd to the root of the repo,
but constructs the correct path, which is correct. It may be considered
clumsy if we decide that cd'ing to the root of the repo is the correct bug fix
for the underlying issue.

Thanks,
Stefan


RE: [PATCH v2] xgethostname: handle long hostnames

2017-04-18 Thread David Turner
> -Original Message-
> From: René Scharfe [mailto:l@web.de]
> Sent: Tuesday, April 18, 2017 12:08 PM
> To: Junio C Hamano ; David Turner
... 
> >> Of course, my_host is sized to HOST_NAME_MAX + 1 and we are comparing
> >> it with locking_host, so perhaps we'd need to take this version to
> >> size locking_host to also HOST_NAME_MAX + 1, and then scan with %255c
> >> (but then shouldn't we scan with %256c instead?  I am not sure where
> >> these +1 comes from).
> >
> > That is, something along this line...
> >
> >   builtin/gc.c | 6 +-
> >   1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/builtin/gc.c b/builtin/gc.c
> > index be75508292..4f85610d87 100644
> > --- a/builtin/gc.c
> > +++ b/builtin/gc.c
> > @@ -240,7 +240,11 @@ static const char *lock_repo_for_gc(int force, pid_t *ret_pid)
> >LOCK_DIE_ON_ERROR);
> > if (!force) {
> > static char locking_host[HOST_NAME_MAX + 1];
> > +   static char *scan_fmt;
> > int should_exit;
> > +
> > +   if (!scan_fmt)
> > +   scan_fmt = xstrfmt("%s %%%dc", "%"SCNuMAX, HOST_NAME_MAX);
> > fp = fopen(pidfile_path, "r");
> > memset(locking_host, 0, sizeof(locking_host));
> > should_exit =
> > @@ -256,7 +260,7 @@ static const char *lock_repo_for_gc(int force, pid_t *ret_pid)
> >  * running.
> >  */
> > time(NULL) - st.st_mtime <= 12 * 3600 &&
> > -   fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
> > +   fscanf(fp, scan_fmt, &pid, locking_host) == 2 &&
> > /* be gentle to concurrent "gc" on remote hosts */
> > (strcmp(locking_host, my_host) || !kill(pid, 0) || errno == EPERM);
> > if (fp != NULL)
> >
> 
> How important is it to scan the whole file in one call?  We could split it up 
> like
> this and use a strbuf to handle host names of any length.  We need to be
> permissive here to allow machines with different values for HOST_NAME_MAX
> to work with the same file on a network file system, so this would have to be 
> the
> first patch, right?

If the writer has the smaller HOST_NAME_MAX, this will work fine.  If the reader
has the smaller HOST_NAME_MAX, and the writer's actual value is too long,
then there's no way the strcmp would succeed anyway.  So I don't think we need
to worry about it.

> NB: That && cascade has enough meat for a whole function.

+1



Re: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 05:43:29PM +, David Turner wrote:

> > A lock can catch the racy cases where both run at the same time. But I 
> > think that
> > even:
> > 
> >   git -c repack.writeBitmaps=true repack -Ad
> >   [...wait...]
> >   git gc
> > 
> > is questionable, because that gc will erase your bitmaps. How does git-gc 
> > know
> > that it's doing a bad thing by repacking without bitmaps, and that you 
> > didn't
> > simply change your configuration or want to get rid of them?
> 
> Sorry, the gc in Gitlab does keep bitmaps.  The one I quoted in a previous 
> message  doesn't, because the person typing the command was just doing some 
> manual  testing and I guess didn't realize that bitmaps were important.  Or 
> perhaps he knew that repack.writeBitmaps was already set in the config.

Sure, but I guess I'd just wonder what _else_ is different between the
commands (and if nothing, why are both running).

> So given that the lock will catch the races, might it be a good idea (if 
> Implemented to avoid locking on repack -d)?

I'm mildly negative just because it increases complexity, and I don't
think it's actually buying very much. It's not clear to me which
invocations of repack would want to lock and which ones wouldn't.

Is "-a" or "-A" the key factor? Are there current callers who prefer the
current behavior of "possibly duplicate some work, but never report
failure" versus "do not duplicate work, but sometimes fail due to lock
contention"?

-Peff


RE: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread David Turner


> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Tuesday, April 18, 2017 1:20 PM
> To: David Turner 
> Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> jacob.kel...@gmail.com
> Subject: Re: [PATCH] repack: respect gc.pid lock
> 
> On Tue, Apr 18, 2017 at 05:16:52PM +, David Turner wrote:
> 
> > > -Original Message-
> > > From: Jeff King [mailto:p...@peff.net]
> > > Sent: Monday, April 17, 2017 11:42 PM
> > > To: David Turner 
> > > Cc: git@vger.kernel.org; christian.cou...@gmail.com;
> > > mf...@codeaurora.org; jacob.kel...@gmail.com
> > > Subject: Re: [PATCH] repack: respect gc.pid lock
> > >
> > > On Mon, Apr 17, 2017 at 11:29:18PM +, David Turner wrote:
> > >
> > > > We saw this failure in the logs multiple  times (with three
> > > > different shas, while a gc was running):
> > > > April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true
> > > > repack -A -d
> > > --pack-kept-objects' in [repo] failed:
> > > > fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
> > > > Possibly some other repack was also running at the time as well.
> > > >
> > > > My colleague also saw it while manually doing gc (again while
> > > > repacks were likely to be running):
> > >
> > > This is sort of a side question, but...why are you running other
> > > repacks alongside git-gc? It seems like you ought to be doing one or the
> other.
> >
> > But actually, it would be kind of nice if git would help protect us from 
> > doing
> this?
> 
> A lock can catch the racy cases where both run at the same time. But I think 
> that
> even:
> 
>   git -c repack.writeBitmaps=true repack -Ad
>   [...wait...]
>   git gc
> 
> is questionable, because that gc will erase your bitmaps. How does git-gc know
> that it's doing a bad thing by repacking without bitmaps, and that you didn't
> simply change your configuration or want to get rid of them?

Sorry, the gc in Gitlab does keep bitmaps.  The one I quoted in a previous 
message  doesn't, because the person typing the command was just doing some 
manual  testing and I guess didn't realize that bitmaps were important.  Or 
perhaps he knew that repack.writeBitmaps was already set in the config.

So given that the lock will catch the races, might it be a good idea (if 
Implemented to avoid locking on repack -d)?


Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol

2017-04-18 Thread Taylor Blau
On Tue, Apr 18, 2017 at 06:14:36PM +0200, Lars Schneider wrote:
> > Both Git and the filter are going to have to keep these paths in memory
> > somewhere, be that in-process, or on disk. That being said, I can see 
> > potential
> > troubles with a large number of long paths that exceed the memory available 
> > to
> > Git or the filter when stored in a hashmap/set.
> >
> > On Git's side, I think trading that for some CPU time might make sense. If 
> > Git
> > were to SHA1 each path and store that in a hashmap, it would consume more 
> > CPU
> > time, but less memory to store each path. Git and the filter could then 
> > exchange
> > path names, and Git would simply SHA1 the pathname each time it needed to 
> > refer
> > back to memory associated with that entry in a hashmap.
>
> I would be surprised if this would be necessary. If we filter delay 50,000
> files (= a lot!) with a path length of 1000 characters (= very long!) then we
> would use 50MB plus some hashmap data structures. Modern machines should have
> enough RAM I would think...

I agree, and thanks for correcting my thinking here. I ran a simple command to
get the longest path names in a large repository, as:

  $ find . -type f | awk '{ print length($1) }' | sort -r -n | uniq -c

And found a few files close to the 200 character mark as the longest pathnames
in the repository. I think 50k files at 1k bytes per pathname is quite enough
head-room :-).
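For what it's worth, `length($1)` in the command above only measures up to the
first whitespace, so path names containing spaces are undercounted; using `$0`
measures the whole line. A self-contained sketch (the `demo` tree below is made
up purely for illustration):

```shell
# Build a tiny tree, including a path with a space in it, then report the
# longest path length; length($0) measures the entire line, not just the
# first whitespace-delimited field.
mkdir -p demo/some/deeply/nested/dir
touch demo/some/deeply/nested/dir/a.txt "demo/file with spaces.txt"
find demo -type f | awk '{ print length($0) }' | sort -n | tail -n 1
# → 33
```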


--
Thanks,
Taylor Blau


Re: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 05:16:52PM +, David Turner wrote:

> > -Original Message-
> > From: Jeff King [mailto:p...@peff.net]
> > Sent: Monday, April 17, 2017 11:42 PM
> > To: David Turner 
> > Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> > jacob.kel...@gmail.com
> > Subject: Re: [PATCH] repack: respect gc.pid lock
> > 
> > On Mon, Apr 17, 2017 at 11:29:18PM +, David Turner wrote:
> > 
> > > We saw this failure in the logs multiple  times (with three different
> > > shas, while a gc was running):
> > > April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true repack 
> > > -A -d
> > --pack-kept-objects' in [repo] failed:
> > > fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
> > > Possibly some other repack was also running at the time as well.
> > >
> > > My colleague also saw it while manually doing gc (again while repacks
> > > were likely to be running):
> > 
> > This is sort of a side question, but...why are you running other repacks 
> > alongside
> > git-gc? It seems like you ought to be doing one or the other.
> 
> But actually, it would be kind of nice if git would help protect us from 
> doing this?

A lock can catch the racy cases where both run at the same time. But I
think that even:

  git -c repack.writeBitmaps=true repack -Ad
  [...wait...]
  git gc

is questionable, because that gc will erase your bitmaps. How does
git-gc know that it's doing a bad thing by repacking without bitmaps,
and that you didn't simply change your configuration or want to get rid
of them?

-Peff
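One way to sidestep the foot-gun Peff describes, assuming you control the
repository configuration, is to make bitmap writing a repository-level default
so that every all-into-one repack (including the one `git gc` performs) writes
bitmaps. A minimal sketch in a throwaway repository (repo name and identity
below are illustrative):

```shell
# Set repack.writeBitmaps in the repo config, so a plain repack -Ad (and
# thus gc's internal repack) regenerates the bitmap instead of dropping it.
git init -q bitmap-demo
git -C bitmap-demo config user.email you@example.com
git -C bitmap-demo config user.name you
echo content >bitmap-demo/file
git -C bitmap-demo add file
git -C bitmap-demo commit -qm one
git -C bitmap-demo config repack.writeBitmaps true
git -C bitmap-demo repack -Adq
ls bitmap-demo/.git/objects/pack/*.bitmap
```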


Re: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 05:08:14PM +, David Turner wrote:

> On 64-bit systems, I think core.packedGitLimit doesn't make a lot of
> sense. There is plenty of address space.  Why not use it?

That's my gut feeling, too. I'd have a slight worry that the OS's paging
behavior may respond differently if we have more memory mapped. But
that's not based on numbers, just a fear of the unknown. :)

If we have infinite windows anyway, I suspect we could just mmap entire
packfiles and forget about all the window complexity in the first place.
Although IIRC some operating systems take a long time to set up large
mmaps, and we may only need a small part of a large pack.

> I'll ask our git server administrator to adjust core.packedGitLimit
> and turn repacks back on to see if that fixes the issue.

Thanks. Let us know if you get any results, either way.

-Peff
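For reference, a sketch of the adjustment being discussed; the `32g` value is
purely illustrative (the point is only that it must exceed the largest pack),
not a recommendation:

```shell
# Raise core.packedGitLimit well above the biggest packfile so that
# mapping a huge pack does not force small-pack windows (and with them
# the only open handle to those packs) to be dropped.
git init -q limit-demo
git -C limit-demo config core.packedGitLimit 32g
git -C limit-demo config core.packedGitLimit
# → 32g
```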


RE: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread David Turner
> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Monday, April 17, 2017 11:42 PM
> To: David Turner 
> Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> jacob.kel...@gmail.com
> Subject: Re: [PATCH] repack: respect gc.pid lock
> 
> On Mon, Apr 17, 2017 at 11:29:18PM +, David Turner wrote:
> 
> > We saw this failure in the logs multiple  times (with three different
> > shas, while a gc was running):
> > April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true repack -A 
> > -d
> --pack-kept-objects' in [repo] failed:
> > fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
> > Possibly some other repack was also running at the time as well.
> >
> > My colleague also saw it while manually doing gc (again while repacks
> > were likely to be running):
> 
> This is sort of a side question, but...why are you running other repacks 
> alongside
> git-gc? It seems like you ought to be doing one or the other.

But actually, it would be kind of nice if git would help protect us from doing 
this?


[PATCH] various: disallow --no-no-OPT for --no-opt options

2017-04-18 Thread Ævar Arnfjörð Bjarmason
Change various --no-OPT options which don't supply PARSE_OPT_NONEG to
make --no-no-OPT an error.

All of these worked before this change, e.g. doing cloning by doing
"git clone --no-no-checkout" is equivalent to just "git clone", but
this was never intended, and is inconsistent with other --no-OPT
options which do pass PARSE_OPT_NONEG.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---

For this one I'm not bothering to track down & CC every single person
who originally added these --no-OPT options. Clearly just a trivial
bug we can fix.

 apply.c   | 2 +-
 builtin/bisect--helper.c  | 2 +-
 builtin/check-ignore.c| 2 +-
 builtin/checkout-index.c  | 2 +-
 builtin/clone.c   | 4 ++--
 builtin/commit.c  | 4 ++--
 builtin/fast-export.c | 2 +-
 builtin/grep.c| 2 +-
 builtin/hash-object.c | 2 +-
 builtin/log.c | 4 ++--
 builtin/push.c| 2 +-
 builtin/read-tree.c   | 2 +-
 builtin/revert.c  | 2 +-
 builtin/show-branch.c | 2 +-
 builtin/update-ref.c  | 2 +-
 parse-options.h   | 7 +++
 t/helper/test-parse-options.c | 1 +
 t/t0040-parse-options.sh  | 3 +++
 18 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/apply.c b/apply.c
index e6dbab26ad..47a91d0762 100644
--- a/apply.c
+++ b/apply.c
@@ -4917,7 +4917,7 @@ int apply_parse_options(int argc, const char **argv,
 		{ OPTION_CALLBACK, 'p', NULL, state, N_("num"),
 			N_("remove <num> leading slashes from traditional diff paths"),
 			0, apply_option_parse_p },
-		OPT_BOOL(0, "no-add", &state->no_add,
+		OPT_BOOL_NONEG(0, "no-add", &state->no_add,
 			N_("ignore additions made by the patch")),
 		OPT_BOOL(0, "stat", &state->diffstat,
 			N_("instead of applying the patch, output diffstat for the input")),
diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 3324229025..5fe9093947 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -15,7 +15,7 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 	struct option options[] = {
 		OPT_BOOL(0, "next-all", &next_all,
 			 N_("perform 'git bisect next'")),
-		OPT_BOOL(0, "no-checkout", &no_checkout,
+		OPT_BOOL_NONEG(0, "no-checkout", &no_checkout,
 			 N_("update BISECT_HEAD instead of checking out the current commit")),
 		OPT_END()
 	};
diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 1d73d3ca3d..cb6467946e 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -24,7 +24,7 @@ static const struct option check_ignore_options[] = {
 		 N_("terminate input and output records by a NUL character")),
 	OPT_BOOL('n', "non-matching", &show_non_matching,
 		 N_("show non-matching input paths")),
-	OPT_BOOL(0, "no-index", &no_index,
+	OPT_BOOL_NONEG(0, "no-index", &no_index,
 		 N_("ignore index when checking")),
 	OPT_END()
 };
diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c
index 07631d0c9c..cf84e31ce8 100644
--- a/builtin/checkout-index.c
+++ b/builtin/checkout-index.c
@@ -161,7 +161,7 @@ int cmd_checkout_index(int argc, const char **argv, const char *prefix)
 		OPT__FORCE(&force, N_("force overwrite of existing files")),
 		OPT__QUIET(&quiet,
 			N_("no warning for existing files and files not in index")),
-		OPT_BOOL('n', "no-create", &not_new,
+		OPT_BOOL_NONEG('n', "no-create", &not_new,
 			N_("don't checkout new files")),
 		OPT_BOOL('u', "index", &index_opt,
 			 N_("update stat information in the index file")),
diff --git a/builtin/clone.c b/builtin/clone.c
index de85b85254..32c5843563 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -76,7 +76,7 @@ static struct option builtin_clone_options[] = {
 	OPT__VERBOSITY(&option_verbosity),
 	OPT_BOOL(0, "progress", &option_progress,
 		 N_("force progress reporting")),
-	OPT_BOOL('n', "no-checkout", &option_no_checkout,
+	OPT_BOOL_NONEG('n', "no-checkout", &option_no_checkout,
 		 N_("don't create a checkout")),
 	OPT_BOOL(0, "bare", &option_bare, N_("create a bare repository")),
 	OPT_HIDDEN_BOOL(0, "naked", &option_bare,
@@ -85,7 +85,7 @@ static struct option builtin_clone_options[] = {
 		 N_("create a mirror repository (implies bare)")),
 	OPT_BOOL('l', "local", &option_local,
 		N_("to clone from a local repository")),
-	OPT_BOOL(0, "no-hardlinks", &option_no_hardlinks,
+	OPT_BOOL_NONEG(0, "no-hardlinks", &option_no_hardlinks,
		N_("don't use local hardlinks, always copy")),
	OPT_BOOL('s', "shared", &option_shared,
		N_("setup as shared 

RE: [PATCH] repack: respect gc.pid lock

2017-04-18 Thread David Turner
> -Original Message-
> From: Jeff King [mailto:p...@peff.net]
> Sent: Monday, April 17, 2017 11:42 PM
> To: David Turner 
> Cc: git@vger.kernel.org; christian.cou...@gmail.com; mf...@codeaurora.org;
> jacob.kel...@gmail.com
> Subject: Re: [PATCH] repack: respect gc.pid lock
> 
> On Mon, Apr 17, 2017 at 11:29:18PM +, David Turner wrote:
> 
> > We saw this failure in the logs multiple  times (with three different
> > shas, while a gc was running):
> > April 12, 2017 06:45 -> ERROR -> 'git -c repack.writeBitmaps=true repack -A 
> > -d
> --pack-kept-objects' in [repo] failed:
> > fatal: packfile ./objects/pack/pack-[sha].pack cannot be accessed
> > Possibly some other repack was also running at the time as well.
> >
> > My colleague also saw it while manually doing gc (again while repacks
> > were likely to be running):
> 
> This is sort of a side question, but...why are you running other repacks 
> alongside
> git-gc? It seems like you ought to be doing one or the other.
>
> I don't begrudge anybody with a complicated setup running their own set of gc
> commands, but I'd think you would want to do locking there, and disable auto-
> gc entirely. Otherwise you're going to get different results depending on who
> gc'd last.

That's what GitLab does, so you'll have to ask them why they do it that way.
From https://gitlab.com/gitlab-org/gitlab-ce/issues/30939#note_27487981
it looks like they may have intended to have a lock but not quite succeeded.
 
> > $ git gc --aggressive
> > Counting objects: 13800073, done.
> > Delta compression using up to 8 threads.
> > Compressing objects:  99% (11465846/11465971)
> > Compressing objects: 100% (11465971/11465971), done.
> > fatal: packfile [repo]/objects/pack/pack-[sha].pack cannot be accessed
> 
> OK, so this presumably happened during the writing phase. Which seems like the
> "a pack was closed, and we couldn't re-open it" problem we've seen before.
> 
> > We have a reasonable rlimit (64k soft limit), so that failure mode is
> > pretty unlikely.  I  think we should have had 20 or so packs -- not tens of
> thousands.
> > [...]
> > Do you have any idea why this would be happening other than the rlimit 
> > thing?
> 
> Yeah, that should be enough (you could double check the return of
> get_max_fd_limit() on your system if you wanted to be paranoid).
> 
> We also keep only a limited number of bytes mmap'd at one time. Normally we
> don't actually close packfiles when we release their mmap windows.
> But I think there is one path that might. When use_pack() maps a pack, if the
> entire pack fits in a single window, then we close it; this is due to 
> d131b7afe
> (sha1_file.c: Don't retain open fds on small packs, 2011-03-02).
> 
> But if we ever unmap that window, now we have no handle to the pack.
> Normally on a 64-bit system this wouldn't happen at all, since the default
> core.packedGitLimit is 8GB there.

Aha, I missed that limit while messing around with the code.  That must be it.

> So if you have a few small packs and one very large pack (over 8GB), I think 
> this
> could trigger. We may do the small-pack thing for some of them, and then the
> large pack forces us to drop the mmaps for some of the others. When we go
> back to access the small pack, we find it's gone.
> 
> One solution would be to bump core.packedGitLimit to something much higher
> (it's an mmap, so we're really just chewing up address space; it's up to the 
> OS to
> decide when to load pages from disk and when to drop them).
>
> The other alternative is to disable the small-pack closing from d131b7afe. It
> might need to be configurable, or perhaps auto-tuned based on the fd limit.
> Linux systems tend to have generous descriptor limits, but I'm not sure we can
> rely on that. OTOH, it seems like the code to close descriptors when needed
> would take care of things. So maybe we should just revert d131b7afe entirely.

I definitely remember running into fd limits when processing very large
numbers of packs at Twitter, but I don't recall the exact details.
Presumably, d131b7afe was supposed to help with this, but in fact, it did
not totally solve it.  Perhaps we were doing something funny.  Adjusting
the fd limits was the easy fix.
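The limits being discussed can be inspected from the shell before tuning
anything (values vary per system, so none are shown here):

```shell
# The soft limit is what a process such as git effectively sees
# (cf. get_max_fd_limit()); the hard limit is the ceiling an
# unprivileged process may raise the soft limit to.
ulimit -Sn   # current soft limit on open file descriptors
ulimit -Hn   # hard limit
```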

On 64-bit systems, I think core.packedGitLimit doesn't make a 
lot of sense. There is plenty of address space.  Why not use it?

For 32-bit systems, of course, address space is more precious.

I'll ask our git server administrator to adjust core.packedGitLimit
and turn repacks back on to see if that fixes the issue.

> The final thing I'd ask is whether you might be on a networked filesystem that
> would foil our usual "open descriptors mean packs don't go away" logic. But
> after having dug into the details above, I have a feeling the answer is 
> simply that
> you have repositories >8GB.

Yes, our repo is >8GB, and no, it's not on a networked filesystem.

> And if that is the case, then yeah, your locking patch is 

Re: FW: Issue in gitbash changing directory

2017-04-18 Thread Kevin Daudt
On Tue, Apr 18, 2017 at 12:48:09PM +, Bonk, Gregory wrote:
> 
> I accidently typed 'cd //'  and it worked.
> 
> gbonk@ICC11167 MINGW64 /c/git/mtb-messagehub-information-radiator (master)
> $ cd //
> 
> gbonk@ICC11167 MINGW64 //
> $ cd ..

This has very little to do with git, but more with bash (the shell that
is interpreting the command). From a bash FAQ [1]:

> E10) Why does `cd //' leave $PWD as `//'?
> 
> POSIX.2, in its description of `cd', says that *three* or more leading
> slashes may be replaced with a single slash when canonicalizing the
> current working directory.
> 
> This is, I presume, for historical compatibility.  Certain versions of
> Unix, and early network file systems, used paths of the form
> //hostname/path to access `path' on server `hostname'.

So this is kind of intentional.


[1]: https://tiswww.case.edu/php/chet/bash/FAQ
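The FAQ behavior is easy to observe directly (assuming bash is available):

```shell
# bash preserves a leading double slash, while three or more leading
# slashes collapse to a single one, per the POSIX wording quoted above.
bash -c 'cd // && pwd'    # → //
bash -c 'cd /// && pwd'   # → /
```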


Re: GIT_EXEC_PATH

2017-04-18 Thread Kevin Daudt
On Tue, Apr 18, 2017 at 01:47:14PM +0200, Christoph Egger wrote:
> Hi!
> 
> Concerning $GIT_EXEC_PATH .. is that supposed to be a $PATH-like variable? As
> in, can it have more than one path (colon-separated)? I currently have two
> directories there (one with a git-annex installation, one with the normal git
> stuff) and it seems to mostly work. However git-sh-setup is unhappy:
> 
> > % git pull --rebase
> > /opt/local/libexec/git-core/git-sh-setup: line 46: 
> > /opt/local/libexec/git-core:/Applications/git-annex.app/Contents/MacOS//git-sh-i18n:
> >  No such file or directory
> 
>   Christoph

Hello Christoph,

The exec path is not like $PATH; it should point to a single
directory.

See [1] for a recent thread about this:

Junio C Hamano wrote:
> That environment variable is designed to hold a single path, not
> like $PATH that lists multiple places in a colon separated list.

https://public-inbox.org/git/CAJF7t-dqSa7tmQqNEWmg_VZ=+832nsz-3jmsga03qk6ay5e...@mail.gmail.com/

Kevin
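To illustrate Kevin's point: the variable names a single directory, which
`git --exec-path` echoes back; extra tool directories (such as git-annex's)
belong on $PATH instead. A quick check:

```shell
# git --exec-path prints the one directory git resolves its helper
# commands from; it is not a colon-separated list.
dir=$(git --exec-path)
test -d "$dir" && echo "single exec dir: ok"
```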


Re: [PATCH v5 4/8] convert: Separate generic structures and variables from the filter specific ones

2017-04-18 Thread Ben Peart

On 4/16/2017 11:31 PM, Junio C Hamano wrote:

Lars Schneider  writes:


-static struct cmd2process *find_multi_file_filter_entry(struct hashmap 
*hashmap, const char *cmd)
+static struct subprocess_entry *find_multi_file_filter_entry(const char *cmd)

I am curious why you removed the hashmap parameter (here and in other pars of 
this patch).
I know the parameter is not strictly necessary as the hashmap is a global 
variable anyways.
However, I think it eases code maintainability in the long run if a function is
"as pure as possible" (IOW relies on global state as little as possible).

If the original relied on a global hashmap and this update kept the
code like so, I wouldn't mind the end result of this series
(i.e. rely on it being global).  But that is not the case.  It is
making the code worse by stopping passing the hashmap through the
callchain.
I debated this myself but there were a couple of things that pushed me 
to this simpler model.  First, it simplified the interface a little as 
you don't need an explicit init call with state to detect if it has 
already been initialized and you don't have to pass the hashmap into the 
various functions.  Since it was already a global, I didn't feel like 
this made it any worse.


Second, since the hashmap key is the exact command that was executed if 
you ever had two hashmaps with the same key, you'd end up with two 
copies of the same background process running.  I couldn't come up with 
any scenario where you would want two copies of the exact same command 
running at the same time so again went with the simpler model.


While I was typing this, I realized that since the background process is 
a versioned interface, if there were two different parts of the code 
using background processes and if they both wanted to run the same 
background process and if they wanted to use different versions of the 
interface, then you could want two different copies.


That said, I tend to build for things that are needed now rather than 
those that might be needed in the future.  If I've missed a scenario, 
I'm happy to change the interface.


Re: [PATCH v10 3/3] read-cache: speed up add_index_entry during checkout

2017-04-18 Thread Jeff Hostetler



On 4/17/2017 10:53 AM, Jeff Hostetler wrote:



On 4/15/2017 1:55 PM, René Scharfe wrote:

Am 14.04.2017 um 21:12 schrieb g...@jeffhostetler.com:

From: Jeff Hostetler 


Very nice, especially the perf test!  But can we unbundle the different
optimizations into separate patches with their own performance numbers?
Candidates IMHO: The change in add_index_entry_with_check(), the first
hunk in has_dir_name(), the loop shortcuts.  That might also help find
the reason for the slight slowdown of 0006.3 for the kernel repository.


Let me take a look at this and see if it helps.


Last night I pushed up version 11 which has the 3 parts
of read-cache.c in 3 commits (but still in the same patch
series).  This should allow for more experimentation.

The add_index_entry_with_check() change shows a gain.  For the
operations in p0006 on linux.git, the short-cut was being
taken 57993 of 57994 times.

The top of has_dir_name() -- by itself -- does not, but
the short-cut only triggers when the paths have no
prefix in common -- which only happens when the top-level
directory changes.  On linux.git, this was 19 of 57993.
However, it does set us up for the next part.

The 3 loop body short-cuts hit 54372, 3509, and 86 (sum
57967) times.  So in p0006, the search was only attempted
7 times (57993 - 19 - 57967) most of the time.


WRT the slowdown of 0006.3 on linux.git, I suspect this is
I/O noise.  In the commit message for part 2 in V11, I
show 2 runs on linux.git that show wide variance in the 0006.3
times.  And given the nature of that test, the speed up in the
lookups is completely hidden by the I/O of the full checkouts.
When I step up to a repo with 4M files, the results are very
clear.

https://public-inbox.org/git/20170417213734.55373-6-...@jeffhostetler.com/




Re: [PATCH v2] xgethostname: handle long hostnames

2017-04-18 Thread Jeff King
On Tue, Apr 18, 2017 at 06:07:43PM +0200, René Scharfe wrote:

> > -		fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
> > +		fscanf(fp, scan_fmt, &pid, locking_host) == 2 &&
> > 		/* be gentle to concurrent "gc" on remote hosts */
> > 		(strcmp(locking_host, my_host) || !kill(pid, 0) || errno == EPERM);
> > 	if (fp != NULL)
> > 
> 
> How important is it to scan the whole file in one call?  We could split
> it up like this and use a strbuf to handle host names of any length.  We
> need to be permissive here to allow machines with different values for
> HOST_NAME_MAX to work with the same file on a network file system, so
> this would have to be the first patch, right?

I doubt that doing it in one call matters. It's not like stdio promises
us any atomicity in the first place.

> -	fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
> +	fscanf(fp, "%"SCNuMAX" ", &pid) == 1 &&
> +	!strbuf_getwholeline(&locking_host, fp, '\0') &&

I don't think there is anything wrong with using fscanf here, but it has
enough pitfalls in general that I don't really like its use as a parser
(and the general lack of it in Git's code base seems to agree).

I wonder if this should just read a line (or the whole file) into a
strbuf and parse it there. That would better match our usual style, I
think.

I can live with it either way.

> NB: That && cascade has enough meat for a whole function.

Yeah.

-Peff


Re: [PATCH v3 4/4] convert: add "status=delayed" to filter process protocol

2017-04-18 Thread Lars Schneider

> On 12. Apr 2017, at 19:46, Taylor Blau  wrote:
> 
> I think this is a great approach and one that I'd be happy to implement in 
> LFS.
> The additional capability isn't too complex, so I think other similar filters 
> to
> LFS shouldn't have a hard time implementing it either.
> 
> I left a few comments, mostly expressing approval to the documentation 
> changes.
> I'll leave the C review to someone more expert than me.
> 
> +1 from me on the protocol changes.

Thanks!


>> +Delay
>> +^
>> +
>> +If the filter supports the "delay" capability, then Git can send the
>> +flag "delay-able" after the filter command and pathname.
> 
> Nit: I think either way is fine, but `can_delay` will save us 1 byte per each
> new checkout entry.

1 byte is no convincing argument to me but since you are a native speaker I 
trust your "can-delay" suggestion. I prefer dashes over underscores, though, 
for consistency with the rest of the protocol.


>> +"delay-id", a number that identifies the blob, and a flush packet. The
>> +filter acknowledges this number with a "success" status and a flush
>> +packet.
> 
> I mentioned this in another thread, but I'd prefer, if possible, that we use 
> the
> pathname as a unique identifier for referring back to a particular checkout
> entry. I think adding an additional identifier adds unnecessary complication 
> to
> the protocol and introduces a forced mapping on the filter side from id to
> path.

I agree! I answered in the other thread. Let's keep the discussion there.


> Both Git and the filter are going to have to keep these paths in memory
> somewhere, be that in-process, or on disk. That being said, I can see 
> potential
> troubles with a large number of long paths that exceed the memory available to
> Git or the filter when stored in a hashmap/set.
> 
> On Git's side, I think trading that for some CPU time might make sense. If Git
> were to SHA1 each path and store that in a hashmap, it would consume more CPU
> time, but less memory to store each path. Git and the filter could then 
> exchange
> path names, and Git would simply SHA1 the pathname each time it needed to 
> refer
> back to memory associated with that entry in a hashmap.

I would be surprised if this were necessary. If the filter delays 50,000
files (= a lot!) with a path length of 1000 characters (= very long!) then we
would use 50MB plus some hashmap data structures. Modern machines should have
enough RAM, I would think...

Thanks,
Lars

Re: [PATCH v2] xgethostname: handle long hostnames

2017-04-18 Thread René Scharfe
Am 18.04.2017 um 03:41 schrieb Junio C Hamano:
> Junio C Hamano  writes:
> 
>> David Turner  writes:
>>
>>> @@ -250,14 +250,14 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
>>> ...
>>> 	if (!force) {
>>> -		static char locking_host[128];
>>> +		static char locking_host[HOST_NAME_MAX + 1];
>>> 		int should_exit;
>>> 		fp = fopen(pidfile_path, "r");
>>> 		memset(locking_host, 0, sizeof(locking_host));
>>
>> I compared the result of applying this v2 directly on top of master
>> and applying René's "Use HOST_NAME_MAX"and then applying your v1.
>> This hunk is the only difference.
>>
>> As this locking_host is used like so in the later part of the code:
>>
>> 		time(NULL) - st.st_mtime <= 12 * 3600 &&
>> 		fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
>> 		/* be gentle to concurrent "gc" on remote hosts */
>> 		(strcmp(locking_host, my_host) || !kill(pid, 0) || errno == EPERM);
>>
>> I suspect that turning it to HOST_NAME_MAX + 1 without tweaking
>> the format "%127c" gives us an inconsistent resulting code.

Oh, missed that.  Thanks for catching it!

>> Of course, my_host is sized to HOST_NAME_MAX + 1 and we are
>> comparing it with locking_host, so perhaps we'd need to take this
>> version to size locking_host to also HOST_NAME_MAX + 1, and then
>> scan with %255c (but then shouldn't we scan with %256c instead?  I
>> am not sure where these +1 comes from).
> 
> That is, something along this line...
> 
>   builtin/gc.c | 6 +-
>   1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/builtin/gc.c b/builtin/gc.c
> index be75508292..4f85610d87 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -240,7 +240,11 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
>  			   LOCK_DIE_ON_ERROR);
>  	if (!force) {
>  		static char locking_host[HOST_NAME_MAX + 1];
> +		static char *scan_fmt;
>  		int should_exit;
> +
> +		if (!scan_fmt)
> +			scan_fmt = xstrfmt("%s %%%dc", "%"SCNuMAX, HOST_NAME_MAX);
>  		fp = fopen(pidfile_path, "r");
>  		memset(locking_host, 0, sizeof(locking_host));
>  		should_exit =
> @@ -256,7 +260,7 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
>  		 * running.
>  		 */
>  		time(NULL) - st.st_mtime <= 12 * 3600 &&
> -		fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
> +		fscanf(fp, scan_fmt, &pid, locking_host) == 2 &&
>  		/* be gentle to concurrent "gc" on remote hosts */
>  		(strcmp(locking_host, my_host) || !kill(pid, 0) || errno == EPERM);
>  	if (fp != NULL)
> 

How important is it to scan the whole file in one call?  We could split
it up like this and use a strbuf to handle host names of any length.  We
need to be permissive here to allow machines with different values for
HOST_NAME_MAX to work with the same file on a network file system, so
this would have to be the first patch, right?

NB: That && cascade has enough meat for a whole function.

René

---
 builtin/gc.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 1fca84c19d..d5e880028e 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -251,10 +251,9 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 	fd = hold_lock_file_for_update(&lock, pidfile_path,
 				       LOCK_DIE_ON_ERROR);
 	if (!force) {
-		static char locking_host[128];
+		static struct strbuf locking_host = STRBUF_INIT;
 		int should_exit;
 		fp = fopen(pidfile_path, "r");
-		memset(locking_host, 0, sizeof(locking_host));
 		should_exit =
 			fp != NULL &&
 			!fstat(fileno(fp), &st) &&
@@ -268,9 +267,10 @@ static const char *lock_repo_for_gc(int force, pid_t* ret_pid)
 			 * running.
 			 */
 			time(NULL) - st.st_mtime <= 12 * 3600 &&
-			fscanf(fp, "%"SCNuMAX" %127c", &pid, locking_host) == 2 &&
+			fscanf(fp, "%"SCNuMAX" ", &pid) == 1 &&
+			!strbuf_getwholeline(&locking_host, fp, '\0') &&
 			/* be gentle to concurrent "gc" on remote hosts */
-			(strcmp(locking_host, my_host) || !kill(pid, 0) || errno == EPERM);
+			(strcmp(locking_host.buf, my_host) || !kill(pid, 0) || errno == EPERM);
 		if (fp != NULL)
 			fclose(fp);
 		if (should_exit) {
@@ -278,7 +278,7 @@ static const char 

Re: [PATCH v1 0/3] travis-ci: build docs with asciidoctor

2017-04-18 Thread Lars Schneider

> On 18. Apr 2017, at 12:44, brian m. carlson  
> wrote:
> 
>> On Tue, Apr 18, 2017 at 10:32:59AM +0200, Lars Schneider wrote:
>> 
>>> On 14. Apr 2017, at 00:41, Junio C Hamano  wrote:
>>> Having said that, I wonder if we get some interesting results out of
>>> building the documentation twice, though.  By looking at the Travis
>>> log with timestamps, we probably can see how long each build takes,
>>> but that is much less interesting than learning if new versions of
>>> text used mark-up that does not format correctly on one or the other
>>> (i.e. catch documentation breakage early in each CI run), for
>>> example.  I have an impression that neither AsciiDoc nor AsciiDoctor
>>> "fails" in an obvious way that "make" can notice (i.e. they often
>>> just silently produce nonsense output when fed a malformed input
>>> instead).
>> 
>> True! But wouldn't we get a syntax check here? Wouldn't asciidoc /
>> asciidoctor bark if we use wrong/unsupported elements?
> 
> Asciidoctor isn't very strict about questionable items.  If you want
> that behavior, you'd want to check for output to standard error during
> the make process, as Asciidoctor uses Ruby's warn function.

That sounds good. I'll check stderr in the next iteration!

Thanks,
Lars
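Checking stderr during the build can be wrapped in a tiny helper. This is a
hedged sketch (the function name is ours; in CI it would wrap something like
`make doc` rather than the stand-in command shown):

```shell
# Run a command, capture its stderr, and fail if anything was emitted
# there -- useful when a tool (like Asciidoctor) only warns on stderr
# instead of exiting non-zero on questionable input.
check_stderr_clean () {
	"$@" 2>stderr.log
	status=$?
	if [ -s stderr.log ]; then
		echo "stderr was not clean:" >&2
		cat stderr.log >&2
		return 1
	fi
	return $status
}

check_stderr_clean echo docs-built   # → docs-built
```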

Re: [PATCH] p0004: make perf test executable

2017-04-18 Thread Jeff Hostetler



On 4/18/2017 10:24 AM, Christian Couder wrote:

It looks like in 89c3b0ad43 (name-hash: add perf test for lazy_init_name_hash,
2017-03-23) p0004 was not created with the execute unix rights.
Let's fix that.

Signed-off-by: Christian Couder 
---
 t/perf/p0004-lazy-init-name-hash.sh | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 t/perf/p0004-lazy-init-name-hash.sh

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
old mode 100644
new mode 100755




Thanks!
Jeff


no MERGE_HEAD after octopus merge failure

2017-04-18 Thread Max Ivanov
Hi All,

I am using git 2.12.0 and it leaves no MERGE_HEAD once an octopus merge
fails with conflicts. Is this intentional? Files have conflict markers,
and once they are resolved `git commit` creates just a normal commit,
which is very inconvenient and confusing.

Thanks


[PATCH] p0004: make perf test executable

2017-04-18 Thread Christian Couder
It looks like in 89c3b0ad43 (name-hash: add perf test for lazy_init_name_hash,
2017-03-23) p0004 was not created with the execute unix rights.
Let's fix that.

Signed-off-by: Christian Couder 
---
 t/perf/p0004-lazy-init-name-hash.sh | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 mode change 100644 => 100755 t/perf/p0004-lazy-init-name-hash.sh

diff --git a/t/perf/p0004-lazy-init-name-hash.sh 
b/t/perf/p0004-lazy-init-name-hash.sh
old mode 100644
new mode 100755
-- 
2.12.2.635.gf331bb6d3a.dirty



[PATCH] p3400: add perf tests for rebasing many changes

2017-04-18 Thread Christian Couder
Rebasing onto many changes is interesting, but it's also
interesting to see what happens when rebasing many changes.

And while at it, let's also look at the impact of using a
split index.

Signed-off-by: Christian Couder 
---
 t/perf/p3400-rebase.sh | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/t/perf/p3400-rebase.sh b/t/perf/p3400-rebase.sh
index b3e7d525d2..ce271ca4c1 100755
--- a/t/perf/p3400-rebase.sh
+++ b/t/perf/p3400-rebase.sh
@@ -5,7 +5,7 @@ test_description='Tests rebase performance'
 
 test_perf_default_repo
 
-test_expect_success 'setup' '
+test_expect_success 'setup rebasing on top of a lot of changes' '
git checkout -f -b base &&
git checkout -b to-rebase &&
git checkout -b upstream &&
@@ -33,4 +33,24 @@ test_perf 'rebase on top of a lot of unrelated changes' '
git rebase --onto base HEAD^
 '
 
+test_expect_success 'setup rebasing many changes without split-index' '
+   git config core.splitIndex false &&
+   git checkout -b upstream2 to-rebase &&
+   git checkout -b to-rebase2 upstream
+'
+
+test_perf 'rebase a lot of unrelated changes without split-index' '
+   git rebase --onto upstream2 base &&
+   git rebase --onto base upstream2
+'
+
+test_expect_success 'setup rebasing many changes with split-index' '
+   git config core.splitIndex true
+'
+
+test_perf 'rebase a lot of unrelated changes with split-index' '
+   git rebase --onto upstream2 base &&
+   git rebase --onto base upstream2
+'
+
 test_done
-- 
2.12.2.635.gf331bb6d3a.dirty



[PATCH] completion: expand "push --delete <remote>" for refs on that <remote>

2017-04-18 Thread Ævar Arnfjörð Bjarmason
Change the completion of "push --delete <remote> <ref>" to complete
refs on that <remote>, not all refs. Before this e.g. cloning git.git
and doing "git push --delete origin p<TAB>" will complete nothing,
whereas origin/p<TAB> will uselessly complete origin/pu.

Now p<TAB> will complete as "pu". The completion of giving --delete
later, e.g. "git push origin --delete p<TAB>", remains unchanged; this
is a bug, but it is a general existing limitation of the bash completion,
and not how git-push is documented, so I'm not fixing that case.

I looked over t9902-completion.sh but couldn't quickly find out how to
add a test for this, but all the existing tests pass, and all my
manual testing of "git push --delete <remote> <ref>..." does the right
thing now.


Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 contrib/completion/git-completion.bash | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
index 1150164d5c..2e5b3ed776 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -701,7 +701,7 @@ __git_complete_revlist ()
 __git_complete_remote_or_refspec ()
 {
local cur_="$cur" cmd="${words[1]}"
-   local i c=2 remote="" pfx="" lhs=1 no_complete_refspec=0
+   local i c=2 remote="" pfx="" lhs=1 no_complete_refspec=0 delete=0
if [ "$cmd" = "remote" ]; then
((c++))
fi
@@ -709,6 +709,7 @@ __git_complete_remote_or_refspec ()
i="${words[c]}"
case "$i" in
--mirror) [ "$cmd" = "push" ] && no_complete_refspec=1 ;;
+   --delete) delete=1 ;;
--all)
case "$cmd" in
push) no_complete_refspec=1 ;;
@@ -761,7 +762,9 @@ __git_complete_remote_or_refspec ()
fi
;;
push)
-   if [ $lhs = 1 ]; then
+   if [ $delete = 1 ]; then
+   __git_complete_refs --remote="$remote" --pfx="$pfx" --cur="$cur_"
+   elif [ $lhs = 1 ]; then
__git_complete_refs --pfx="$pfx" --cur="$cur_"
else
__git_complete_refs --remote="$remote" --pfx="$pfx" --cur="$cur_"
-- 
2.11.0
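For readers unfamiliar with the completion internals: the patch makes completion offer refs advertised by that remote rather than all local refs. A rough stand-alone sketch of that listing (the helper name here is hypothetical; the real code path is `__git_complete_refs --remote`):

```shell
# Hypothetical helper mimicking what the completion now offers for
# "git push --delete <remote> <prefix><TAB>": branch names on the remote.
list_remote_refs () {
	remote=$1 prefix=$2
	git ls-remote --heads "$remote" |
	awk -v p="$prefix" '{ sub("refs/heads/", "", $2)
	                      if (index($2, p) == 1) print $2 }'
}
```

E.g. in a clone of git.git of that era, `list_remote_refs origin p` would print `pu`.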



Failed shallow submodule clone for fixed SHA1 falls back to checking out HEAD

2017-04-18 Thread Sebastian Schuberth
Hi,

this is using "git version 2.12.2.windows.2" on Windows / "git version 
2.12.2-639-g584f897" on Linux.

I have configured my superproject with .gitmodules saying

---8<---
[submodule "src/funTest/resources/projects/external/jgnash"]
   path = src/funTest/resources/projects/external/jgnash
   url = https://github.com/ccavanaugh/jgnash.git
   shallow = true
---8<---

and configured the submodule to checkout commit 
2aa4cce7d7fd46164030f2b4d244c4b89e77f722 [1]. When doing a fresh clone of my 
superproject via "git clone --recursive" I get

---8<---
error: Server does not allow request for unadvertised object 
2aa4cce7d7fd46164030f2b4d244c4b89e77f722
Fetched in submodule path 'src/funTest/resources/projects/external/jgnash', but 
it did not contain 2aa4cce7d7fd46164030f2b4d244c4b89e77f722. Direct fetching of 
that commit failed.
---8<---

So far so good, it simply seems that GitHub does not support 
allowReachableSHA1InWant [2]. The interesting bit is that my submodule checkout 
still ends up being shallow, but pointing to HEAD:

---8<---
$ cd src/funTest/resources/projects/external/jgnash
$ git log
commit 12036fffb6c620515edd96416363fd1749b5d003 (grafted, HEAD -> master, 
origin/master, origin/HEAD)
Author: Craig Cavanaugh 
Date:   Tue Apr 18 05:33:06 2017 -0400

Fix typos
---8<---

Wouldn't it make more sense to unshallow the submodule clone in this case and 
checkout the configured commit 2aa4cce7d7fd46164030f2b4d244c4b89e77f722 
afterwards? At least I'd be getting what I asked for in that case, even if at 
the cost of additional network traffic. Because if I read [3] correctly, the 
command line option belonging to "submodule.<name>.shallow" is called 
"--[no-]recommend-shallow", i.e. it's only a recommendation, so falling back to 
a full clone should be fine.

[1] 
https://github.com/ccavanaugh/jgnash/commit/2aa4cce7d7fd46164030f2b4d244c4b89e77f722
[2] 
https://git-scm.com/docs/git-config#git-config-uploadpackallowReachableSHA1InWant
[3] https://git-scm.com/docs/git-submodule
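The suggested fallback can be sketched with plain local repositories (no GitHub involved): a depth-1 clone lacks the older commit, and unshallowing makes a fixed SHA1 checkout-able again.

```shell
# Local sketch of the suggested fallback: unshallow, then check out the
# commit that the shallow fetch could not obtain.
tmp=$(mktemp -d)
git init -q "$tmp/upstream"
git -C "$tmp/upstream" -c user.name=A -c user.email=a@example.com \
	commit -q --allow-empty -m "old commit"
old=$(git -C "$tmp/upstream" rev-parse HEAD)
git -C "$tmp/upstream" -c user.name=A -c user.email=a@example.com \
	commit -q --allow-empty -m "new commit"

git clone -q --depth 1 "file://$tmp/upstream" "$tmp/shallow"
git -C "$tmp/shallow" cat-file -e "$old" 2>/dev/null || echo "commit missing"
git -C "$tmp/shallow" fetch -q --unshallow   # fall back to full history
git -C "$tmp/shallow" checkout -q "$old"     # now the fixed SHA1 works
```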

Regards,
Sebastian




FW: Issue in gitbash changing directory

2017-04-18 Thread Bonk, Gregory

I accidentally typed 'cd //' and it worked.

gbonk@ICC11167 MINGW64 /c/git/mtb-messagehub-information-radiator (master)
$ cd //

gbonk@ICC11167 MINGW64 //
$ cd ..




GIT_EXEC_PATH

2017-04-18 Thread Christoph Egger
Hi!

Concerning $GIT_EXEC_PATH: is that supposed to be a $PATH-like variable, as
in, can it have more than one path (colon-separated)? I currently have two
directories there (one with a git-annex installation, one with the normal git
stuff) and it seems to mostly work. However git-sh-setup is unhappy:

> % git pull --rebase
> /opt/local/libexec/git-core/git-sh-setup: line 46: 
> /opt/local/libexec/git-core:/Applications/git-annex.app/Contents/MacOS//git-sh-i18n:
>  No such file or directory
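For what it's worth, GIT_EXEC_PATH is treated as a single directory, not a colon-separated list: `git --exec-path` hands the environment value back verbatim, and git-sh-setup sources `$(git --exec-path)/git-sh-i18n` from it, which produces exactly the failure quoted above:

```shell
# GIT_EXEC_PATH is a single directory; a colon-separated value is
# passed through unchanged, so scripts that append /git-sh-i18n break.
GIT_EXEC_PATH=/one/dir:/another/dir git --exec-path
# prints: /one/dir:/another/dir
```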

  Christoph


Re: [PATCH] Documentation/git-checkout: make doc. of checkout clearer

2017-04-18 Thread Christoph Michelbach
On Mon, 2017-04-17 at 17:31 -0700, Junio C Hamano wrote:
> "Philip Oakley"  writes:
> 
> > 
> > > 
> > > > 
> > > > If we'd created and added a file d just before the checkout, what
> > > > should have happened to d, and why?
> > > I understand what the command does. It behaves perfectly as I
> > > expected it to. I did not find this script but wrote it to
> > > demonstrate that what the documentation says is different from how
> > > it behaves after having read what the documentation says it should
> > > do and noticing that that's not how I expected it to work from
> > > experience.
> > > 
> > > What it really does is to copy all files described by the given
> > > paths from the given tree-ish to the working directory. Or at least
> > > that's my expectation of what it does.
> > > 
> > > The documentation, however, says that the given paths are
> > > *restored*.
> > > This is different.
> > I don't see that difference in the phrase *restored*, compared to
> > your 'copy all files described by the given paths'. Could you
> > explain a little more?
> I am obviously not Christoph, and I was the one that defined how
> "checkout  -- " should work, but when you say
> "restore" (which is not what I wrote ;-)) it is fair to expect lack
> of 'd' could also be "restored", in addition to path that was in the
> directory.

Yes, you didn't write "restore" but you wrote: "It updates the named
paths in the working tree from the index file or from a named <tree-ish>
(most often a commit)."

I suppose that from reading this, some people will assume that `d` is
gone after the command was executed because the folder to which the
given path leads is updated "in the working tree from the index file or
from a named ", and others will not be sure what happens. But
I doubt that a lot of people guess the behavior correctly and are sure
about them being correct.

Note that for the first group of people, it doesn't matter that git
doesn't track folders. You can restore a folder "in the working tree
from the index file or from a named " without tracking folders
because there only is one property of a folder git cares about, apart
from it itself being able to access files in that folder: Its name. And
it can get that name from the paths of the files in that folder. If
there are no files in that folder, there isn't a contradiction, either,
because then there is no folder in the index or tree-ish which can
possibly be restored.


> Obviously, "grab all paths that match  out of , add
> them to the index and copy them out to the working tree" will never
> be able to _restore_ the lack of 'd', even if it may match the
> <pathspec> being used to do this checkout, by removing it from the
> current index and the working tree.

Exactly. "grab all paths that match  out of , add them
to the index and copy them out to the working tree" is a more accurate
description of what the command does (but it might need some rewording
;-) ).


Re: [PATCH] Documentation/git-checkout: make doc. of checkout clearer

2017-04-18 Thread Christoph Michelbach
On Mon, 2017-04-17 at 17:26 -0700, Junio C Hamano wrote:
> "Philip Oakley"  writes:
> 
> > 
> > I'd guess that the misunderstanding is that you maybe thought that
> > the whole directory would be reset to its old state and the files b
> > and c deleted, rather than just the named files present in that old
> > commit being extracted. If we'd created and added a file d just
> > before the checkout, what should have happened to d, and why?
> It probably is a bit unfair to call it "misunderstanding".  I've had
> this entry in the "Leftover Bits" list for quite some time:
> 
> git checkout $commit -- somedir may want to remove somedir/file
> that is not in $commit but is in the original index. Anybody who
> wants to do this needs to consider ramifications and devise
> transition plans.
> Cf. $gmane/234935
> 
> In the thread, this message:
> 
> https://public-inbox.org/git/xmqqeh8nxltc@gitster.dls.corp.google.com/
> 
> may be a good summary.

For clarification: I'm not saying that that's how the command should
behave. I merely want the documentation to describe how the command
actually behaves.

Of course, being able to reset a folder to a previous state would be
great, too. I have needed that several times and what I ended up doing
is to delete the folder and then do the checkout. This doesn't cause
problems if there are no untracked files (as of the state one wants to
restore; what's currently tracked is of course irrelevant) you still
need in that folder. If that's the case, one must make sure to not
delete those files.

I don't know whether there's an easier/better way, though. That's merely
what I did.

Having the *option* to reset an entire folder rather than the files in
it would be great.


--
Christoph


Re: [PATCH] Documentation/git-checkout: make doc. of checkout clearer

2017-04-18 Thread Christoph Michelbach
On Mon, 2017-04-17 at 21:59 +0100, Philip Oakley wrote:
> I've added back the list, as it was accidentally dropped.

Thanks. I'm sorry. I apparently pressed the wrong button in my email
client. What I wrote is visible in your quote.


> From: "Christoph Michelbach" 
> > I understand what the command does. It behaves perfectly as I
> > expected it to. I did not find this script but wrote it to
> > demonstrate that what the documentation says is different from how
> > it behaves after having read what the documentation says it should
> > do and noticing that that's not how I expected it to work from
> > experience.
> > 
> > What it really does is to copy all files described by the given
> > paths from the given tree-ish to the working directory. Or at least
> > that's my expectation of what it does.
> > 
> > The documentation, however, says that the given paths are
> > *restored*.
> > This is different.
> I don't see that difference in the phrase *restored*, compared to
> your 'copy all files described by the given paths'. Could you explain
> a little more?

Suppose you have X. If you say X is restored to what it was half an hour
ago, I expect everything about X to be exactly as it was half an hour
ago. So if X is a file containing the text "Hello, world!", I expect
there to be a file (with the exact same file name) which contains the
text "Hello, world!", even if I changed that text in the mean time or
deleted the file entirely. If X is a folder which contains files, I
expect it to have the exact same contents as before. If there were 5
files in the folder half an hour ago, I expect there to be 5 files with
the same file names and the same content in each file, again, even if I
deleted, renamed, changed the contents of, or added files in the mean
time. This is, however, not the case. Suppose I renamed a file from
"foo" to "bar". Then there are 6 files, not 5. This is not how X was
half an hour ago.

If you, however, say that all files in the path X are copied back from
what they were half an hour ago, if X is a file, I expect the exact same
thing as before for it. And that's actually what happens in reality. But
if X is a folder, there's a difference: Because only the *files X
contains* are copied back – *not X itself* –, all the files in X which
do not occur in the state from half an hour ago, are unaffected. So now
I expect there to be a 6th file because there is no file "bar" in the
state from half an hour ago which could overwrite the file "bar" in the
working tree.
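The rename example above can be replayed in a throwaway repository; it shows that `git checkout <tree-ish> -- <dir>` overlays files out of the tree-ish without deleting extras:

```shell
# Replay of the rename example: after "restoring" d/ from the first
# commit, the renamed file survives alongside the restored one.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
mkdir d && echo content >d/foo && git add d/foo
git -c user.name=A -c user.email=a@example.com commit -q -m "state 1"
git mv d/foo d/bar
git -c user.name=A -c user.email=a@example.com commit -q -m "state 2"
git checkout HEAD^ -- d   # copies d/foo back out of the old commit
ls d                      # prints both "bar" and "foo": d was not reset
```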


> > Yeah, definitely. There should be 2 separate entries for this.
> > 
> > I think someone thought it was a good idea to make `<paths>...`
> > optional in the synopsis because `git checkout` behaves in that
> > special way if a patch *or* paths are given. But then, of course,
> > with both `-p|--patch` and `<paths>...` optional, the command is the
> > same as the first variation, just with optional parameters -- but
> > the documentation (correctly) says those variations should behave
> > very differently from each other.
> > 
> > I don't see how this can be avoided without having 2 separate
> > entries for those cases.
> true.

So now we have to find someone who used that command with both a patch
and a tree-ish present a lot because I have no idea how that would
behave and don't even know why you would use it.


--
Christoph


Re: [PATCH 0/7] PREVIEW: Introduce DC_AND_OPENSSL_SHA1 make flag

2017-04-18 Thread Johannes Schindelin
Hi Junio,

On Wed, 29 Mar 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > The approach I chose instead was to make the switch global, per
> > command.  Obviously, the next step is to identify the Git commands
> > which accept objects from external sources (`clone`, `fetch`,
> > `fast-import` and all the remote helpers come to mind) and let them
> > set a global flag before asking git_default_config() to parse the
> > core.enableSHA1DC setting, so that the special value `external` could
> > trigger the collision detecting code for those, and only those,
> > commands. That would still err on the safe side if these commands are
> > used to hash objects originating from the same machine, but that is a
> > price I would be willing to pay for the simplicity of this approach.
> >
> > Does my explanation manage to illustrate in a plausible way why I
> > chose the approach that I did?
> 
> I agree that there may be rooms to tweak the degree of paranoia per
> codepath (I said that already in the message you are responding to),
> but as Linus and Peff already said in the old discussion thread
> [*3*], I doubt that it needs to be runtime configurable.

My real-world scenario here was intended to dispel your doubts:
the very real slowdown is unacceptable.

Given your reluctance to accept that the performance impact is large
enough to be painful, and that there are situations where the trust in
your infrastructure and your colleagues makes that pain unnecessary, you
leave me no other option than to make this a Git for Windows-only feature.

I really tried to reconcile your concerns with the needs of Git for
Windows' users, to demonstrate that the compile-time switch is
unacceptable, but it is time to accept defeat and the newly acquired
maintenance burden.

Moving on,
Dscho



Re: [PATCH 2/7] Makefile: optionally compile with both SHA1DC and SHA1_OPENSSL

2017-04-18 Thread Johannes Schindelin
Hi Junio,

On Thu, 30 Mar 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > +#ifdef SHA1_DC_AND_OPENSSL
> > +void (*SHA1_Init_func)(SHA_CTX_union *ctx) = (void *)SHA1DCInit;
> > +void (*SHA1_Update_func)(SHA_CTX_union *ctx, const void *pointer, size_t 
> > size) =
> > +   (void *)git_SHA1DCUpdate;
> > +int (*SHA1_Final_func)(unsigned char sha1[20], SHA_CTX_union *ctx) =
> > +   (void *)git_SHA1DCFinal;
> > +
> > +void toggle_sha1dc(int enable)
> > +{
> > +   if (enable) {
> > +   SHA1_Init_func = (void *)SHA1DCInit;
> > +   SHA1_Update_func = (void *)git_SHA1DCUpdate;
> > +   SHA1_Final_func = (void *)git_SHA1DCFinal;
> > +   } else {
> > +   SHA1_Init_func = (void *)SHA1_Init;
> > +   SHA1_Update_func = (void *)SHA1_Update;
> > +   SHA1_Final_func = (void *)SHA1_Final;
> > +   }
> > +}
> > +#endif
> 
> As I understand that this is a demonstration series, the approach
> above is OK as an expedite way to illustrate one way how run-time
> switching could be done.  The approach however is not very thread
> friendly, though.

Indeed. However, the toggle is meant to be coarse, heavy-handed. I have to
protect my time as Git for Windows maintainer, so I have to keep this as
simple to maintain as possible while also benefitting my users. This
toggle is intended to be run very, very early in the cmd_main() functions.
As in: right when the config is read.

> > diff --git a/sha1dc/sha1.h b/sha1dc/sha1.h
> > index bd8bd928fb3..243c2fe0b6b 100644
> > --- a/sha1dc/sha1.h
> > +++ b/sha1dc/sha1.h
> > @@ -110,10 +110,26 @@ void git_SHA1DCFinal(unsigned char [20], SHA1_CTX *);
> >   */
> >  void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *data, unsigned long len);
> >  
> > +#ifdef SHA1_DC_AND_OPENSSL
> > +extern void toggle_sha1dc(int enable);
> > +
> > +typedef union {
> > +   SHA1_CTX dc;
> > +   SHA_CTX openssl;
> > +} SHA_CTX_union;
> 
> The use of union is a good ingredient for a solution.  I would have
> chosen to do this slightly differently if I were doing it.
> 
> typedef struct {
> int safe;
> union {
> SHA1_CTX_SAFE safe;
> SHA1_CTX_FAST fast;
> } u;
> } git_SHA_CTX;
> 
> void git_SHA1_Init(git_SHA_CTX *ctx, int safe);
>   void git_SHA1_Update(git_SHA_CTX *ctx, const void *, unsigned long);
>   git_SHA1_Final(uchar [20], git_SHA_CTX *ctx);
> 
> where SHA1_CTX_FAST may be chosen from the Makefile just like we
> currently choose platform_SHA_CTX.  SHA1_CTX_SAFE could also be made
> configurable but it may be OK to hardcode it to refer to SHA1_CTX of
> DC's.
> 
> As you already know, I am assuming that each codepath pretty much
> knows if it needs safe or fast one (e.g. the one used in csum-file.c
> knows it does not have to), so each git_SHA_CTX is told which one to
> use when it gets initialized.

Thanks, at this stage it is pretty clear, though, that I will have to
maintain this. I am not willing to make this as configurable as you
suggested, as these patches will have to stay in git-for-windows/git.

Ciao,
Dscho


Re: Feature request: --format=json

2017-04-18 Thread demerphq
On 18 April 2017 at 10:44, Fred .Flintstone  wrote:
> Well the easiest way to work with that would be JSON.
> So the best would be if Git could output the data I want in JSON format.
> Then it would be easy for me to work with data.
>
> With git rev-list and git cat-file, it's not so easy to reliably parse
> that output.

Doesn't seem too hard to work with rev-list to me. As far as I can
tell the following produces what you want. You need perl installed
obviously, and the JSON::PP module is required, but should come
bundled with recent perls.

git rev-list master --pretty=raw | perl -MJSON::PP=encode_json
-ane'if(/^(\w+) (.*)/) { if ($1 eq "commit") { push @objs, $o if $o;
$o={}; } $o->{$1} = $2; } else { $o->{text} .= $_; } END{ push @objs,
$o if $o; for $o (@objs) { s/^//mg, s/^\n// for $o->{text};
($o->{message},$o->{long_message})= split /\n\n/, delete $o->{text}; }
print JSON::PP->new->pretty->encode(\@objs);}'

You might consider an alternative approach than stating that working
with JSON is "the easiest", especially to people who clearly are
making do without it. :-)

A better argument might be that exposing data through a well defined
and widely used and simple data format would trivially expand the
range of projects that might interoperate with git or enhance the git
ecosystem. For instance you could argue that having clean JSON output
would make it easier to integrate into search engines and other
indexing tools that already know how to speak JSON. Maybe a regular
contributor on this list might agree with your arguments and make it
happen.

Until then you can parse rev-list like the rest of us. :-)

cheers,
Yves


Re: [PATCH v1 0/3] travis-ci: build docs with asciidoctor

2017-04-18 Thread brian m. carlson
On Tue, Apr 18, 2017 at 10:32:59AM +0200, Lars Schneider wrote:
> 
> > On 14. Apr 2017, at 00:41, Junio C Hamano  wrote:
> > Having said that, I wonder if we get some interesting results out of
> > building the documentation twice, though.  By looking at the Travis
> > log with timestamps, we probably can see how long each build takes,
> > but that is much less interesting than learning if new versions of
> > text used mark-up that does not format correctly on one or the other
> > (i.e. catch documentation breakage early in each CI run), for
> > example.  I have an impression that neither AsciiDoc nor AsciiDoctor
> > "fails" in an obvious way that "make" can notice (i.e. they often
> > just silently produce nonsense output when fed a malformed input
> > instead).
> 
> True! But wouldn't we get a syntax check here? Wouldn't asciidoc / asciidoctor 
> bark if we use wrong/unsupported elements?

Asciidoctor isn't very strict about questionable items.  If you want
that behavior, you'd want to check for output to standard error during
the make process, as Asciidoctor uses Ruby's warn function.

One of the things I've wanted to do is to make it able to be strict, but
that requires a lot of refactoring work that I just haven't gotten
around to.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Feature request: --format=json

2017-04-18 Thread Fred .Flintstone
But a repository or branch can have thousands of commits, so running
`git cat-file commit <commit>` on each one seems maybe not a wise idea.
But parsing `git cat-file --batch` is difficult, because there seems
to be no reliable way to discern when a commit starts and ends.
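For what it's worth, the --batch stream is self-delimiting: every object is announced by a `<sha> <type> <size>` header, so a reader consumes exactly `<size>` bytes plus one newline per object. A minimal shell sketch (throwaway repo for demonstration):

```shell
# Delimit `git cat-file --batch` output by the size field in each header.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
git -c user.name=A -c user.email=a@example.com \
	commit -q --allow-empty -m example

git rev-list --all |
git cat-file --batch |
while read -r sha type size; do
	body=$(head -c "$size")   # exactly the object's contents
	read -r _                 # the newline terminating this object
	printf '%s %s (%s bytes)\n' "$type" "$sha" "$size"
done
```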

I don't code in C though. A JSON formatter option would need a JSON library.
But maybe there should be raised a discussion about JSON in Git if
there are other people interested in this?

On Tue, Apr 18, 2017 at 11:39 AM, Samuel Lijin  wrote:
> If for some reason your use case is so performance intensive that you
> can't just `git cat-file commit` every entry in `git rev-list --all`
> individually, then you can also pipe input into `git cat-file --batch`
> and read output as you pipe input in, which will give you a very
> simple mechanism for delimiting the cat-file output.
>
> In any case, as developers, it's rare to have our job done for us.
> That's why we write code.
>
> I'm sure people would be happy to help if you submitted patches to
> support --format=json.
>
> On Tue, Apr 18, 2017 at 3:44 AM, Fred .Flintstone  wrote:
>> Well the easiest way to work with that would be JSON.
>> So the best would be if Git could output the data I want in JSON format.
>> Then it would be easy for me to work with data.
>>
>> With git rev-list and git cat-file, it's not so easy to reliably parse
>> that output.
>>
>> On Tue, Apr 18, 2017 at 2:38 AM, Junio C Hamano  wrote:
>>> "Fred .Flintstone"  writes:
>>>
 So I would either have to do:
 git rev-list --all
 Then iterate over each line and do git cat-file commit <commit>.

 Or do:
 git rev-list --all | git cat-file --batch

 If I do it in a batch, then it will be tricky to reliably parse since
 I don't know when the message body ends and when the next commit
 starts.

 JSON output would have been very handy.
>>>
>>> I am somewhat puzzled.  I thought that you were trying to come up
>>> with a way to produce JSON output and people are trying to help you
>>> by pointing out tools that you can use for that.


Re: Feature request: --format=json

2017-04-18 Thread Samuel Lijin
If for some reason your use case is so performance intensive that you
can't just `git cat-file commit` every entry in `git rev-list --all`
individually, then you can also pipe input into `git cat-file --batch`
and read output as you pipe input in, which will give you a very
simple mechanism for delimiting the cat-file output.

In any case, as developers, it's rare to have our job done for us.
That's why we write code.

I'm sure people would be happy to help if you submitted patches to
support --format=json.

On Tue, Apr 18, 2017 at 3:44 AM, Fred .Flintstone  wrote:
> Well the easiest way to work with that would be JSON.
> So the best would be if Git could output the data I want in JSON format.
> Then it would be easy for me to work with data.
>
> With git rev-list and git cat-file, it's not so easy to reliably parse
> that output.
>
> On Tue, Apr 18, 2017 at 2:38 AM, Junio C Hamano  wrote:
>> "Fred .Flintstone"  writes:
>>
>>> So I would either have to do:
>>> git rev-list --all
>>> Then iterate over each line and do git cat-file commit <commit>.
>>>
>>> Or do:
>>> git rev-list --all | git cat-file --batch
>>>
>>> If I do it in a batch, then it will be tricky to reliably parse since
>>> I don't know when the message body ends and when the next commit
>>> starts.
>>>
>>> JSON output would have been very handy.
>>
>> I am somewhat puzzled.  I thought that you were trying to come up
>> with a way to produce JSON output and people are trying to help you
>> by pointing out tools that you can use for that.


[PATCH v4 3/3] rebase: pass --[no-]signoff option to git am

2017-04-18 Thread Giuseppe Bilotta
This makes it easy to sign off a whole patchset before submission.

Signed-off-by: Giuseppe Bilotta 
---
 Documentation/git-rebase.txt |  5 +++++
 git-rebase.sh                |  3 ++-
 t/t3428-rebase-signoff.sh    | 46 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 53 insertions(+), 1 deletion(-)
 create mode 100755 t/t3428-rebase-signoff.sh

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 67d48e6883..53f4e1 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -370,6 +370,11 @@ default is `--no-fork-point`, otherwise the default is `--fork-point`.
of the rebased commits (see linkgit:git-am[1]).
Incompatible with the --interactive option.
 
+--signoff::
+   This flag is passed to 'git am' to sign off all the rebased
+   commits (see linkgit:git-am[1]). Incompatible with the
+   --interactive option.
+
 -i::
 --interactive::
Make a list of the commits which are about to be rebased.  Let the
diff --git a/git-rebase.sh b/git-rebase.sh
index 48d7c5ded4..db1deed846 100755
--- a/git-rebase.sh
+++ b/git-rebase.sh
@@ -34,6 +34,7 @@ root!  rebase all reachable commits up to the root(s)
 autosquash move commits that begin with squash!/fixup! under -i
 committer-date-is-author-date! passed to 'git am'
 ignore-date!   passed to 'git am'
+signoff            passed to 'git am'
 whitespace=!   passed to 'git apply'
 ignore-whitespace! passed to 'git apply'
 C=!passed to 'git apply'
@@ -321,7 +322,7 @@ do
--ignore-whitespace)
git_am_opt="$git_am_opt $1"
;;
-   --committer-date-is-author-date|--ignore-date)
+   --committer-date-is-author-date|--ignore-date|--signoff|--no-signoff)
git_am_opt="$git_am_opt $1"
force_rebase=t
;;
diff --git a/t/t3428-rebase-signoff.sh b/t/t3428-rebase-signoff.sh
new file mode 100755
index 00..2afb564701
--- /dev/null
+++ b/t/t3428-rebase-signoff.sh
@@ -0,0 +1,46 @@
+#!/bin/sh
+
+test_description='git rebase --signoff
+
+This test runs git rebase --signoff and make sure that it works.
+'
+
+. ./test-lib.sh
+
+# A simple file to commit
+cat >file /")
+EOF
+
+# Expected commit message after rebase without --signoff (or with --no-signoff)
+cat >expected-unsigned < actual &&
+   test_cmp expected-signed actual
+'
+
+test_expect_success 'rebase --no-signoff does not add a sign-off line' '
+   git commit --amend -m "first" &&
+   git rbs --no-signoff HEAD^ &&
+   git cat-file commit HEAD | sed -e "1,/^\$/d" > actual &&
+   test_cmp expected-unsigned actual
+'
+
+test_done
-- 
2.12.2.820.g78c033c3a1.dirty



[PATCH v4 2/3] builtin/am: fold am_signoff() into am_append_signoff()

2017-04-18 Thread Giuseppe Bilotta
There are no more direct calls to am_signoff(), so we can fold its
logic into am_append_signoff().

(This is done in a separate commit rather than in the previous one, to
make it easier to revert this specific change if additional calls are
ever introduced.)

Signed-off-by: Giuseppe Bilotta 
---
 builtin/am.c | 33 +++--
 1 file changed, 15 insertions(+), 18 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d072027b5a..b29f885e41 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -1181,42 +1181,39 @@ static void NORETURN die_user_resolve(const struct am_state *state)
exit(128);
 }
 
-static void am_signoff(struct strbuf *sb)
+/**
+ * Appends signoff to the "msg" field of the am_state.
+ */
+static void am_append_signoff(struct am_state *state)
 {
char *cp;
struct strbuf mine = STRBUF_INIT;
+   struct strbuf sb = STRBUF_INIT;
 
-   /* Does it end with our own sign-off? */
+   strbuf_attach(&sb, state->msg, state->msg_len, state->msg_len);
+
+   /* our sign-off */
strbuf_addf(&mine, "\n%s%s\n",
sign_off_header,
fmt_name(getenv("GIT_COMMITTER_NAME"),
 getenv("GIT_COMMITTER_EMAIL")));
-   if (mine.len < sb->len &&
-   !strcmp(mine.buf, sb->buf + sb->len - mine.len))
+
+   /* Does sb end with it already? */
+   if (mine.len < sb.len &&
+   !strcmp(mine.buf, sb.buf + sb.len - mine.len))
goto exit; /* no need to duplicate */
 
/* Does it have any Signed-off-by: in the text */
-   for (cp = sb->buf;
+   for (cp = sb.buf;
 cp && *cp && (cp = strstr(cp, sign_off_header)) != NULL;
 cp = strchr(cp, '\n')) {
-   if (sb->buf == cp || cp[-1] == '\n')
+   if (sb.buf == cp || cp[-1] == '\n')
break;
}
 
-   strbuf_addstr(sb, mine.buf + !!cp);
+   strbuf_addstr(&sb, mine.buf + !!cp);
 exit:
strbuf_release(&mine);
-}
-
-/**
- * Appends signoff to the "msg" field of the am_state.
- */
-static void am_append_signoff(struct am_state *state)
-{
-   struct strbuf sb = STRBUF_INIT;
-
-   strbuf_attach(&sb, state->msg, state->msg_len, state->msg_len);
-   am_signoff(&sb);
	state->msg = strbuf_detach(&sb, &state->msg_len);
 }
 
-- 
2.12.2.820.g78c033c3a1.dirty



[PATCH v4 0/3] rebase: --signoff support

2017-04-18 Thread Giuseppe Bilotta
Allow signing off a whole patchset by rebasing it with the `--signoff`
option, which is simply passed through to `git am`.
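A quick end-to-end sketch of the feature in a throwaway repository (`--force-rebase` is given explicitly here since nothing else changes in the series being rebased):

```shell
# Rebase a two-commit series with --signoff and verify every rewritten
# commit gained a Signed-off-by trailer.
tmp=$(mktemp -d) && cd "$tmp" && git init -q .
gc () { git -c user.name=A -c user.email=a@example.com "$@"; }
echo 0 >f && git add f && gc commit -q -m base
echo 1 >f && gc commit -q -am one
echo 2 >f && gc commit -q -am two
gc rebase -q --signoff --force-rebase HEAD~2
git log --format=%B -2 | grep -c "Signed-off-by:"   # -> 2
```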

Changes since v3:

* --no-signoff is actually accepted;
* the paragraph documenting --signoff is now in the correct place and
  it mentions explicitly that the option is not compatible with
  interactive mode.

The patchset is also available in the git repository at:

  git://git.oblomov.eu/git rebase-signoff

(Unrelated note: it just occurred to me while preparing this cover
letter that it would be nice if there was a way to combine `format-patch
--cover-letter` and `request-pull`.)

Some work on extending --signoff support to interactive rebases is
underway in the `rebase-signoff-ext` branch, but there are a lot of
corner cases to test and work out, so I guess that'll be for some
other time.

Giuseppe Bilotta (3):
  builtin/am: obey --signoff also when --rebasing
  builtin/am: fold am_signoff() into am_append_signoff()
  rebase: pass --[no-]signoff option to git am

 Documentation/git-rebase.txt |  5 +++++
 builtin/am.c                 | 39 ++++++++++++++++++---------------------
 git-rebase.sh                |  3 ++-
 t/t3428-rebase-signoff.sh    | 46 ++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 71 insertions(+), 22 deletions(-)
 create mode 100755 t/t3428-rebase-signoff.sh

-- 
2.12.2.820.g78c033c3a1.dirty


