Re: [PATCH v5 4/8] Specify explicitly where we parse timestamps

2017-04-24 Thread Junio C Hamano
Johannes Schindelin  writes:

> diff --git a/pretty.c b/pretty.c
> index d0f86f5d85c..24fb0c79062 100644
> --- a/pretty.c
> +++ b/pretty.c
> @@ -409,7 +409,7 @@ const char *show_ident_date(const struct ident_split *ident,
>   long tz = 0;
>  
>   if (ident->date_begin && ident->date_end)
> - date = strtoul(ident->date_begin, NULL, 10);
> + date = parse_timestamp(ident->date_begin, NULL, 10);
>   if (date_overflows(date))
>   date = 0;
>   else {

Good.  We'd need to choose a different sentinel when we allow signed
timestamps, but that is outside the scope of this series.  

It would not hurt if we used a symbolic constant instead of 0 here
to save duplicated effort for future developers, though, and it is
in line with the spirit of this exact patch where it uses a more
specific function name to differentiate strtoul() used for times
from other uses of the same function.

In any case, I think another change to this section of the code made
later in another patch was where t4121 was broken in the previous
round, and it is good to see the breakage go.  As Peff was hinting,
we might want to revisit "should we substitute with a sentinel, or
make this a die?" decision, but discussing it is totally outside the
scope of this series.
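The symbolic-constant suggestion might look like the following sketch. The names here are invented for illustration; Git's code currently uses a bare 0, and the series does not add such a constant:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Sketch with invented names; not Git's actual code. */
typedef uintmax_t timestamp_t;
#define UNPARSEABLE_DATE ((timestamp_t)0)	/* symbolic sentinel instead of a bare 0 */

/* a timestamp "overflows" if it cannot round-trip through the system time_t */
static int date_overflows(timestamp_t t)
{
	return (time_t)t < 0 || (timestamp_t)(time_t)t != t;
}

static timestamp_t checked_date(timestamp_t parsed)
{
	return date_overflows(parsed) ? UNPARSEABLE_DATE : parsed;
}
```

A named sentinel would make the later switch to signed timestamps a one-line change instead of a hunt for magic zeroes.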



Re: [PATCH 1/2] fast-export: deletion action first

2017-04-24 Thread Jeff King
On Mon, Apr 24, 2017 at 10:33:11PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Anyway. I don't think Miguel's patch needs to solve all of the lingering
> > rename cases. But I am curious whether it makes some rename cases worse,
> > because the depth-sorting was kicking in before and making them work.
> 
> I agree with you on both counts, and I care more about the second
> sentence, not just "am curious", but "am worried".  I am not sure
> that this patch is safe---it looked more like robbing peter to pay
> paul or the other way around.  Fixing for one class of breakage
> without regressing is one thing and it is perfectly fine to leave
> some already broken case broken with such a fix.  Claiming to fix
> one class and breaking other class that was happily working is quite
> different, and that is where my "Wait, we also allow renames?" comes
> from.

Yeah, I don't disagree. I am just curious first, then worried second. :)

If I had to choose, though, I'd rather see the order be reliable for the
no-renames case. IOW, if we must rob one peter, I'd rather it be the
renames, which already have tons of corner cases (and which I do not
think can be plugged for a reader which depends on the order of the
entries; the dependencies can be cycles).

Of course if we can make it work correctly in all of the non-cyclical
cases, all the better.

-Peff


Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-24 Thread Johannes Sixt

On 25.04.2017 at 04:00, Christian Couder wrote:

On Sat, Apr 22, 2017 at 3:37 PM, Johannes Sixt  wrote:

On 21.04.2017 at 14:29, Christian Couder wrote:


First bisect should ask you to test merge bases only if there are
"good" commits that are not ancestors of the "bad" commit.

That's a tangent, but I have never understood why this needs to be so.
Consider this:

    o--o--o--o--o--o--o--o--B
   /           /
 -o--o--o--o--g--o--o--o--o--G

When I mark B as bad and G as good, why would g have to be tested first?


It is because g could be bad if the bug has been fixed between g and G.
If this happens and we don't test g, we would give a wrong result.
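This can be checked mechanically: in the graph above, g is the merge base of B and G, which is exactly the commit bisect insists on testing. A sketch with invented branch names, assuming git is available:

```shell
# Reproduce the graph: an early fork ("side"), a merge from g into it,
# then B on the fork and G on the original line.
demo=$(mktemp -d) && cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name Demo
main=$(git symbolic-ref --short HEAD)       # whatever the initial branch is

git commit -q --allow-empty -m base
git checkout -q -b side                     # the upper line, ending at B
git commit -q --allow-empty -m o1
git checkout -q "$main"
git commit -q --allow-empty -m g            # the commit marked "g" above
g=$(git rev-parse HEAD)
git checkout -q side
git merge -q --no-edit "$main"              # the second "/" in the graph
git commit -q --allow-empty -m B
git checkout -q "$main"
git commit -q --allow-empty -m G

git merge-base side "$main"                 # prints the id of g
```

Marking B bad and G good leaves g as the merge base, so bisect asks for it first.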


Gah! So, a typical messy workflow "requires" this behavior. A clean 
branchy workflow like Git's does not, because we know that a breakage is 
either on a topic branch, B, or a (the?) bad commit is an ancestor of 
(the integration branch) G.

This is exactly what I do when I bisect in Git history: I mark the latest
commits on git-gui and gitk sub-histories as good, because I know they can't
possibly be bad. (In my setup, these two histories are ahead of pu and
next.)


Yeah, it is safe to do that in this case as we test the merge bases.


The idea of marking the git-gui and gitk histories as good is that none 
of their commits is checked out: checking one out erases all Git source 
code from the working directory, and a later bisection step places all 
the code back and requires a full build. Not a big deal with Git, but 
there are much larger code bases.


The current bisect behavior makes this idea unworkable. For me, it was a 
big step backwards when it was implemented. :-(


-- Hannes



Re: [PATCH 1/2] fast-export: deletion action first

2017-04-24 Thread Junio C Hamano
Jeff King  writes:

> Anyway. I don't think Miguel's patch needs to solve all of the lingering
> rename cases. But I am curious whether it makes some rename cases worse,
> because the depth-sorting was kicking in before and making them work.

I agree with you on both counts, and I care more about the second
sentence, not just "am curious", but "am worried".  I am not sure
that this patch is safe---it looked more like robbing peter to pay
paul or the other way around.  Fixing for one class of breakage
without regressing is one thing and it is perfectly fine to leave
some already broken case broken with such a fix.  Claiming to fix
one class and breaking other class that was happily working is quite
different, and that is where my "Wait, we also allow renames?" comes
from.



Re: [PATCH v3 5/5] archive-zip: support files bigger than 4GB

2017-04-24 Thread René Scharfe

On 25.04.2017 at 06:46, Junio C Hamano wrote:

René Scharfe  writes:


diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh
index 0ac94b5cc9..a6875dfdb1 100755
--- a/t/t5004-archive-corner-cases.sh
+++ b/t/t5004-archive-corner-cases.sh
@@ -178,7 +178,7 @@ test_expect_success EXPENSIVE,UNZIP 'zip archive bigger than 4GB' '
	"$GIT_UNZIP" -t many-big.zip
 '
 
-test_expect_failure EXPENSIVE,UNZIP 'zip archive with files bigger than 4GB' '
+test_expect_success EXPENSIVE,UNZIP,ZIPINFO 'zip archive with files bigger than 4GB' '

This is a bit curious, as 1/5 adds this test that expects a failure
with three prerequisites already.  I'll assume that this is a rebase
glitch and the preimage actually must have ,ZIPINFO there already.


Yes, indeed -- I shouldn't try to do last minute edits.

René


Re: [PATCH v3 2/5] archive-zip: use strbuf for ZIP directory

2017-04-24 Thread René Scharfe

On 25.04.2017 at 06:51, Junio C Hamano wrote:

René Scharfe  writes:


Keep the ZIP central director, which is written after all archive


Is this typoed "directorY"?  I know there was discussion on the
correct terminology I only skimmed and I am too lazy to go back
there to understand it to answer this question myself, so...


Yep, a typo, this patch touches a "directory", as stated in the subject.

The "ZIP central director" would be PKWARE Inc., I guess? ;-)  The patch 
doesn't affect them..


René


Re: [PATCH v3 2/5] archive-zip: use strbuf for ZIP directory

2017-04-24 Thread Junio C Hamano
René Scharfe  writes:

> Keep the ZIP central director, which is written after all archive

Is this typoed "directorY"?  I know there was discussion on the
correct terminology I only skimmed and I am too lazy to go back
there to understand it to answer this question myself, so...

> -#define ZIP_DIRECTORY_MIN_SIZE   (1024 * 1024)

This tells me that there is this thing called ZIP directory, so
probably my guess above is correct ;-)



Re: [PATCH 1/2] fast-export: deletion action first

2017-04-24 Thread Jeff King
On Mon, Apr 24, 2017 at 09:24:59PM -0700, Junio C Hamano wrote:

> > So we sort deletions first. And the bit that the context doesn't quite
> > show here is that we then compare renames and push them to the end.
> > Everything else will compare equal.
> 
> Wait--we also allow renames?  Rename is like delete in the context
> of discussing d/f conflicts, in that it tells us that the source
> path will be missing in the end result.  If you rename a file "d" to
> "e", then there is a room for you to create a directory "d" to store
> a file "d/f" in.  Shouldn't it participate also in this "delete
> before add to avoid d/f conflict" logic?

Hrm. Yeah, I agree that case is problematic. But putting the renames
early creates the opposite problem. If you delete "d/f" to make way for
a rename to a file "d", then that deletion has to come first.

So naively you might think that pure deletions come first, then renames.
But I think you could have dependencies within the renames. For
instance:

  git init
  mkdir a b c
  seq 1 1000 >a/f
  seq 1001 2000 >b/f
  seq 2001 3000 >c/f
  git add .
  git commit -m base

  git mv a tmp
  git mv b/f a; rmdir b
  git mv c/f b; rmdir c
  git mv tmp/f c; rmdir tmp

There's no correct order there; it's a cycle.

So I suspect that any reader that accepts renames needs to be able to
handle the inputs in any order (I'd also suspect that many
implementations _don't_, but get by because people don't do silly things
like this in practice).
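Detecting such a cycle is straightforward for a reader. A standalone sketch (invented names, not fast-export's or any importer's actual code); a reader that finds a cycle would have to break it with a temporary name:

```c
#include <assert.h>
#include <string.h>

/* Sketch: does a set of rename pairs (src -> dst) contain a cycle? */
struct rename_pair {
	const char *src, *dst;
};

static int renames_have_cycle(const struct rename_pair *r, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		int cur = i, steps;

		/* follow dst -> src links; at most n hops before giving up */
		for (steps = 0; steps <= n; steps++) {
			int j, next = -1;

			for (j = 0; j < n; j++) {
				if (!strcmp(r[cur].dst, r[j].src)) {
					next = j;
					break;
				}
			}
			if (next < 0)
				break;		/* chain ends: no cycle via i */
			if (next == i)
				return 1;	/* walked back to the start */
			cur = next;
		}
	}
	return 0;
}
```

The a/b/c example above reduces to the pairs (a,c), (b,a), (c,b), which this reports as cyclic.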

Anyway. I don't think Miguel's patch needs to solve all of the lingering
rename cases. But I am curious whether it makes some rename cases worse,
because the depth-sorting was kicking in before and making them work.

-Peff


Re: [PATCH v3 5/5] archive-zip: support files bigger than 4GB

2017-04-24 Thread Junio C Hamano
René Scharfe  writes:

> diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh
> index 0ac94b5cc9..a6875dfdb1 100755
> --- a/t/t5004-archive-corner-cases.sh
> +++ b/t/t5004-archive-corner-cases.sh
> @@ -178,7 +178,7 @@ test_expect_success EXPENSIVE,UNZIP 'zip archive bigger than 4GB' '
>   "$GIT_UNZIP" -t many-big.zip
>  '
>  
> -test_expect_failure EXPENSIVE,UNZIP 'zip archive with files bigger than 4GB' '
> +test_expect_success EXPENSIVE,UNZIP,ZIPINFO 'zip archive with files bigger than 4GB' '

This is a bit curious, as 1/5 adds this test that expects a failure
with three prerequisites already.  I'll assume that this is a rebase
glitch and the preimage actually must have ,ZIPINFO there already.

>   # Pack created with:
>   #   dd if=/dev/zero of=file bs=1M count=4100 && git hash-object -w file
>   mkdir -p .git/objects/pack &&


[PATCH v2] rebase -i: add config to abbreviate command-names

2017-04-24 Thread Liam Beguin
Add the 'rebase.abbrevCmd' boolean config option to allow `git rebase -i`
to abbreviate the command-names in the instruction list.

This means that `git rebase -i` would print:
p deadbee The oneline of this commit
...

instead of:
pick deadbee The oneline of this commit
...

Using a single character command-name allows the lines to remain
aligned, making the whole set more readable.

Signed-off-by: Liam Beguin 
---
Changes since v1:
 - Improve Documentation and commit message

 Documentation/config.txt | 19 +++
 Documentation/git-rebase.txt | 19 +++
 git-rebase--interactive.sh   |  8 ++--
 3 files changed, 44 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 475e874d5155..8b1877f2df91 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2614,6 +2614,25 @@ rebase.instructionFormat::
	the instruction list during an interactive rebase.  The format will automatically
	have the long commit hash prepended to the format.
 
+rebase.abbrevCmd::
+   If set to true, `git rebase -i` will abbreviate the command-names in the
+   instruction list. This means that instead of looking like this,
+
+---
+   pick deadbee The oneline of this commit
+   pick fa1afe1 The oneline of the next commit
+   ...
+---
+
+   the list would use the short version of the command resulting in
+   something like this.
+
+---
+   p deadbee The oneline of this commit
+   p fa1afe1 The oneline of the next commit
+   ...
+---
+
 receive.advertiseAtomic::
By default, git-receive-pack will advertise the atomic push
capability to its clients. If you don't want to advertise this
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 67d48e688315..7d97c0483241 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -222,6 +222,25 @@ rebase.missingCommitsCheck::
 rebase.instructionFormat::
Custom commit list format to use during an `--interactive` rebase.
 
+rebase.abbrevCmd::
+   If set to true, `git rebase -i` will abbreviate the command-names in the
+   instruction list. This means that instead of looking like this,
+
+---
+   pick deadbee The oneline of this commit
+   pick fa1afe1 The oneline of the next commit
+   ...
+---
+
+   the list would use the short version of the command resulting in
+   something like this.
+
+---
+   p deadbee The oneline of this commit
+   p fa1afe1 The oneline of the next commit
+   ...
+---
+
 OPTIONS
 ---
--onto <newbase>::
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 2c9c0165b5ab..9f3e82b79615 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -1210,6 +1210,10 @@ else
revisions=$onto...$orig_head
shortrevisions=$shorthead
 fi
+
+rebasecmd=pick
+test "$(git config --bool --get rebase.abbrevCmd)" = true && rebasecmd=p
+
 format=$(git config --get rebase.instructionFormat)
# the 'rev-list .. | sed' requires %m to parse; the instruction requires %H to parse
 git rev-list $merges_option --format="%m%H ${format:-%s}" \
@@ -1228,7 +1232,7 @@ do
 
if test t != "$preserve_merges"
then
-   printf '%s\n' "${comment_out}pick $sha1 $rest" >>"$todo"
+   printf '%s\n' "${comment_out}${rebasecmd} $sha1 $rest" >>"$todo"
else
if test -z "$rebase_root"
then
@@ -1246,7 +1250,7 @@ do
if test f = "$preserve"
then
touch "$rewritten"/$sha1
-   printf '%s\n' "${comment_out}pick $sha1 $rest" >>"$todo"
+   printf '%s\n' "${comment_out}${rebasecmd} $sha1 $rest" >>"$todo"
fi
fi
 done
-- 
2.9.3
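If the patch were applied, usage would look like the following. Note that `rebase.abbrevCmd` is introduced by this patch and does not exist in stock Git, which simply stores and ignores unknown keys, so only the config round-trip is shown here:

```shell
# Hypothetical usage of the option proposed in this patch.
demo=$(mktemp -d) && cd "$demo" && git init -q .
git config rebase.abbrevCmd true
val=$(git config --bool rebase.abbrevCmd)
echo "$val"                  # prints "true"
# with the patch, a subsequent `git rebase -i` would emit "p <sha1> ..." lines
```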


Re: [PATCH v5 0/6] Kill manual ref parsing code in worktree.c

2017-04-24 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> v5 changes are mostly cosmetic:

Good changes.  Will re-queue.

Thanks.

>
> diff --git a/refs.c b/refs.c
> index 5d31fb6bcf..7972720256 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1530,11 +1530,7 @@ struct ref_store *get_main_ref_store(void)
>   if (main_ref_store)
>   return main_ref_store;
>  
> - main_ref_store = ref_store_init(get_git_dir(),
> - (REF_STORE_READ |
> -  REF_STORE_WRITE |
> -  REF_STORE_ODB |
> -  REF_STORE_MAIN));
> + main_ref_store = ref_store_init(get_git_dir(), REF_STORE_ALL_CAPS);
>   return main_ref_store;
>  }
>  
> @@ -1597,9 +1593,6 @@ struct ref_store *get_submodule_ref_store(const char 
> *submodule)
>  struct ref_store *get_worktree_ref_store(const struct worktree *wt)
>  {
>   struct ref_store *refs;
> - unsigned int refs_all_capabilities =
> - REF_STORE_READ | REF_STORE_WRITE |
> - REF_STORE_ODB | REF_STORE_MAIN;
>   const char *id;
>  
>   if (wt->is_current)
> @@ -1612,10 +1605,10 @@ struct ref_store *get_worktree_ref_store(const struct 
> worktree *wt)
>  
>   if (wt->id)
>   refs = ref_store_init(git_common_path("worktrees/%s", wt->id),
> -   refs_all_capabilities);
> +   REF_STORE_ALL_CAPS);
>   else
>   refs = ref_store_init(get_git_common_dir(),
> -   refs_all_capabilities);
> +   REF_STORE_ALL_CAPS);
>  
>   if (refs)
>   register_ref_store_map(&worktree_ref_stores, "worktree",
> diff --git a/refs/refs-internal.h b/refs/refs-internal.h
> index 690498698e..b26f7e41ce 100644
> --- a/refs/refs-internal.h
> +++ b/refs/refs-internal.h
> @@ -467,6 +467,10 @@ struct ref_store;
>  #define REF_STORE_WRITE  (1 << 1) /* can perform update operations */
>  #define REF_STORE_ODB(1 << 2) /* has access to object database */
>  #define REF_STORE_MAIN   (1 << 3)
> +#define REF_STORE_ALL_CAPS   (REF_STORE_READ | \
> +  REF_STORE_WRITE | \
> +  REF_STORE_ODB | \
> +  REF_STORE_MAIN)
>  
>  /*
>   * Initialize the ref_store for the specified gitdir. These functions
>
> Nguyễn Thái Ngọc Duy (6):
>   environment.c: fix potential segfault by get_git_common_dir()
>   refs.c: make submodule ref store hashmap generic
>   refs: add REFS_STORE_ALL_CAPS
>   refs: introduce get_worktree_ref_store()
>   worktree.c: kill parse_ref() in favor of refs_resolve_ref_unsafe()
>   refs: kill set_worktree_head_symref()
>
>  branch.c   |  15 ++---
>  environment.c  |   2 +
>  refs.c | 100 
>  refs.h |  12 +---
>  refs/files-backend.c   |  44 --
>  refs/refs-internal.h   |   4 ++
>  t/helper/test-ref-store.c  |  18 ++
>  t/t1407-worktree-ref-store.sh (new +x) |  52 +
>  worktree.c | 102 
> +
>  worktree.h |   2 +-
>  10 files changed, 177 insertions(+), 174 deletions(-)
>  create mode 100755 t/t1407-worktree-ref-store.sh


Re: [PATCH 1/2] fast-export: deletion action first

2017-04-24 Thread Junio C Hamano
Jeff King  writes:

> Perhaps we would want a test for the case you are fixing (to be sure it
> is not re-broken), as well as confirming that we have not re-broken the
> original case (it looks like 060df62 added a test, so we may be OK with
> that).

Good suggestion.

>
>> +/*
>> + * Compares two diff types to order based on output priorities.
>> + */
>> +static int diff_type_cmp(const void *a_, const void *b_)
>> [...]
>> +/*
>> + * Move Delete entries first so that an addition is always reported 
>> after
>> + */
>> +cmp = (b->status == DIFF_STATUS_DELETED) - (a->status == 
>> DIFF_STATUS_DELETED);
>>  if (cmp)
>>  return cmp;
>>  /*
>
> So we sort deletions first. And the bit that the context doesn't quite
> show here is that we then compare renames and push them to the end.
> Everything else will compare equal.

Wait--we also allow renames?  Rename is like delete in the context
of discussing d/f conflicts, in that it tells us that the source
path will be missing in the end result.  If you rename a file "d" to
"e", then there is a room for you to create a directory "d" to store
a file "d/f" in.  Shouldn't it participate also in this "delete
before add to avoid d/f conflict" logic?

> Is qsort() guaranteed to be stable? If not, then we'll get the majority
> of entries in a non-deterministic order. Should we fallback to strcmp()
> so that within a given class, the entries are sorted by name?

Again, very good point, especially with the existing comment in the
comparison function that explains why renames are shown last.



Re: [PATCH] tests: fix tests broken under GETTEXT_POISON=YesPlease

2017-04-24 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> My general approach when writing & maintaining this poison has been
> that it's fine if we skip some tests, even though we could be bending
> over backwards to run them, or even if we don't know the root cause
> beyond "the rebase machinery is always broken with poison".
>
> This is because once I'm satisfied that the breaking test isn't
> because of some new plumbing message that got i18n'd I don't see the
> point of keeping digging, it's fine to just skip the test, because we
> run it when we're not under poison, and we're satisfied that it's not
> breaking because of a new plumbing message being i18n'd we've
> fulfilled the entire reason for why this poison facility exists in the
> first place.

As to skipping tests, I am worried mostly because it is very easy to
mark one test as skipped under poison build, even where the side
effect from that test left behind in the trash repository is a
prerequisite for a later test to succeed.  For example, a test that
creates a tag may be marked as skipped-under-poison.  Then a new
test that is added to such a test may want to do something using
that tag, and it will succeed in the usual test.  As most people do
not test poison build, when somebody notices that the new test fails
under poison build, it is unclear if the breakage is due to new i18n
issues or something else, like a missing prerequisite tag due to
skipping an earlier test.
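The hazard can be simulated outside Git's test framework. This standalone sketch (invented names, not the real test-lib) shows a setup step skipped under "poison" silently breaking a later test that depends on its side effect:

```shell
# Minimal simulation: run_test skips when the prerequisite "!POISON" is
# not met, and a later test that relies on the skipped setup then fails.
POISON=YesPlease
results=

run_test () {
	# $1 = prerequisite ("!POISON" or ""), $2 = test script
	if test "$1" = "!POISON" && test -n "$POISON"
	then
		results="$results skip"
		return 0
	fi
	if eval "$2" >/dev/null 2>&1
	then
		results="$results ok"
	else
		results="$results fail"
	fi
}

tagfile=poison-demo-tag.$$
rm -f "$tagfile"
run_test "!POISON" "echo v1 >$tagfile"   # setup: skipped under poison
run_test ""        "test -f $tagfile"    # later test: fails without the setup
echo "$results"                          # prints " skip fail"
rm -f "$tagfile"
```

Without the poison switch the same sequence would print " ok ok", which is why the breakage only surfaces for whoever runs the poison build.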



Re: [PATCH v4 8/9] Use uintmax_t for timestamps

2017-04-24 Thread Junio C Hamano
Johannes Schindelin  writes:

>> Should we at least clamp in date_overflows() so that large values
>> representable with timestamp_t that will become unrepresentable when
>> we start allowing negative timestamps would be rejected?  That way
>> we won't have to hear complaints from the people who used timestamps
>> far in the future that we regressed the implementation for them by
>> halving the possible timestamp range.
>
> Please note that the date_overflows() command only tests when we are about
> to call system functions. I do not think that it does what you think it
> does (namely, validate timestamps when they enter Git).

OK, then please read my question without date_overflows(), that is,
"should we at least clamp the values with some new mechanism to
leave the door open for supporting times before the epoch, even if
(and especially if) we leave the use of signed integral type for
timestamps out of the scope of this series?"
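One possible shape of such a clamp, sketched with invented names (the series itself does not include this check): reject values that fit the unsigned type today but could not be represented once the type becomes signed.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch, not Git's code: timestamp_t is unsigned today but may become
 * signed later, so refuse values a signed type could not hold. */
typedef uintmax_t timestamp_t;

static int timestamp_too_large_for_signed(timestamp_t t)
{
	return t > (timestamp_t)INTMAX_MAX;
}
```

Rejecting these values now means nobody can come to depend on the upper half of the range before signed timestamps arrive.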



Re: [PATCH v4 7/9] Abort if the system time cannot handle one of our timestamps

2017-04-24 Thread Junio C Hamano
Johannes Schindelin  writes:

> The code would also be incorrect, as the `minutes` variable can be
> negative, which the `unsigned_add_overflows()` macro cannot handle (it
> would report very, very false positives, and it would hurt you more than
> me because I live East of Greenwich).

Yes and no.  If we were to care about integer wraparound to do this
check, we'd still have to worry about negative offset applied to a
time very close to the epoch that turns into a timestamp far far in
the future (unsigned_sub_underflows, that is).
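Checking both directions together might look like this sketch (hypothetical helper, not the patch's code): a positive offset must not wrap past the top of the range, and a negative one must not dip below the epoch.

```c
#include <assert.h>
#include <limits.h>

/* Sketch: apply a signed offset (in minutes) to an unsigned timestamp,
 * reporting wraparound in either direction instead of a bogus value. */
static int apply_tz_offset(unsigned long t, long minutes, unsigned long *out)
{
	long seconds = minutes * 60;

	if (seconds >= 0) {
		if (t + (unsigned long)seconds < t)	/* addition wrapped */
			return -1;
		*out = t + (unsigned long)seconds;
	} else {
		if ((unsigned long)-seconds > t)	/* would go below the epoch */
			return -1;
		*out = t - (unsigned long)-seconds;
	}
	return 0;
}
```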



Re: Submodule/contents conflict

2017-04-24 Thread Jeff King
On Mon, Apr 24, 2017 at 11:22:42PM -0400, Jeff King wrote:

> > > It also has a similarity to
> > > https://public-inbox.org/git/1492287435.14812.2.ca...@gmail.com/  
> > > regarding
> > > how checkout operates.
> 
> I didn't look too deeply into this one, but it really looks like
> somebody caring too much about when git needs to write (and I'd suspect
> it's impacted by the racy-git thing, too).

Oh, sorry, I got this confused with:

  
https://public-inbox.org/git/CAM1zWBtfgHT=pT0pidQo1HD=dfrxlg3gnauvs0vzkvyfg1b...@mail.gmail.com/

which was mentioned earlier (and which I do think is just about
timestamps).

I don't think any of what I said is related to the doc update in

  https://public-inbox.org/git/1492287435.14812.2.ca...@gmail.com/

which seems reasonable to me.

-Peff


Re: [PATCH 1/2] fast-export: deletion action first

2017-04-24 Thread Jeff King
On Tue, Apr 25, 2017 at 02:12:16AM +0200, Miguel Torroja wrote:

> The delete operations of the fast-export output should precede any addition
> belonging to the same commit. Addition and deletion with the same name
> entry could happen in the case of file to directory and vice versa.
> 
> The fast-export sorting was added in 060df62 (fast-export: Fix output
> order of D/F changes). That change was made in order to fix the case of
> directory to file in the same commit, but it broke the reverse case
> (File to directory).

That explanation makes sense.

>  builtin/fast-export.c | 25 +
>  1 file changed, 9 insertions(+), 16 deletions(-)

Perhaps we would want a test for the case you are fixing (to be sure it
is not re-broken), as well as confirming that we have not re-broken the
original case (it looks like 060df62 added a test, so we may be OK with
that).

> +/*
> + * Compares two diff types to order based on output priorities.
> + */
> +static int diff_type_cmp(const void *a_, const void *b_)
> [...]
> + /*
> +  * Move Delete entries first so that an addition is always reported 
> after
> +  */
> + cmp = (b->status == DIFF_STATUS_DELETED) - (a->status == 
> DIFF_STATUS_DELETED);
>   if (cmp)
>   return cmp;
>   /*

So we sort deletions first. And the bit that the context doesn't quite
show here is that we then compare renames and push them to the end.
Everything else will compare equal.

Is qsort() guaranteed to be stable? If not, then we'll get the majority
of entries in a non-deterministic order. Should we fallback to strcmp()
so that within a given class, the entries are sorted by name?
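The strcmp() fallback suggested here could look like the following sketch (illustrative struct and status letters, not Git's actual diff_filepair code): deletions first, renames last, and a name comparison to make the order deterministic even with an unstable qsort(3).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct entry {
	const char *path;
	char status;	/* 'D' = delete, 'R' = rename, anything else = other */
};

static int entry_cmp(const void *a_, const void *b_)
{
	const struct entry *a = a_, *b = b_;
	int cmp;

	/* deletions first, so an addition is always reported after */
	cmp = (b->status == 'D') - (a->status == 'D');
	if (cmp)
		return cmp;
	/* renames last */
	cmp = (a->status == 'R') - (b->status == 'R');
	if (cmp)
		return cmp;
	/* qsort(3) need not be stable: tie-break by name within a class */
	return strcmp(a->path, b->path);
}
```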

-Peff


Re: Submodule/contents conflict

2017-04-24 Thread Jeff King
On Mon, Apr 24, 2017 at 04:43:28PM -0700, Stefan Beller wrote:

> >> On the main list thare is a similar "issue" [1] regarding the expectation 
> >> for `git checkout`,
> >> and importantly (for me) these collected views regarding the "Git Data 
> >> Protection and
> >> Management Principles" is not within the Git documentation.
> >
> > Yes, that's an interesting point. What concerns me is that the commit
> > c5326bd62b7e168ba1339dacb7ee812d0fe98c7c which introduced this
> > into checkout isn't consistent with reset. Seems that nobody noticed this 
> > before.
> 
> It seems as if we'd want to see the code from
> c5326bd62b7e168ba1339dacb7ee812d0fe98c7c
> to be part of any worktree touching command, specifically reset?

Note that that commit is just about "git checkout <tree> -- <paths>".
The matching reset command, "git reset <tree> -- <paths>", does handle
this the same way. Or at least I claimed so here:

  http://public-inbox.org/git/20141107081324.ga19...@peff.net/

and that is what I modeled the code in c5326bd62b after.

But note that none of that should ever affect _what_ gets checked out at
a file or content level. It may only affect the timestamps on the
resulting files. And I think those timestamps are not something Git
makes any promises about.

So if Git can elide a write and keep a timestamp up to date, that is a
great optimization and we should do that. But from an outside viewer's
perspective, we guarantee nothing. We might choose to rewrite a
stat-dirty file (updating its timestamp) rather than read its contents
to see if it might have the same content that we're about to write. And
the file may look stat dirty only because of the racy-git file/index
timestamp problem, even if it wasn't actually modified.

> > It also has a similarity to
> > https://public-inbox.org/git/1492287435.14812.2.ca...@gmail.com/  regarding
> > how checkout operates.

I didn't look too deeply into this one, but it really looks like
somebody caring too much about when git needs to write (and I'd suspect
it's impacted by the racy-git thing, too).

-Peff


Re: [PATCH] rebase -i: add config to abbreviate command name

2017-04-24 Thread liam BEGUIN
Hi Johannes,

On Mon, 2017-04-24 at 12:26 +0200, Johannes Schindelin wrote:
> Hi Liam,
> 
> On Sun, 23 Apr 2017, Liam Beguin wrote:
> 
> > Add the 'rebase.abbrevCmd' boolean config option to allow
> > the user to abbreviate the default command name while editing
> > the 'git-rebase-todo' file.
> 
> This patch does not handle the `git rebase --edit-todo` subcommand.
> Intentional?

After a little more investigation, I'm not sure what should be added for
the `git rebase --edit-todo` subcommand. It seems like it uses the same 
text that was added the first time (with `git rebase -i`). 
Do you have a bit more information about what you meant? 
I don't use this subcommand very often, I'm most likely missing something.

> 
> Ciao,
> Johannes

Thanks, 
Liam 


Re: [PATCH v6 12/11] run-command: don't try to execute directories

2017-04-24 Thread Jonathan Nieder
(cc-ing the reporter)
Brandon Williams wrote:

> In some situations run-command will incorrectly try (and fail) to
> execute a directory instead of an executable.  For example:
>
> Lets suppose a user has PATH=~/bin (where 'bin' is a directory) and they
> happen to have another directory inside 'bin' named 'git-remote-blah'.
> Then git tries to execute the directory:
>
>   $ git ls-remote blah://blah
>   fatal: cannot exec 'git-remote-blah': Permission denied
>
> This is due to only checking 'access()' when locating an executable in
> PATH, which doesn't distinguish between files and directories.  Instead
> use 'stat()' and check that the path is to a regular file.  Now
> run-command won't try to execute the directory 'git-remote-blah':
>
>   $ git ls-remote blah://blah
>   fatal: Unable to find remote helper for 'blah'

Reported-by: Brian Hatfield 

This was observed by having a directory called "ssh" in $PATH before
the real ssh and trying to use the ssh protocol.  Thanks for catching it.

> Signed-off-by: Brandon Williams 
> ---
>  run-command.c  | 3 ++-
>  t/t0061-run-command.sh | 7 +++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/run-command.c b/run-command.c
> index a97d7bf9f..ece0bf342 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -127,6 +127,7 @@ static char *locate_in_PATH(const char *file)
>  
>   while (1) {
>   const char *end = strchrnul(p, ':');
> + struct stat st;
>  
>   strbuf_reset(&buf);
>  
> @@ -137,7 +138,7 @@ static char *locate_in_PATH(const char *file)
>   }
>   strbuf_addstr(&buf, file);
>  
> - if (!access(buf.buf, F_OK))
> + if (!stat(buf.buf, &st) && S_ISREG(st.st_mode))
>   return strbuf_detach(&buf, NULL);
>  
>   if (!*end)
> diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
> index 98c09dd98..30c4ad75f 100755
> --- a/t/t0061-run-command.sh
> +++ b/t/t0061-run-command.sh
> @@ -37,6 +37,13 @@ test_expect_success !MINGW 'run_command can run a script without a #! line' '
>   test_cmp empty err
>  '
>  
> +test_expect_success 'run_command should not try to execute a directory' '
> + test_when_finished "rm -rf bin/blah" &&
> + mkdir -p bin/blah &&
> + PATH=bin:$PATH test_must_fail test-run-command run-command blah 2>err &&
> + test_i18ngrep "No such file or directory" err
> +'
> +
>  test_expect_success POSIXPERM 'run_command reports EACCES' '
>   cat hello-script >hello.sh &&
>   chmod -x hello.sh &&


Re: [PATCH v6 12/11] run-command: don't try to execute directories

2017-04-24 Thread Jeff King
On Mon, Apr 24, 2017 at 05:17:24PM -0700, Jonathan Nieder wrote:

> > This is due to only checking 'access()' when locating an executable in
> > PATH, which doesn't distinguish between files and directories.  Instead
> > use 'stat()' and check that the path is to a regular file.  Now
> > run-command won't try to execute the directory 'git-remote-blah':
> >
> > $ git ls-remote blah://blah
> > fatal: Unable to find remote helper for 'blah'
> >
> > Signed-off-by: Brandon Williams 
> 
> For the interested, the context in which this was reported was trying
> to execute a directory named 'ssh'.  Thanks for a quick fix.
> 
> Technically this bug has existed since
> 
>   commit 38f865c27d1f2560afb48efd2b7b105c1278c4b5
>   Author: Jeff King 
>   Date:   Fri Mar 30 03:52:18 2012 -0400
> 
>  run-command: treat inaccessible directories as ENOENT
> 
> Until we switched from using execvp to execve, the symptom was very
> subtle: it only affected the error message when a program could not be
> found, instead of affecting functionality more substantially.

Yeah, I'm pretty sure I didn't think at all about access() matching
directories when doing that commit. Using stat() does seem like the
right solution.

-Peff


Re: [PATCH v6 12/11] run-command: don't try to execute directories

2017-04-24 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> Until we switched from using execvp to execve, the symptom was very
>> subtle: it only affected the error message when a program could not be
>> found, instead of affecting functionality more substantially.
>
> Hmph, what if you had bin/ssh/ directory and bin2/ssh executable and
> had bin:bin2 listed in this order in your $PATH?  Without this change
> you'll get an error and that's the end of it.  With this change,
> you'd be able to execute bin2/ssh executable, no?  So I am not sure
> if I agree with the "this is just an error message subtlety".

I think you misunderstood what I meant.  execvp() does not have this
bug.  In current master, run_command() (within function sane_execvp())
double-checks execvp()'s work when it sees EACCES to decide whether
to convert it into a more user-friendly ENOENT.  Because of this bug,
if you have a bin/ssh/ directory and no bin2/ssh executable, instead
of reporting this condition as a user-friendly ENOENT, it would leave
it as EACCES.

> What does execvp() do when bin/ssh/ directory, bin2/ssh
> non-executable regular file, and bin3/ssh executable file exist and
> you have bin:bin2:bin3 on your $PATH?  That is what locate_in_PATH()
> should emulate, I would think.

Good catch.

 $ mkdir -p $HOME/bin1/greet
 $ mkdir $HOME/bin2
 $ printf '%s\n' 'echo bin2' >$HOME/bin2/greet
 $ mkdir $HOME/bin3
 $ printf '%s\n' '#!/bin/sh' 'echo bin3' >$HOME/bin3/greet
 $ chmod +x $HOME/bin3/greet
 $ PATH=$HOME/bin1:$HOME/bin2:$HOME/bin3:$PATH perl -e 'exec("greet")'
 bin3

It needs to skip over non-executable files.

I think this means we'd want to reuse something like is_executable
from help.c.

[...]
>>> +   if (!stat(buf.buf, &st) && S_ISREG(st.st_mode))
>>> return strbuf_detach(&buf, NULL);
>>
>> Should this share code with help.c's is_executable()?
>>
>> I suppose not, since that would have trouble finding scripts without
>> the executable bit set.

I confused myself about the script special-case: they are supposed to
have the executable bit set, too --- the special-casing is just about
lacking #!/bin/sh at the top (and hence not being directly executable
with execve).
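Jonathan's point — a usable PATH candidate must be a regular file with the executable bit set — can be sketched as a small POSIX check. This is a simplified illustration modeled loosely on the idea behind help.c's is_executable(), not git's actual code, and it ignores the Windows special cases:

```c
#include <sys/stat.h>

/*
 * Sketch of an is_executable()-style check (POSIX side only; the
 * names here are illustrative).  A PATH candidate must be a regular
 * file with at least one execute bit set; this rejects directories
 * like bin/ssh/ as well as non-executable regular files such as the
 * bin2/greet example above.
 */
static int looks_executable(const char *path)
{
	struct stat st;

	if (stat(path, &st))
		return 0;	/* missing or unstat-able: keep searching */
	return S_ISREG(st.st_mode) && (st.st_mode & 0111);
}
```

locate_in_PATH() could skip candidates for which this returns 0, matching execvp()'s behavior in the bin1/bin2/bin3 demonstration above.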

>> I was momentarily nervous about what happens if this gets run on
>> Windows. This is just looking for a file's existence, not
>> executability, so it should be fine.
>
> When we are looking for "ssh" with locate_in_PATH(), shouldn't we
> look for "ssh.exe" on Windows, though?

Fortunately this is in a #if !defined(GIT_WINDOWS_NATIVE) block.  It's
probably worth adding a comment so people know not to rely on it
matching Windows path search behavior.

Thanks for looking it over,
Jonathan


Re: [PATCH v4 7/9] Abort if the system time cannot handle one of our timestamps

2017-04-24 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Junio,
>
> On Sun, 23 Apr 2017, Junio C Hamano wrote:
>
>> Johannes Schindelin  writes:
>> 
>> > diff --git a/date.c b/date.c
>> > index 92ab31aa441..75f6335cd09 100644
>> > --- a/date.c
>> > +++ b/date.c
>> > @@ -46,7 +46,10 @@ static time_t gm_time_t(timestamp_t time, int tz)
>> >minutes = tz < 0 ? -tz : tz;
>> >minutes = (minutes / 100)*60 + (minutes % 100);
>> >minutes = tz < 0 ? -minutes : minutes;
>> > -  return time + minutes * 60;
>> > +
>> > +  if (date_overflows(time + minutes * 60))
>> > +  die("Timestamp too large for this system: %"PRItime, time);
>> > +  return (time_t)time + minutes * 60;
>> >  }
>> 
>> All the other calls to date_overflows() take a variable that holds
>> timestamp_t and presumably they are checking for integer wraparound
>> when the values are computed, but this one is not.
>
> I was debating whether this extra check is necessary and had decided
> against it. Apparently I was wrong to do so.

Not so fast.  Apparently you had thought about it and I didn't think
about it as long as you did, hence I asked ...

>
>> Perhaps we want to make it a bit more careful here?  I wonder if
>> something like this is a good approach:

as a question, not a request to change things, and not a "do not
come back until you rewrite it this way" ;-).  Getting a question is
not a sign that you were wrong at all.

If you thought it was not needed, and if you can clearly explain to
future readers of "git log" why it is the case, then I am perfectly
fine without an extra check.

In a response to another message, you seem to indicate that existing
code does not do any range checking to make sure ulong (the current
type the existing code uses) does not wrap around and you kept that
"property" of the code in this series (i.e.  date_overflows() range
check was merely for downcasting timestamp_t to time_t before giving
the result to the system functions).  If that is the case, then NOT
checking for integer wraparound in "+ minutes*60" computation IS the
consistent thing to do within your series, I would think.
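For the curious, a wraparound check for the "+ minutes * 60" computation, had one been wanted, could look like the following. This is a standalone sketch; the `timestamp_t` typedef here is a stand-in, not git's actual definition:

```c
#include <stdint.h>

typedef uint64_t timestamp_t;	/* stand-in for git's unsigned timestamp_t */

/*
 * Would "time + offset" wrap around the unsigned range?  The offset
 * is the timezone adjustment in seconds (minutes * 60), which may be
 * negative for timezones west of UTC.
 */
static int tz_adjust_overflows(timestamp_t time, long offset)
{
	if (offset < 0)
		return time < (timestamp_t)-offset;	/* would wrap below zero */
	return time > UINT64_MAX - (timestamp_t)offset;	/* would wrap above */
}
```

Such a check would catch both a huge timestamp pushed past the type's maximum by a positive offset and a tiny timestamp pulled below zero by a negative one.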


Re: [PATCH] test-lib: abort when can't remove trash directory

2017-04-24 Thread Junio C Hamano
Torsten Bögershausen  writes:

> []
>>>
>>> -   cd "$(dirname "$remove_trash")" &&
>>> -   rm -rf "$(basename "$remove_trash")" ||
>>> -   error "Tests passed but test cleanup failed; aborting"
>>> +   cd "$(dirname "$TRASH_DIRECTORY")" &&
>>> +   rm -fr "$TRASH_DIRECTORY" ||
>>> +   error "Tests passed but test cleanup failed; aborting"
>>> +   fi
>>
>> Yeah, that looks good to me.
>
> Does it ?
> Checking the error code of "rm -f" doesn't work as expected from the script:
> rm -rf DoesNotExist ; echo $?
> 0
>
> I think it should be
>
>>> +   cd "$(dirname "$TRASH_DIRECTORY")" &&
>>> +   rm -r "$TRASH_DIRECTORY" ||
>>> +   error "Tests passed but test cleanup failed; aborting"

What Peff told you in his response is all correct, but besides, you
can see the patch and realize that the original has been using "rm
-rf" for ages.

This change is about not using the $remove_trash variable, as it felt
more brittle and error prone than detecting the $debug case and removing
$TRASH_DIRECTORY.  So in that sense, I shouldn't have even rolled
the removal of basename into it.

Having said that, because people care about the number of subprocess
invocations, I am tempted to suggest doing

cd "$TRASH_DIRECTORY/.." &&
rm -fr "$TRASH_DIRECTORY" ||
error ...

which should reduce another process ;-)


Re: [PATCH v6 1/8] pkt-line: add packet_read_line_gently()

2017-04-24 Thread Junio C Hamano
Ben Peart  writes:

> On 4/24/2017 12:21 AM, Junio C Hamano wrote:
>> Ben Peart  writes:
>>
>>> Add packet_read_line_gently() to enable reading a line without dying on
>>> EOF.
>>>
>>> Signed-off-by: Ben Peart 
>>> ---
>>>   pkt-line.c | 14 +-
>>>   pkt-line.h | 10 ++
>>>   2 files changed, 23 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/pkt-line.c b/pkt-line.c
>>> index d4b6bfe076..bfdb177b34 100644
>>> --- a/pkt-line.c
>>> +++ b/pkt-line.c
>>> @@ -315,7 +315,7 @@ static char *packet_read_line_generic(int fd,
>>>   PACKET_READ_CHOMP_NEWLINE);
>>> if (dst_len)
>>> *dst_len = len;
>>> -   return len ? packet_buffer : NULL;
>>> +   return (len > 0) ? packet_buffer : NULL;
>>>   }
>> The log does not seem to explain what this change is about.
>
> This change was made as the result of a request in feedback from the
> previous version of the patch series which pointed out that the call
> to packet_read can return -1 in some error cases and to keep the code
> more consistent with packet_read_line_gently below.
>
> If len < 0 then the old code would have incorrectly returned a pointer
> to a buffer with garbage while the new code correctly returns NULL
> (fixes potential bug)
> if len == 0 then the code will return NULL before and after this
> change (no change in behavior)
> if len > 0 then the code will return packet_buffer before and after
> this change (no change in behavior)

TL;DR "Yes this is a preliminary fix and I'll split out into a
separate patch early in the series in the next reroll"?  Thanks.

>> Is this supposed to be a preliminary bugfix where this helper used
>> to return a non-NULL buffer when underlying packet_read() signaled
>> an error by returning a negative len?  If so, this should probably
>> be a separate patch early in the series.
>>
>> Should len==0 be considered an error?  Especially given that
>> PACKET_READ_CHOMP_NEWLINE is in use, I would expect that len==0
>> should be treated similarly to positive length, i.e. the otherside
>> gave us an empty line.
>>
>>> +{
>>> +   int len = packet_read(fd, NULL, NULL,
>>> + packet_buffer, sizeof(packet_buffer),
>>> + PACKET_READ_CHOMP_NEWLINE|PACKET_READ_GENTLE_ON_EOF);
>>> +   if (dst_len)
>>> +   *dst_len = len;
>>> +   if (dst_line)
>>> +   *dst_line = (len > 0) ? packet_buffer : NULL;
>> I have the same doubt as above for len == 0 case.
>
> packet_read() returns -1 when PACKET_READ_GENTLE_ON_EOF is passed and
> it hits truncated output from the remote process. 

I know, but that is irrelevant to my question, which is about
CHOMP_NEWLINE.  I didn't even ask "why a negative len treated
specially?"  My question is about the case where len == 0.  Your
patch treats len==0 just like len==-1, i.e. an error, but I do not
know if that is correct, hence my question.  We both know len < 0
is an error and you do not need to waste time elaborating on it.
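The distinction being questioned can be spelled out as a small classifier. The names below are invented for this sketch and are not git's API; with CHOMP_NEWLINE in effect, a negative length is an error (EOF/truncation under GENTLE_ON_EOF), zero could mean an empty line from the other side, and a positive length is a regular payload:

```c
/*
 * Illustrative classifier for packet_read()'s return value; the
 * enum names are made up for this sketch.
 */
enum pkt_result {
	PKT_ERROR,	/* len < 0: EOF/truncation under GENTLE_ON_EOF */
	PKT_EMPTY_LINE,	/* len == 0: the other side sent an empty line */
	PKT_LINE	/* len > 0: a regular payload line */
};

static enum pkt_result classify_packet_len(int len)
{
	if (len < 0)
		return PKT_ERROR;
	return len ? PKT_LINE : PKT_EMPTY_LINE;
}
```

Treating PKT_EMPTY_LINE identically to PKT_ERROR, as the patch under review does, is exactly the choice being questioned here.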




Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-24 Thread Christian Couder
On Mon, Apr 24, 2017 at 6:34 PM, Philip Oakley  wrote:
>
> Sorry if I'm bikeshedding here, but it does look like adding an alternate
> 'bisect' strategy may be useful for this style of integration testing.
>
> To me, given the multiplicity of independent branches being brought
> together, it should be possible to do a check on each of the branches
> separately, before the tests along the line of integration . The tests would
> not be a true 'bisection' as they are separate single point tests, but they
> do establish good commits at the tips of those branches.
>
> Thus, for each of the merges in the --first-parent traversal, the option
> could test (in the OS of choice) the *second parent* commit of the merge.
> This sets the known good points. The breakages during the integration then
> become easier to bisect, as it is only looking for the integration of a good
> series into mainline that breaks. [1]

I think what you suggest is very similar in practice as the
--first-parent strategy that we have been suggesting for some years to
add to bisect as a GSoC project:

https://git.github.io/SoC-2017-Ideas/

> In addition, an aside that dscho made earlier about the merge-base of some
> branches relative to Windows may have been missed. The normal bisect process
> assumes that we start from a set of good merge bases.

When a good commit is not an ancestor of the bad commit, we test the
merge bases first and we abort if one of them is not good.

> However, dscho noticed
> that occasionally some series may choose an older point on maint (etc.) that
> itself would _not_ be classed as good when tested on Windows (or on other
> OS's). Those older mergebases can make the classic bisect across all the
> commits in the DAG between here and there a tortuous process, especially if
> the local OS implementation percieves the base points as bad! (which breaks
> expectations).

My understanding of the bisecting problem that Dscho is reporting is
that the process is sometimes buggy or tortuous because bisect is not
smart enough to optimize the testing of the merge bases when there are
a lot of them.

It is not because bisect expects or assumes that some merge bases are
good or bad when in fact they are not.


Re: Submodule/contents conflict

2017-04-24 Thread Junio C Hamano
Stefan Beller  writes:

> A similar bug report
> https://public-inbox.org/git/CAG0BQX=wvpkJ=pqwv-nbmhupv8yzvd_kykzjmsfwq9xstz2...@mail.gmail.com/
>
> "checkout --recurse-submodules" (as mentioned in that report)
> made it into Git by now, but this bug goes unfixed, still.

I do not mind reverting that merge out of 'master'.  Please holler
if you feel these "go recursive" topics are premature.

Thanks.


Re: [PATCH v7] read-cache: force_verify_index_checksum

2017-04-24 Thread Junio C Hamano
Jeff Hostetler  writes:

>>> +test_expect_success 'detect corrupt index file in fsck' '
>>> +cp .git/index .git/index.backup &&
>>> +test_when_finished "mv .git/index.backup .git/index" &&
>>> +echo zzzzzzzz >zzzzzzzz &&
>>> +git add zzzzzzzz &&
>>> +sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&
>>
>> sed on a binary file? Sooner or later we are going to run into
>> portability issues.
>
> In v5 of this patch series I used "perl" and it was suggested that I
> use "sed" instead.
> It doesn't matter to me which we use.  My testing showed that it was
> safe, but that was only Linux.
>
> Does the mailing list have a preference for this ?

Instead of munging pathnames z* to y*, I'd prefer to see the actual
checksum bytes at the end replaced in the index file.  After all
that is what this test really cares about, and it ensures that the
failure detected is due to checksum mismatch.
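One way to do that corruption precisely, sketched here as standalone C with minimal error handling and assuming the index ends in a 20-byte SHA-1 trailer, is to invert a byte inside the trailing checksum and nothing else:

```c
#include <stdio.h>

/*
 * Invert the first byte of the trailing 20-byte checksum of the file
 * at "path", leaving the payload untouched.  Assumes a SHA-1 trailer;
 * returns 0 on success, -1 on error.
 */
static int corrupt_trailing_checksum(const char *path)
{
	FILE *f = fopen(path, "r+b");
	int c;

	if (!f)
		return -1;
	if (fseek(f, -20L, SEEK_END) != 0 || (c = fgetc(f)) == EOF) {
		fclose(f);
		return -1;
	}
	fseek(f, -1L, SEEK_CUR);	/* reposition before switching from reading to writing */
	fputc(c ^ 0xff, f);
	return fclose(f) ? -1 : 0;
}
```

A test built on something like this would guarantee that a failing "git fsck --cache" can only be due to checksum mismatch, not to payload munging or a stray trailing newline from sed.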

>>> +mv .git/index.yyy .git/index &&
>>> +# Confirm that fsck detects invalid checksum
>>> +test_must_fail git fsck --cache &&
>>
>> You should ensure that this failure is really because of an invalid
>> checksum. The failure could also be due to an extra LF at the end
>> that sed inserted, no?
>
> I suppose we could, but I'm tempted to wait on that for now.
>
> Jeff


Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-24 Thread Christian Couder
On Sat, Apr 22, 2017 at 3:37 PM, Johannes Sixt  wrote:
> Am 21.04.2017 um 14:29 schrieb Christian Couder:
>>
>> First bisect should ask you to test merge bases only if there are
>> "good" commits that are not ancestors of the "bad" commit.
>
>
> That's a tangent, but I have never understood why this needs to be so.
> Consider this:
>
>    o--o--o--o--o--o--o--o--B
>   /           /
> -o--o--o--o--g--o--o--o--o--G
>
> When I mark B as bad and G as good, why would g have to be tested first?

It is because g could be bad if the bug has been fixed between g and G.
If this happens and we don't test g, we would give a wrong result.

> This is exactly what I do when I bisect in Git history: I mark the latest
> commits on git-gui and gitk sub-histories as good, because I know they can't
> possibly be bad. (In my setup, these two histories are ahead of pu and
> next.)

Yeah, it is safe to do that in this case as we test the merge bases.


Re: [PATCH v6 12/11] run-command: don't try to execute directories

2017-04-24 Thread Junio C Hamano
Jonathan Nieder  writes:

> Until we switched from using execvp to execve, the symptom was very
> subtle: it only affected the error message when a program could not be
> found, instead of affecting functionality more substantially.

Hmph, what if you had bin/ssh/ directory and bin2/ssh executable and
had bin:bin2 listed in this order in your $PATH?  Without this change
you'll get an error and that's the end of it.  With this change,
you'd be able to execute bin2/ssh executable, no?  So I am not sure
if I agree with the "this is just an error message subtlety".

What does execvp() do when bin/ssh/ directory, bin2/ssh
non-executable regular file, and bin3/ssh executable file exist and
you have bin:bin2:bin3 on your $PATH?  That is what locate_in_PATH()
should emulate, I would think.

>> +if (!stat(buf.buf, ) && S_ISREG(st.st_mode))
>>  return strbuf_detach(, NULL);
>
> Should this share code with help.c's is_executable()?
>
> I suppose not, since that would have trouble finding scripts without
> the executable bit set.
>
> I was momentarily nervous about what happens if this gets run on
> Windows. This is just looking for a file's existence, not
> executability, so it should be fine.

When we are looking for "ssh" with locate_in_PATH(), shouldn't we
look for "ssh.exe" on Windows, though?


Re: [PATCH v6 12/11] run-command: don't try to execute directories

2017-04-24 Thread Junio C Hamano
Brandon Williams  writes:

> This is due to only checking 'access()' when locating an executable in
> PATH, which doesn't distinguish between files and directories.  Instead
> use 'stat()' and check that the path is to a regular file.  Now
> run-command won't try to execute the directory 'git-remote-blah':
>
>   $ git ls-remote blah://blah
>   fatal: Unable to find remote helper for 'blah'

The above is not a very interesting example.  

More important is that $PATH may have a directory with
git-remote-blah directory (your setup above) and then another
directory with the git-remote-blah executable that the user wanted
to use.  Without this change, we won't get to the real one, and that
makes this change truly valuable.

The added test demonstrates the "uninteresting" behaviour.  Even
though it is correct and technically sufficient, it would make it
more relevant to do something like this:

mkdir -p bin/blah bin2 &&
write_script bin2/blah <<-\EOF &&
echo We found blah in bin2
EOF
PATH=bin:$PATH test_must_fail ... what you have
...
PATH=bin:bin2:$PATH test-run-command run-command blah >actual &&
bin2/blah >expect &&
test_cmp expect actual

as the point of locate_in_PATH() is to successfully find one,
without getting confused by an earlier unusable one.

Thanks.

> Signed-off-by: Brandon Williams 
> ---
>  run-command.c  | 3 ++-
>  t/t0061-run-command.sh | 7 +++
>  2 files changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/run-command.c b/run-command.c
> index a97d7bf9f..ece0bf342 100644
> --- a/run-command.c
> +++ b/run-command.c
> @@ -127,6 +127,7 @@ static char *locate_in_PATH(const char *file)
>  
>   while (1) {
>   const char *end = strchrnul(p, ':');
> + struct stat st;
>  
>   strbuf_reset(&buf);
>  
> @@ -137,7 +138,7 @@ static char *locate_in_PATH(const char *file)
>   }
>   strbuf_addstr(&buf, file);
>  
> - if (!access(buf.buf, F_OK))
> + if (!stat(buf.buf, &st) && S_ISREG(st.st_mode))
>   return strbuf_detach(&buf, NULL);
>  
>   if (!*end)
> diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
> index 98c09dd98..30c4ad75f 100755
> --- a/t/t0061-run-command.sh
> +++ b/t/t0061-run-command.sh
> @@ -37,6 +37,13 @@ test_expect_success !MINGW 'run_command can run a script without a #! line' '
>   test_cmp empty err
>  '
>  
> +test_expect_success 'run_command should not try to execute a directory' '
> + test_when_finished "rm -rf bin/blah" &&
> + mkdir -p bin/blah &&
> + PATH=bin:$PATH test_must_fail test-run-command run-command blah 2>err &&
> + test_i18ngrep "No such file or directory" err
> +'
> +
>  test_expect_success POSIXPERM 'run_command reports EACCES' '
>   cat hello-script >hello.sh &&
>   chmod -x hello.sh &&


Re: [PATCH] Documentation/git-checkout: make doc. of checkout clearer

2017-04-24 Thread Junio C Hamano
Christoph Michelbach  writes:

> I'm sorry, somehow my email client deleted half of your message when I saved
> it when replying. I only noticed this when wanting to delete your email and
> noticed that it was pretty long.

Please separate your discussion from the patch proper.  Check
Documentation/SubmittingPatches and send a proper patch like other
people do.


> From fe0d1298cf4de841af994f4d9f72d49e25edea00 Mon Sep 17 00:00:00 2001
> From: Christoph Michelbach 
> Date: Sat, 22 Apr 2017 18:49:57 +0200
> Subject: [PATCH] Doc./git-checkout: correct doc. of checkout ...

These we take from the e-mail header.  You usually remove them from
the body of the message (and move the Subject: to e-mail subject), hence

> The previous documentation states that the named paths are

this line will become the first line in the body of the message.

> A hint alerting the users that changes introduced by this
> command when naming a tree-ish are automatically staged has
> been introduced.

We are still saying automatically here?

> Signed-off-by: Christoph Michelbach 
> ---
>  Documentation/git-checkout.txt | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)

This is not limited to your message, but from time to time, I see
messages with SP substituted with non-breaking space like the above
two lines (and they appear in the patch text).  I can still read and
comment on the patch, but it is unusable as an input to "git am" to
be applied.  I wonder where these NBSP come from---perhaps some
common MUAs corrupt text messages like this?

>
> diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
> index 8e2c066..ea3b4df 100644
> --- a/Documentation/git-checkout.txt
> +++ b/Documentation/git-checkout.txt
> @@ -81,13 +81,14 @@ Omitting <branch> detaches HEAD at the tip of the current branch.
>  'git checkout' [-p|--patch] [<tree-ish>] [--] <pathspec>...::
>  
>   When <paths> or `--patch` are given, 'git checkout' does *not*
> - switch branches.  It updates the named paths in the working tree
> - from the index file or from a named <tree-ish> (most often a
> - commit).  In this case, the `-b` and `--track` options are
> - meaningless and giving either of them results in an error.  The
> - <tree-ish> argument can be used to specify a specific tree-ish
> - (i.e.  commit, tag or tree) to update the index for the given
> - paths before updating the working tree.
> + switch branches.  It copies the files matching the pathspecs in

I am debating myself if rephrasing "in <tree-ish>" to "from
<tree-ish>" makes the result clearer to understand.  When we say
"Copy files from T to I and W", it would be obvious that what does
not exist in T will not be copied.

I also wonder "If no-tree-ish is provided" at the end of this
paragraph you have is a bit too far from the above primary point of
the description, because you have an unrelated "In this case,
-b/-t...", in between.  

The blobs matching the pathspecs are checked out from the
index to the working tree.  Optionally, when <tree-ish> is
given, the blobs matching the pathspecs from the <tree-ish>
is copied to the index before that happens.

is what I would want to say, but obviously that does not describe
what happens in the chronological order, so it is the most clear
description for people who understand what is written, but it will
take two reading until the reader gets to that stage X-<.

Perhaps the unrelated "In this case, the -b..." should go first; it
is part of "does *not* switch branches".  Also even with your patch,
it starts with "X is not Y" and does not clearly say "X is Z".

When <pathspec> or `--patch` are given, 'git checkout' does
*not* switch branches (giving the `-b` and `--track` options
will cause an error, as they are meaningless).  It checks
out paths out of the <tree-ish> (if given) and the index to
the working tree.  When an optional <tree-ish> is given,
blobs in the <tree-ish> that match <pathspec> are copied to
the index.  The blobs that match <pathspec> are then copied
from the index to the working tree, overwriting what is in
(or missing from) the working tree.

May be an improvement (i.e. say what Z is: checking out paths from
tree-ish and/or index to the working tree).  By explicitly phrasing
that <tree-ish>, from which the index is updated, is optional, it is
clear that without <tree-ish> there is no update to the index.
"missing from" covers two cases: (1) the user removed the file from
the working tree and <tree-ish>, e.g. HEAD, has the file, hence
removed one is resurrected; (2) the user didn't touch the file and
HEAD didn't have it, but by checking out from <tree-ish> that has
the file, the user added that new file to the set of files the user
is working with.

Hmm?



[PATCH] submodule_init: die cleanly on submodules without url defined

2017-04-24 Thread Jeff King
When we init a submodule, we try to die when it has no URL
defined:

  url = xstrdup(sub->url);
  if (!url)
  die(...);

But that's clearly nonsense. xstrdup() will never return
NULL, and if sub->url is NULL, we'll segfault.

These two bits of code need to be flipped, so we check
sub->url before looking at it.

Signed-off-by: Jeff King 
---
The bug is from the original submodule--helper, so it first appeared in
v2.9.x.

 builtin/submodule--helper.c | 6 +++---
 t/t7400-submodule-basic.sh  | 8 
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 36e423182..566a5b6a6 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -376,12 +376,12 @@ static void init_submodule(const char *path, const char *prefix, int quiet)
strbuf_reset(&sb);
strbuf_addf(&sb, "submodule.%s.url", sub->name);
if (git_config_get_string(sb.buf, &url)) {
-   url = xstrdup(sub->url);
-
-   if (!url)
+   if (!sub->url)
die(_("No url found for submodule path '%s' in .gitmodules"),
displaypath);
 
+   url = xstrdup(sub->url);
+
/* Possibly a url relative to parent */
if (starts_with_dot_dot_slash(url) ||
starts_with_dot_slash(url)) {
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index c2706fe47..1b8f1dbd3 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -38,6 +38,14 @@ test_expect_success 'submodule update aborts on missing .gitmodules file' '
test_i18ngrep "Submodule path .sub. not initialized" actual
 '
 
+test_expect_success 'submodule update aborts on missing gitmodules url' '
+   test_when_finished "git update-index --remove sub" &&
+   git update-index --add --cacheinfo 160000,$(git rev-parse HEAD),sub &&
+   test_when_finished "rm -f .gitmodules" &&
+   git config -f .gitmodules submodule.s.path sub &&
+   test_must_fail git submodule init
+'
+
 test_expect_success 'configuration parsing' '
test_when_finished "rm -f .gitmodules" &&
cat >.gitmodules <<-\EOF &&
-- 
2.13.0.rc0.459.g06dc2b676


Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-24 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> Is getting the results of these builds time-critical? If not perhaps
> an acceptable solution would be to use a source repo that's
> time-delayed, e.g. 24hrs behind on average from Junio's git.git, and
> where commits are pushed in at some configurable trickle.

Because the tip of 'pu' is updated every day, but the tips of
individual topics are not, and the tips of the more stable
integration branches are less often updated, I would think that an
automated procedure (whether it is Travis or Dscho's Windows thing)
that is most efficient for us would:

 - treat updates to 'maint', 'master' and 'next' as they do right
   now;

 - use update to 'pu' as a mere "trigger" and not test 'pu' as an
   aggregate as it is not very interesting (merges near the tip may
   be merging known-broken topics, like yesterday's pushout).

And arrange 'pu' update to trigger a procedure like this:

 - Grab the list of merge commits beyond what matches 'next':

   $ git rev-list --first-parent '^pu^{/^### match next}' pu

 - For each of these merges:

   - If its second parent (i.e. the tip of a topic) was already
 tested with the current 'master', do not bother testing it
 again;

   - Otherwise, call its second parent "it" in the below:

 - Create a merge of it (and nothing else) on top of 'master',
   and test it.  Each topic is intended to be able to graduate
   without other topics in 'pu', so this will catch potential
   breakage early and help blocking a broken topic from getting
   merged to 'master'.

 - When the above fails, test it in isolation to see if it
   itself is broken.  If this succeeds, we may have discovered
   unintended interactions between the topic and other topics
   already in 'master'.  Or the topic itself may already been
   broken.

 - In either case, record the test result so that the same tip
   of the topic would not have to be tested again, merely
   because the tip of 'pu' got updated next day due to updates
   in other topics.

We can s/master/next/ in the above description (I am undecided on
pros-and-cons between using 'master' and 'next').


Re: I suggest a new feature: One copy from files

2017-04-24 Thread Jacob Keller
On Mon, Apr 24, 2017 at 4:47 PM, Rm Beer  wrote:
> 2017-04-24 3:13 GMT-03:00 Jacob Keller :
>> So, clearly you haven't defined the request very well. It *sounds*
>> like what you want is the ability to say "git please store and
>> copy/send this file to other people, but only store it once, and don't
>> allow storing history of it". This pretty much defeats the entire
>> point of revision control and doesn't make sense to me as part of a
>> revision control system.
>
> Does it make sense for a revision control system to save history for
> binary data files? Who needs to save multiple copies of a file because
> a few bytes changed? When I change a pixel of an image, a word of an
> odt/doc, or a sound, music, or driver file, etc., you only need one
> copy in the .git repository; you don't need a hundred copies, one for
> each day of changes. Do you need an old image with a wrong pixel? It
> makes no sense...
>

Please don't drop the list :)

If you're not interested in tracking the revision history of a file,
it doesn't need to be stored inside a revision control system.
Instead, you probably want an alternative method for sharing such a
file. Git is primarily about tracking changes over time to a
collection of files in a project. It's also somewhat about sharing
this history to other people, but I wouldn't say that is its primary
goal.

In either case, you could instead use an alternative mechanism for
sharing the large binary file and have people grab the file this way.
However, it is incredibly valuable to share the history of a file so
that the other users can see what changed or make sure that the
version they are using works with the version of the other sources
they have.

In the case of binary files, you might want to use alternative diff
drivers to compare changes more easily, or instead provide some
non-binary data that is used to generate the binary files (such as
source code).
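
A minimal sketch of such a driver setup, for concreteness. The `bindump` driver name, the `*.bin` pattern, and the NUL-to-newline converter are made up for illustration; git's `diff.<driver>.textconv` mechanism itself is real:

```shell
#!/bin/sh
# Sketch: a textconv diff driver so "git diff" compares binary files
# through a text conversion instead of saying "Binary files differ".
# The NUL-to-newline converter below stands in for a real
# format-aware tool (exiftool for images, pdftotext, odt2txt, ...).
git init -q demo
cd demo
git config user.email author@example.com   # throwaway identity for the demo
git config user.name  'Demo Author'

# 1. Map a file pattern to a named diff driver.
echo '*.bin diff=bindump' >.gitattributes

# 2. Give the driver a converter; git invokes it with the file path.
cat >conv.sh <<'EOF'
#!/bin/sh
tr '\0' '\n' <"$1"
EOF
chmod +x conv.sh
git config diff.bindump.textconv ./conv.sh

printf 'hello\0world\0' >data.bin
git add .gitattributes data.bin
git commit -q -m 'add binary file'

printf 'hello\0there\0' >data.bin
git diff -- data.bin    # shows "-world" / "+there", not "Binary files differ"
```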

The whole point of revision history is to show that "hey this used to
have a wrong pixel, and now we fixed it, and here's how we did it".
Maybe the "wrong" pixel was actually correct and you find out later
that you need to revert this change. But that "later" may be many
months away, and without the exact change recorded you are left
reconstructing it from memory or external data instead of from the
actual history of the file.

Thanks,
Jake


Re: [PATCH v6 12/11] run-command: don't try to execute directories

2017-04-24 Thread Jonathan Nieder
Brandon Williams wrote:

> In some situations run-command will incorrectly try (and fail) to
> execute a directory instead of an executable.  For example:
>
> Let's suppose a user has PATH=~/bin (where 'bin' is a directory) and they
> happen to have another directory inside 'bin' named 'git-remote-blah'.
> Then git tries to execute the directory:
>
>   $ git ls-remote blah://blah
>   fatal: cannot exec 'git-remote-blah': Permission denied
>
> This is due to only checking 'access()' when locating an executable in
> PATH, which doesn't distinguish between files and directories.  Instead
> use 'stat()' and check that the path is to a regular file.  Now
> run-command won't try to execute the directory 'git-remote-blah':
>
>   $ git ls-remote blah://blah
>   fatal: Unable to find remote helper for 'blah'
>
> Signed-off-by: Brandon Williams 

For the interested, the context in which this was reported was trying
to execute a directory named 'ssh'.  Thanks for a quick fix.

Technically this bug has existed since

commit 38f865c27d1f2560afb48efd2b7b105c1278c4b5
Author: Jeff King 
Date:   Fri Mar 30 03:52:18 2012 -0400

   run-command: treat inaccessible directories as ENOENT

Until we switched from using execvp to execve, the symptom was very
subtle: it only affected the error message when a program could not be
found, instead of affecting functionality more substantially.

[...]
> --- a/run-command.c
> +++ b/run-command.c
> @@ -127,6 +127,7 @@ static char *locate_in_PATH(const char *file)
>  
>   while (1) {
>   const char *end = strchrnul(p, ':');
> + struct stat st;
>  
>   strbuf_reset();
>  
> @@ -137,7 +138,7 @@ static char *locate_in_PATH(const char *file)
>   }
>   strbuf_addstr(&buf, file);
>  
> - if (!access(buf.buf, F_OK))
> + if (!stat(buf.buf, ) && S_ISREG(st.st_mode))
>   return strbuf_detach(, NULL);

Should this share code with help.c's is_executable()?

I suppose not, since that would have trouble finding scripts without
the executable bit set.

I was momentarily nervous about what happens if this gets run on
Windows. This is just looking for a file's existence, not
executability, so it should be fine.

> --- a/t/t0061-run-command.sh
> +++ b/t/t0061-run-command.sh
> @@ -37,6 +37,13 @@ test_expect_success !MINGW 'run_command can run a script 
> without a #! line' '
>   test_cmp empty err
>  '
>  
> +test_expect_success 'run_command should not try to execute a directory' '
> + test_when_finished "rm -rf bin/blah" &&
> + mkdir -p bin/blah &&
> + PATH=bin:$PATH test_must_fail test-run-command run-command blah 2>err &&

Two nits:

- this environment variable setting leaks past the test_must_fail
  invocation in some shells.  When running external commands, they
  update the environment after forking, but when running shell
  functions, they update the environment first and never set it back.

  A search with "git grep -e '=.* test_must_fail'" finds no other
  instances of this pattern, so apparently we've done a good job of
  being careful about that. *surprised*  t/check-non-portable-shell.pl
  doesn't check for this.  Perhaps it should.

  Standard workarounds:

(
PATH=... &&
export PATH &&
test_must_fail ...
)

  or

test_must_fail env PATH=... ...

- using a relative path (other than '.') in $PATH feels unusual.  We
  can mimic a typical user setup more closely by using "$PWD/bin". 
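
The leak described in the first nit can be demonstrated outside the test suite with a plain POSIX shell sketch (variable names are arbitrary):

```shell
#!/bin/sh
# A one-shot assignment before an *external* command never persists,
# but before a shell *function* POSIX leaves the behavior unspecified,
# and some shells keep the variable set afterwards.
unset FOO
FOO=external /bin/sh -c ':'
echo "after external command: FOO=${FOO:-unset}"

func() { :; }
unset FOO
FOO=function func
echo "after shell function:  FOO=${FOO:-unset}"   # shell-dependent!

# Workaround 1: confine the assignment and export to a subshell.
unset FOO
(
	FOO=sub && export FOO && func
)
echo "after subshell:        FOO=${FOO:-unset}"

# Workaround 2: env(1) builds the environment outside the shell.
unset FOO
env FOO=env /bin/sh -c ':'
echo "after env(1):          FOO=${FOO:-unset}"
```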

> + test_i18ngrep "No such file or directory" err

This string comes from libc.  Is there some other way to test for
what this patch does?

E.g. how about something like the following?

Thanks,
Jonathan

diff --git i/t/t0061-run-command.sh w/t/t0061-run-command.sh
index 30c4ad75ff..68cd0a8072 100755
--- i/t/t0061-run-command.sh
+++ w/t/t0061-run-command.sh
@@ -38,10 +38,16 @@ test_expect_success !MINGW 'run_command can run a script 
without a #! line' '
 '
 
 test_expect_success 'run_command should not try to execute a directory' '
-   test_when_finished "rm -rf bin/blah" &&
-   mkdir -p bin/blah &&
-   PATH=bin:$PATH test_must_fail test-run-command run-command blah 2>err &&
-   test_i18ngrep "No such file or directory" err
+   test_when_finished "rm -rf bin1 bin2" &&
+   mkdir -p bin1/blah &&
+   mkdir bin2 &&
+   cat hello-script >bin2/blah &&
+   chmod +x bin2/blah &&
+   PATH=$PWD/bin1:$PWD/bin2:$PATH \
+   test-run-command run-command blah >actual 2>err &&
+
+   test_cmp hello-script actual &&
+   test_cmp empty err
 '
 
 test_expect_success POSIXPERM 'run_command reports EACCES' '


[PATCH 2/2] fast-export: DIFF_STATUS_RENAMED instead of 'R'

2017-04-24 Thread Miguel Torroja
Minor change to be consistent with the rest of the fast-export code.
DIFF_STATUS_RENAMED is defined as 'R'.

Signed-off-by: Miguel Torroja 
---
 builtin/fast-export.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index a3ab7da..4d39324 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -280,7 +280,7 @@ static int diff_type_cmp(const void *a_, const void *b_)
 * appear in the output before it is renamed (e.g., when a file
 * was copied and renamed in the same commit).
 */
-   return (a->status == 'R') - (b->status == 'R');
+   return (a->status == DIFF_STATUS_RENAMED) - (b->status == DIFF_STATUS_RENAMED);
 }
 
 static void print_path_1(const char *path)
-- 
2.1.4



[PATCH 1/2] fast-export: deletion action first

2017-04-24 Thread Miguel Torroja
The delete operations of the fast-export output should precede any addition
belonging to the same commit. An addition and a deletion with the same name
entry can happen when a file changes into a directory and vice versa.

The fast-export sorting was added in 060df62 (fast-export: Fix output
order of D/F changes). That change was made in order to fix the case of
directory to file in the same commit, but it broke the reverse case
(File to directory).

Signed-off-by: Miguel Torroja 
---
 builtin/fast-export.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index e022063..a3ab7da 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -260,26 +260,19 @@ static void export_blob(const struct object_id *oid)
free(buf);
 }
 
-static int depth_first(const void *a_, const void *b_)
+/*
+ * Compares two diff types to order based on output priorities.
+ */
+static int diff_type_cmp(const void *a_, const void *b_)
 {
const struct diff_filepair *a = *((const struct diff_filepair **)a_);
const struct diff_filepair *b = *((const struct diff_filepair **)b_);
-   const char *name_a, *name_b;
-   int len_a, len_b, len;
int cmp;
 
-   name_a = a->one ? a->one->path : a->two->path;
-   name_b = b->one ? b->one->path : b->two->path;
-
-   len_a = strlen(name_a);
-   len_b = strlen(name_b);
-   len = (len_a < len_b) ? len_a : len_b;
-
-   /* strcmp will sort 'd' before 'd/e', we want 'd/e' before 'd' */
-   cmp = memcmp(name_a, name_b, len);
-   if (cmp)
-   return cmp;
-   cmp = len_b - len_a;
+   /*
+* Move Delete entries first so that an addition is always reported after
+*/
+   cmp = (b->status == DIFF_STATUS_DELETED) - (a->status == DIFF_STATUS_DELETED);
if (cmp)
return cmp;
/*
@@ -347,7 +340,7 @@ static void show_filemodify(struct diff_queue_struct *q,
 * Handle files below a directory first, in case they are all deleted
 * and the directory changes to a file or symlink.
 */
-   QSORT(q->queue, q->nr, depth_first);
+   QSORT(q->queue, q->nr, diff_type_cmp);
 
for (i = 0; i < q->nr; i++) {
struct diff_filespec *ospec = q->queue[i]->one;
-- 
2.1.4
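
For reference, the file-to-directory case the commit message describes can be reproduced with a small standalone sketch (hypothetical: the repository layout and commit messages are made up; a correct stream must list the `D` entry before the additions under the same name for a fast-import round-trip to reproduce the tree):

```shell
#!/bin/sh
# "path" is a file in the first commit and a directory in the second.
git init -q src
git -C src config user.email author@example.com   # throwaway identity
git -C src config user.name  'Demo Author'

echo one >src/path
git -C src add path
git -C src commit -q -m 'path is a file'

git -C src rm -q path
mkdir src/path
echo two >src/path/file
git -C src add path
git -C src commit -q -m 'path is now a directory'

# Show how the second commit's filemodify/filedelete lines are ordered;
# the stream contains both "D path" and an "M ... path/file" entry.
git -C src fast-export --all | grep -E '^(M|D) '
```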




[PATCH v6 12/11] run-command: don't try to execute directories

2017-04-24 Thread Brandon Williams
In some situations run-command will incorrectly try (and fail) to
execute a directory instead of an executable.  For example:

Let's suppose a user has PATH=~/bin (where 'bin' is a directory) and they
happen to have another directory inside 'bin' named 'git-remote-blah'.
Then git tries to execute the directory:

$ git ls-remote blah://blah
fatal: cannot exec 'git-remote-blah': Permission denied

This is due to only checking 'access()' when locating an executable in
PATH, which doesn't distinguish between files and directories.  Instead
use 'stat()' and check that the path is to a regular file.  Now
run-command won't try to execute the directory 'git-remote-blah':

$ git ls-remote blah://blah
fatal: Unable to find remote helper for 'blah'

Signed-off-by: Brandon Williams 
---
 run-command.c  | 3 ++-
 t/t0061-run-command.sh | 7 +++
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index a97d7bf9f..ece0bf342 100644
--- a/run-command.c
+++ b/run-command.c
@@ -127,6 +127,7 @@ static char *locate_in_PATH(const char *file)
 
while (1) {
const char *end = strchrnul(p, ':');
+   struct stat st;
 
strbuf_reset();
 
@@ -137,7 +138,7 @@ static char *locate_in_PATH(const char *file)
}
strbuf_addstr(&buf, file);
 
-   if (!access(buf.buf, F_OK))
+   if (!stat(buf.buf, &st) && S_ISREG(st.st_mode))
return strbuf_detach(, NULL);
 
if (!*end)
diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh
index 98c09dd98..30c4ad75f 100755
--- a/t/t0061-run-command.sh
+++ b/t/t0061-run-command.sh
@@ -37,6 +37,13 @@ test_expect_success !MINGW 'run_command can run a script 
without a #! line' '
test_cmp empty err
 '
 
+test_expect_success 'run_command should not try to execute a directory' '
+   test_when_finished "rm -rf bin/blah" &&
+   mkdir -p bin/blah &&
+   PATH=bin:$PATH test_must_fail test-run-command run-command blah 2>err &&
+   test_i18ngrep "No such file or directory" err
+'
+
 test_expect_success POSIXPERM 'run_command reports EACCES' '
cat hello-script >hello.sh &&
chmod -x hello.sh &&
-- 
2.13.0.rc0.306.g87b477812d-goog



Re: Submodule/contents conflict

2017-04-24 Thread Stefan Beller
On Mon, Apr 24, 2017 at 4:33 PM, Philip Oakley  wrote:

> This is almost the same as just reported by 'vvs' [1]
> https://public-inbox.org/git/CAM1zWBtfgHT=pT0pidQo1HD=dfrxlg3gnauvs0vzkvyfg1b...@mail.gmail.com/,
> originally on the 'git user' list
> https://groups.google.com/forum/?hl=en#!topic/git-users/9ziZ6yq-BfU

For this issue, +cc Jeff King due to this pointer:

>> On the main list thare is a similar "issue" [1] regarding the expectation 
>> for `git checkout`,
>> and importantly (for me) these collected views regarding the "Git Data 
>> Protection and
>> Management Principles" is not within the Git documentation.
>
> Yes, that's an interesting point. What concerns me is that the commit
> c5326bd62b7e168ba1339dacb7ee812d0fe98c7c which introduced this
> into checkout isn't consistent with reset. Seems that nobody noticed this 
> before.

It seems as if we'd want to see the code from
c5326bd62b7e168ba1339dacb7ee812d0fe98c7c
to be part of any worktree touching command, specifically reset?

> It also has a similarity to
> https://public-inbox.org/git/1492287435.14812.2.ca...@gmail.com/  regarding
> how checkout operates.

This seems to be orthogonal to the original topic (no submodules, nor checkout
bugs, just a doc update?)


> It does feel as if there are two slightly different optimisations that could
> be used when the desired file pre-exists in the worktree, but isn't
> immediately known to the index.
>


Re: Submodule/contents conflict

2017-04-24 Thread Philip Oakley

From: "Stefan Beller" 

On Mon, Apr 24, 2017 at 1:06 AM, Orgad Shaneh  wrote:

Hi,

I've noticed a strange behavior with submodule/content conflict. My
current Git version is 2.12.2, but the problem exists since I
remember.

Branch A has a submodule.
In branch B which diverged from A, I replaced the submodule with its 
contents.


Now, every time I merge A into B, and A had changed the submodule
reference, all the files inside the ex-submodule directory in B are
being "re-added".

Moreover, aborting the merge prints an error, but seems to work
nevertheless, and if I run git reset --hard all the files in that
directory are actually written to the disk, even though they haven't
changed at all.



This is almost the same as just reported by 'vvs' [1] 
https://public-inbox.org/git/CAM1zWBtfgHT=pT0pidQo1HD=dfrxlg3gnauvs0vzkvyfg1b...@mail.gmail.com/, 
originally on the 'git user' list 
https://groups.google.com/forum/?hl=en#!topic/git-users/9ziZ6yq-BfU


It also has a similarity to 
https://public-inbox.org/git/1492287435.14812.2.ca...@gmail.com/  regarding 
how checkout operates.


It does feel as if there are two slightly different optimisations that could 
be used when the desired file pre-exists in the worktree, but isn't 
immediately known to the index.




When the submodule is small, it might be ok. But in my project we have
a huge submodule with ~16K files, and on each merge all the files are
listed, and even mixed reset takes several minutes.


That sounds like a wait that is not wanted!




A similar bug report
https://public-inbox.org/git/CAG0BQX=wvpkJ=pqwv-nbmhupv8yzvd_kykzjmsfwq9xstz2...@mail.gmail.com/

"checkout --recurse-submodules" (as mentioned in that report)
made it into Git by now, but this bug goes unfixed, still.


The following script demonstrates this:
#!/bin/sh

rm -rf super sub
mkdir sub
cd sub
git init
touch foo
git add foo
git commit -m 'Initial commit'
mkdir ../super; cd ../super
git init
git submodule add ../sub
touch foo; git add foo sub
git commit -m 'Initial commit'
git checkout -b update-sub
git update-index --cacheinfo 
16,,sub

git commit -m 'Update submodule'
git checkout -b remove-sub HEAD^
git rm sub
mkdir sub
touch sub/foo sub/bar
git add sub
git commit -m 'Replaced submodule with contents'
git checkout -b remove-2 HEAD^
git merge --no-ff remove-sub
git merge update-sub
# Adding sub/foo
# Adding sub/bar
# CONFLICT (modify/delete): sub deleted in HEAD and modified in
update-sub. Version update-sub of sub left in tree at sub~update-sub.
# Automatic merge failed; fix conflicts and then commit the result.
git merge --abort
# error: 'sub' appears as both a file and as a directory
# error: sub: cannot drop to stage #0

- Orgad



--
Philip 



Re: [PATCH 11/53] fast-import: convert to struct object_id

2017-04-24 Thread brian m. carlson
On Mon, Apr 24, 2017 at 10:20:27AM -0700, Stefan Beller wrote:
> On Sun, Apr 23, 2017 at 2:34 PM, brian m. carlson
>  wrote:
> > Signed-off-by: brian m. carlson 
> > ---
> 
> > @@ -2823,12 +2821,10 @@ static void parse_new_commit(const char *arg)
> > strbuf_addf(&new_data, "tree %s\n",
> > oid_to_hex(&b->branch_tree.versions[1].oid));
> > if (!is_null_oid(&b->oid))
> > -   strbuf_addf(&new_data, "parent %s\n",
> > -   oid_to_hex(&b->oid));
> > +   strbuf_addf(&new_data, "parent %s\n", oid_to_hex(&b->oid));
> > while (merge_list) {
> > struct hash_list *next = merge_list->next;
> > -   strbuf_addf(&new_data, "parent %s\n",
> > -   oid_to_hex(&merge_list->oid));
> > +   strbuf_addf(&new_data, "parent %s\n", oid_to_hex(&merge_list->oid));
> > free(merge_list);
> > merge_list = next;
> > }
> 
> This is a funny one. The only change is line rewrapping, as it fits
> into 80 cols easily.
> I was reviewing this series using colored --word-diff output, and this hunk 
> does
> not produce any red or green. I wonder if this is the intended
> behavior of the word diffing
> or if we rather want to insert a  - \n - .

I used Coccinelle for part of this, I think, so that would be why.
Sometimes it doesn't do everything I want, but it lets me do less manual
work.  I'll revert that change in the reroll.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 28/53] Convert remaining callers of lookup_blob to object_id

2017-04-24 Thread Stefan Beller
On Sun, Apr 23, 2017 at 2:34 PM, brian m. carlson
 wrote:
> All but a few callers of lookup_blob have been converted to struct
> object_id.  Introduce a temporary, which will be removed later, into
> parse_object to ease the transition, and convert the remaining callers
> so that we can update lookup_blob to take struct object_id *.
>
> Signed-off-by: brian m. carlson 

Reviewed up to here, looking good,

Thanks,
Stefan



Re: [PATCH 0/2] perf: show that wildmatch() regressed for pathological cases in v2.0

2017-04-24 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 24, 2017 at 11:34 PM, Brandon Williams  wrote:
> On 04/24, Ævar Arnfjörð Bjarmason wrote:
>> Russ Cox just published an article about how various glob()
>> implementations suffer from pathological performance when fed certain
>> pathological patterns like "a*a*a*a*b" given a file like "aaa...":
>> https://research.swtch.com/glob
>>
>> I was curious to see if this impacted git. It turns out it does. This
>> used to be a per-platform issue with git, since globbing was
>> implemented via fnmatch() by default before v1.8.4, and support for
>> using the OS fnmatch() was removed entirely in v2.0.0.
>>
>> This performance test shows the regression:
>>
>> $ GIT_PERF_REPEAT_COUNT=1 
>> GIT_PERF_MAKE_OPTS="[...]NO_WILDMATCH=YesPlease[...]" ./run v1.9.5 v2.0.0 
>> v2.12.0 p0100-globbing.sh
>> [...]
>> Test   v1.9.5
>> v2.0.0v2.12.0
>> 
>> --
>> [...]
>> 0100.7: fileglob((a*)^nb) against file (a^100).t; n = 1
>> 0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
>> 0100.8: fileglob((a*)^nb) against file (a^100).t; n = 2
>> 0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
>> 0100.9: fileglob((a*)^nb) against file (a^100).t; n = 3
>> 0.00(0.00+0.00)   0.00(0.00+0.00) = 0.01(0.00+0.00) +inf
>> 0100.10: fileglob((a*)^nb) against file (a^100).t; n = 4   
>> 0.00(0.00+0.00)   0.01(0.01+0.00) +inf  0.02(0.02+0.00) +inf
>> 0100.11: fileglob((a*)^nb) against file (a^100).t; n = 5   
>> 0.00(0.00+0.00)   0.20(0.19+0.00) +inf  0.24(0.23+0.00) +inf
>> 0100.12: fileglob((a*)^nb) against file (a^100).t; n = 6   
>> 0.00(0.00+0.00)   3.03(3.00+0.00) +inf  3.08(3.05+0.00) +inf
>>
>> And here's a one-liner to do the same:
>>
>> $ time (rm -rf test; git init -q test && (cd test && touch $(perl -e 
>> 'print "a" x 100').t && git add a* && git commit -q -m"test" && git ls-files 
>> 'a*a*a*a*a*a*a*b'))
>>
>> Add or remove "a*"'s to adjust the runtime. With 6 that executes in 3
>> seconds on my system, 40 seconds with 7 etc.
>>
>> I don't think this is something we need to worry much about, if you
>> have a file like this and feed Git insane patterns you probably
>> deserve what you get.
>>
>> The real concern is if we have behavior like this and ever e.g. expose
>> globbing over the network, e.g. in some future upload-pack protocol.
>>
>> There are probably some web-based programs built on top of git that
>> are vulnerable to DoS attacks as a result of this, e.g. if they take
>> user-supplied globs and feed them to ls-files.
>
> I was taking a look at wildmatch a few months ago and have an unfinished
> patch to do some cleanup there.  I noticed this was inefficient but
> didn't expect those kinds of numbers.  I wonder how difficult it would
> be to rewrite it so that we don't have this issue.

Something I've started hacking on as part of a post-PCRE v2 series I'm
working on, which I'll submit after PCRE v2 support is merged, is to
replace various parts of regex / matching machinery with PCRE under
the hood if it's available, even when the user doesn't ask or care
that we're using PCRE.

The thing I'm working on now is to replace fixed-string grep/log
searches that we currently do via kwset with PCRE v2. It's quite a bit
faster than kwset, and allows us to fix a bunch of outstanding grep
TODO items.

It would similarly be interesting to replace wildmatch() with a shim
wrapper for PCRE that translates glob patterns to regexes. Not that we
need PCRE for that, we could probably just use regcomp(), but PCRE
would almost certainly be faster than wildmatch().

Of course the use-cases are different. For the kwset replacement we'd
just use PCRE under the hood because it's faster, whereas the reason
we integrated wildmatch() in the first place was to get consistent
behavior across all platforms, and PCRE is currently an optional
dependency.

I think it might make sense to solve this general class of problem by
eventually making PCRE a mandatory dependency of Git. Right now we
have copy/pasted versions of whatever the last GPLv2 version was of
kwset/wildmatch, which we then have to maintain.

Between kwset/wildmatch/various code in grep.c & diffcore-pickaxe.c we
probably have 2-3k lines of very tricky string matching code which
could be replaced by relatively trivial calls into PCRE.
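
A toy version of such a glob-to-regex shim, as a sketch only (POSIX `grep -E` stands in here for PCRE; the function names are made up). The point is that an automaton-based engine matches patterns like `a*a*a*...b` without the exponential backtracking wildmatch() shows:

```shell
#!/bin/sh
# Translate glob metacharacters into an anchored regular expression.
glob_to_regex() {
	printf '%s\n' "$1" | sed \
		-e 's/[].[\^$+(){}|]/\\&/g' \
		-e 's/\*/.*/g' \
		-e 's/?/./g' \
		-e 's/^/^/' \
		-e 's/$/$/'
}

# Match a string against a glob by delegating to the regex engine.
glob_match() {
	printf '%s\n' "$2" | grep -Eq "$(glob_to_regex "$1")"
}

# The pathological case from the article: a 100-character filename
# against a pattern with many "a*" groups -- instantaneous here.
file="$(awk 'BEGIN { for (i = 0; i < 100; i++) printf "a" }').t"
if glob_match 'a*a*a*a*a*a*a*b' "$file"; then
	echo "match"
else
	echo "no match"          # prints "no match": the subject has no "b"
fi
```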


Re: [GSoC][RFC/PATCH v2] submodule: port subcommand foreach from shell to C

2017-04-24 Thread Ramsay Jones


On 24/04/17 23:17, Stefan Beller wrote:
> On Mon, Apr 24, 2017 at 3:11 PM, Ramsay Jones
>  wrote:
>>
>>
>> On 24/04/17 21:03, Stefan Beller wrote:
>> [snip]
>>
>>> +
>>> + argv_array_pushf(&cp.env_array, "name=%s", sub->name);
>>> + argv_array_pushf(&cp.env_array, "path=%s", displaypath);
>>> + argv_array_pushf(&cp.env_array, "sm_path=%s", displaypath);
>>>
>>> You mention keeping 'sm_path' in the notes after the commit message. I would
>>> add that part to the commit message, to explain why we have multiple 
>>> variables
>>> that have the same value. Maybe even a comment in the code:
>>>
>>> /* Keep sm_path for historic reasons, see tests in 091a6eb0fee. */
>>> .. sm_path ..
>>
>> Hmm, you need to be a bit careful with putting 'path' in the
>> environment (if you then export it to sub-processes) on windows
>> (cygwin, MinGW, GfW). See commit 64394e3ae9. I would have liked
>> to remove $path altogether from the 'submodule-foreach api' in
>> that commit, but users and their scripts were already using it
>> (so I couldn't just drop it, without some deprecation period).
>> So long as whatever was being 'eval'-ed in the script didn't
>> export $path, ...
>>
> 
> Oh, I misread the comment
> 
>  # we make $path available to scripts ...
>  path=$sm_path
> 
> as it was such a casual friendly thing to say in that context.
> So the *real* historic baggage is
> argv_array_pushf(&cp.env_array, "path=%s", displaypath);
> whereas
> argv_array_pushf(&cp.env_array, "sm_path=%s", displaypath);
> is considered the correct way to go.

I have to admit that I didn't actually read the code in this
patch. I just saw the subject line and the ass-backward comment
about $sm_path. ;-)

My intention was simply to warn: 'there be dragons'.

ATB,
Ramsay Jones




Re: [PATCH v6 00/11] forking and threading

2017-04-24 Thread Brandon Williams
On 04/19, Brandon Williams wrote:
> Changes in v6:
> * fix some windows compat issues
> * better comment on the string_list_remove function (also marked extern)
> 
> Brandon Williams (10):
>   t5550: use write_script to generate post-update hook
>   t0061: run_command executes scripts without a #! line
>   run-command: prepare command before forking
>   run-command: use the async-signal-safe execv instead of execvp
>   string-list: add string_list_remove function
>   run-command: prepare child environment before forking
>   run-command: don't die in child when duping /dev/null
>   run-command: eliminate calls to error handling functions in child
>   run-command: handle dup2 and close errors in child
>   run-command: add note about forking and threading
> 
> Eric Wong (1):
>   run-command: block signals between fork and execve

Just as an FYI there's a bug with this code where it'll try to execute a
directory.  I'm adding a test and fixing it.  Since this topic is in
next I'll base the patch on top of this series.

-- 
Brandon Williams


Re: [GSoC][RFC/PATCH v2] submodule: port subcommand foreach from shell to C

2017-04-24 Thread Stefan Beller
On Mon, Apr 24, 2017 at 3:11 PM, Ramsay Jones
 wrote:
>
>
> On 24/04/17 21:03, Stefan Beller wrote:
> [snip]
>
>> +
>> + argv_array_pushf(&cp.env_array, "name=%s", sub->name);
>> + argv_array_pushf(&cp.env_array, "path=%s", displaypath);
>> + argv_array_pushf(&cp.env_array, "sm_path=%s", displaypath);
>>
>> You mention keeping 'sm_path' in the notes after the commit message. I would
>> add that part to the commit message, to explain why we have multiple 
>> variables
>> that have the same value. Maybe even a comment in the code:
>>
>> /* Keep sm_path for historic reasons, see tests in 091a6eb0fee. */
>> .. sm_path ..
>
> Hmm, you need to be a bit careful with putting 'path' in the
> environment (if you then export it to sub-processes) on windows
> (cygwin, MinGW, GfW). See commit 64394e3ae9. I would have liked
> to remove $path altogether from the 'submodule-foreach api' in
> that commit, but users and their scripts were already using it
> (so I couldn't just drop it, without some deprecation period).
> So long as whatever was being 'eval'-ed in the script didn't
> export $path, ...
>

Oh, I misread the comment

 # we make $path available to scripts ...
 path=$sm_path

as it was such a casual friendly thing to say in that context.
So the *real* historic baggage is
argv_array_pushf(&cp.env_array, "path=%s", displaypath);
whereas
argv_array_pushf(&cp.env_array, "sm_path=%s", displaypath);
is considered the correct way to go.

Thanks,
Stefan


Re: [BUG] test suite broken with GIT_TEST_SPLIT_INDEX

2017-04-24 Thread Thomas Gummerer
On 04/21, Christian Couder wrote:
> On Thu, Apr 20, 2017 at 11:24 PM, Thomas Gummerer  
> wrote:
> > On 04/20, Christian Couder wrote:
> >>
> >> Could you try with the following patch:
> >>
> >> http://public-inbox.org/git/20170330210354.20018-1-chrisc...@tuxfamily.org/
> >
> > Yeah, I tried with and without that patch with the same result.
> > Unless I'm screwing something up when testing I don't think this fixes
> > the issue unfortunately.
> 
> I just tried on "pu" and only the first test
> (t7009-filter-branch-null-sha1.sh) fails there.

I can't seem to reproduce this now either with the latest pu or next.
I must have messed something up while testing.

Sorry for the noise.


Re: [GSoC][RFC/PATCH v2] submodule: port subcommand foreach from shell to C

2017-04-24 Thread Ramsay Jones


On 24/04/17 21:03, Stefan Beller wrote:
[snip]

> +
> + argv_array_pushf(_array, "name=%s", sub->name);
> + argv_array_pushf(_array, "path=%s", displaypath);
> + argv_array_pushf(_array, "sm_path=%s", displaypath);
> 
> You mention keeping 'sm_path' in the notes after the commit message. I would
> add that part to the commit message, to explain why we have multiple variables
> that have the same value. Maybe even a comment in the code:
> 
> /* Keep sm_path for historic reasons, see tests in 091a6eb0fee. */
> .. sm_path ..

Hmm, you need to be a bit careful with putting 'path' in the
environment (if you then export it to sub-processes) on windows
(cygwin, MinGW, GfW). See commit 64394e3ae9. I would have liked
to remove $path altogether from the 'submodule-foreach api' in
that commit, but users and their scripts were already using it
(so I couldn't just drop it, without some deprecation period).
So long as whatever was being 'eval'-ed in the script didn't
export $path, ...

ATB,
Ramsay Jones



Re: [PATCH 13/53] notes-cache: convert to struct object_id

2017-04-24 Thread Stefan Beller
On Sun, Apr 23, 2017 at 2:34 PM, brian m. carlson
 wrote:
> Convert as many instances of unsigned char [20] as possible,  Update the

s/Update/update/ or s/,/./

Side remark: In all patches up to now you put a space between the char and
the [20] (or [40]), which is irritating to read (for me). I presume
putting it adjacent
to the char would offend others as that would violate our coding
style. So having
the whitespace in between "char" and the array makes sense.

Up and including this patch looks good.

Thanks,
Stefan


Re: [PATCH v3 4/5] archive-zip: support archives bigger than 4GB

2017-04-24 Thread René Scharfe

On 24.04.2017 at 23:02, Johannes Sixt wrote:

On 24.04.2017 at 22:06, René Scharfe wrote:

On 24.04.2017 at 20:24, Peter Krefting wrote:

René Scharfe:


@@ -433,6 +446,11 @@ static int write_zip_entry(struct archiver_args
*args,
free(deflated);
free(buffer);

+if (offset > 0xffffffff) {
+zip64_dir_extra_payload_size += 8;
+zip_dir_extra_size += 2 + 2 + zip64_dir_extra_payload_size;
+}
+
strbuf_add_le(_dir, 4, 0x02014b50);/* magic */
strbuf_add_le(_dir, 2, creator_version);
strbuf_add_le(_dir, 2, 10);/* version */


This needs to be >=. The spec says that if the value is 0xffffffff,
there should be a zip64 record with the actual size (even if it is
0xffffffff).


Could you please cite the relevant part?

Here's how I read it: If a value doesn't fit into a 32-bit field it is
set to 0xffffffff, a zip64 extra is added and a 64-bit field stores the
actual value.  The magic value 0xffffffff indicates that a corresponding
64-bit field is present in the zip64 extra.  That means even if a value
is 0xffffffff (and thus fits) we need to add it to the zip64 extra.  If
there is no zip64 extra then we can store 0xffffffff in the 32-bit
field, though.


The reader I wrote recently interprets 0xffffffff as special if the
version is 45. Then, if there is no zip64 extra record, it is a broken
ZIP archive. You are saying that my reader is wrong in this special case...


The "version needed to extract" field is not particularly well-suited
for feature detection.  If you e.g. compress with LZMA then it should be
set to 63 even for small files that need no zip64 extra field.  So it
seems to rather be meant to allow readers to spot headers they can't
interpret because they were written against an earlier version of the
spec.

InfoZIP's zip writes 10 into the version field, no matter if the entry
has a zip64 extra field or not, as I wrote in my other reply.  That
seems to collide with the spec, but I'd expect readers to be able to
handle its output.

René


Re: [PATCH 0/2] perf: show that wildmatch() regressed for pathological cases in v2.0

2017-04-24 Thread Brandon Williams
On 04/24, Ævar Arnfjörð Bjarmason wrote:
> Russ Cox just published an article about how various glob()
> implementations suffer from pathological performance when fed certain
> pathological patterns like "a*a*a*a*b" given a file like "aaa...":
> https://research.swtch.com/glob
> 
> I was curious to see if this impacted git. It turns out it does. This
> used to be a per-platform issue with git, since globbing was
> implemented via fnmatch() by default before v1.8.4, and support for
> using the OS fnmatch() was removed entirely in v2.0.0.
> 
> This performance test shows the regression:
> 
> $ GIT_PERF_REPEAT_COUNT=1 
> GIT_PERF_MAKE_OPTS="[...]NO_WILDMATCH=YesPlease[...]" ./run v1.9.5 v2.0.0 
> v2.12.0 p0100-globbing.sh
> [...]
> Test                                                      v1.9.5            v2.0.0                    v2.12.0
> -------------------------------------------------------------------------------------------------------------
> [...]
> 0100.7: fileglob((a*)^nb) against file (a^100).t; n = 1   0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
> 0100.8: fileglob((a*)^nb) against file (a^100).t; n = 2   0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
> 0100.9: fileglob((a*)^nb) against file (a^100).t; n = 3   0.00(0.00+0.00)   0.00(0.00+0.00) =         0.01(0.00+0.00) +inf
> 0100.10: fileglob((a*)^nb) against file (a^100).t; n = 4  0.00(0.00+0.00)   0.01(0.01+0.00) +inf      0.02(0.02+0.00) +inf
> 0100.11: fileglob((a*)^nb) against file (a^100).t; n = 5  0.00(0.00+0.00)   0.20(0.19+0.00) +inf      0.24(0.23+0.00) +inf
> 0100.12: fileglob((a*)^nb) against file (a^100).t; n = 6  0.00(0.00+0.00)   3.03(3.00+0.00) +inf      3.08(3.05+0.00) +inf
> 
> And here's a one-liner to do the same:
> 
> $ time (rm -rf test; git init -q test && (cd test && touch $(perl -e 
> 'print "a" x 100').t && git add a* && git commit -q -m"test" && git ls-files 
> 'a*a*a*a*a*a*a*b'))
> 
> Add or remove "a*"'s to adjust the runtime. With 6 that executes in 3
> seconds on my system, 40 seconds with 7 etc.
> 
> I don't think this is something we need to worry much about, if you
> have a file like this and feed Git insane patterns you probably
> deserve what you get.
> 
> The real concern is if we have behavior like this and ever e.g. expose
> globbing over the network, e.g. in some future upload-pack protocol.
> 
> There are probably some web-based programs built on top of git that
> are vulnerable to DoS attacks as a result of this, e.g. if they take
> user-supplied globs and feed them to ls-files.

I was taking a look at wildmatch a few months ago and have an unfinished
patch to do some cleanup there.  I noticed this was inefficient but
didn't expect those kinds of numbers.  I wonder how difficult it would
be to rewrite it so that we don't have this issue.

-- 
Brandon Williams


[PATCH 2/2] perf: add test showing exponential growth in path globbing

2017-04-24 Thread Ævar Arnfjörð Bjarmason
Add a test showing that ls-files times grow exponentially in the face
of some pathological globs, whereas refglobs via for-each-ref don't in
practice suffer from the same issue.

As noted in the test description this is a test to see whether Git
suffers from the issue noted in an article Russ Cox posted today about
common bugs in various glob implementations:
https://research.swtch.com/glob

The pathological git-ls-files globbing is done by wildmatch() in
wildmatch.c. The for-each-ref codepath also uses wildmatch(), but will
always match against e.g. "refs/tags/aaa...", not "aaa.." as
git-ls-files will.

I'm unsure why the pathological case isn't triggered by for-each-ref,
but in any case, now we have a performance test for it.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/p0100-globbing.sh | 48 
 1 file changed, 48 insertions(+)
 create mode 100755 t/perf/p0100-globbing.sh

diff --git a/t/perf/p0100-globbing.sh b/t/perf/p0100-globbing.sh
new file mode 100755
index 00..e98fd7ce4b
--- /dev/null
+++ b/t/perf/p0100-globbing.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+
+test_description="Tests pathological globbing performance
+
+Shows how Git's globbing performs when given the sort of
+pathological patterns described at https://research.swtch.com/glob
+"
+
+. ./perf-lib.sh
+
+test_globs_big='10 25 50 75 100'
+test_globs_small='1 2 3 4 5 6'
+
+test_perf_fresh_repo
+
+test_expect_success 'setup' '
+   for i in $(test_seq 1 100)
+   do
+   printf "a" >>refname &&
+   for j in $(test_seq 1 $i)
+   do
+   printf "a*" >>refglob.$i
+   done &&
+   echo b >>refglob.$i
+   done &&
+   test_commit $(cat refname) &&
+   for i in $(test_seq 1 100)
+   do
   git tag $(cat refname)-$i
+   done &&
+   test_commit hello
+'
+
+for i in $test_globs_big
+do
+   test_perf "refglob((a*)^nb) against tag a^100; n = $i" '
+   git for-each-ref "refs/tags/$(cat refglob.'$i')b"
+   '
+done
+
+for i in $test_globs_small
+do
+   test_perf "fileglob((a*)^nb) against file (a^100).t; n = $i" '
+   git ls-files "$(cat refglob.'$i')b"
+   '
+done
+
+test_done
-- 
2.11.0



[PATCH 1/2] perf: add function to setup a fresh test repo

2017-04-24 Thread Ævar Arnfjörð Bjarmason
Add a function to setup a fresh test repo via 'git init' to complement
the existing functions to copy over a normal & large repo.

Some performance tests don't need any existing repository data at all
to be significant, e.g. tests which stress glob matches against
revisions or files, which I'm about to add in a subsequent commit.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/perf/README  |  1 +
 t/perf/perf-lib.sh | 17 +
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/t/perf/README b/t/perf/README
index 49ea4349be..de2fe15696 100644
--- a/t/perf/README
+++ b/t/perf/README
@@ -106,6 +106,7 @@ sources perf-lib.sh:
 
 After that you will want to use some of the following:
 
+   test_perf_fresh_repo# sets up an empty repository
test_perf_default_repo  # sets up a "normal" repository
test_perf_large_repo# sets up a "large" repository
 
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index ab4b8b06ae..f51fc773e8 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -78,6 +78,10 @@ if test -z "$GIT_PERF_LARGE_REPO"; then
GIT_PERF_LARGE_REPO=$TEST_DIRECTORY/..
 fi
 
+test_perf_do_repo_symlink_config_ () {
+   test_have_prereq SYMLINKS || git config core.symlinks false
+}
+
 test_perf_create_repo_from () {
test "$#" = 2 ||
error "bug in the test script: not 2 parameters to test-create-repo"
@@ -102,15 +106,20 @@ test_perf_create_repo_from () {
) &&
(
cd "$repo" &&
-   "$MODERN_GIT" init -q && {
-   test_have_prereq SYMLINKS ||
-   git config core.symlinks false
-   } &&
+   "$MODERN_GIT" init -q &&
+   test_perf_do_repo_symlink_config_ &&
mv .git/hooks .git/hooks-disabled 2>/dev/null
) || error "failed to copy repository '$source' to '$repo'"
 }
 
 # call at least one of these to establish an appropriately-sized repository
+test_perf_fresh_repo () {
+   repo="${1:-$TRASH_DIRECTORY}"
+   "$MODERN_GIT" init -q "$repo" &&
+   cd "$repo" &&
+   test_perf_do_repo_symlink_config_
+}
+
 test_perf_default_repo () {
test_perf_create_repo_from "${1:-$TRASH_DIRECTORY}" "$GIT_PERF_REPO"
 }
-- 
2.11.0



[PATCH 0/2] perf: show that wildmatch() regressed for pathological cases in v2.0

2017-04-24 Thread Ævar Arnfjörð Bjarmason
Russ Cox just published an article about how various glob()
implementations suffer from pathological performance when fed certain
pathological patterns like "a*a*a*a*b" given a file like "aaa...":
https://research.swtch.com/glob

I was curious to see if this impacted git. It turns out it does. This
used to be a per-platform issue with git, since globbing was
implemented via fnmatch() by default before v1.8.4, and support for
using the OS fnmatch() was removed entirely in v2.0.0.

This performance test shows the regression:

$ GIT_PERF_REPEAT_COUNT=1 
GIT_PERF_MAKE_OPTS="[...]NO_WILDMATCH=YesPlease[...]" ./run v1.9.5 v2.0.0 
v2.12.0 p0100-globbing.sh
[...]
Test                                                      v1.9.5            v2.0.0                    v2.12.0
-------------------------------------------------------------------------------------------------------------
[...]
0100.7: fileglob((a*)^nb) against file (a^100).t; n = 1   0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
0100.8: fileglob((a*)^nb) against file (a^100).t; n = 2   0.01(0.00+0.00)   0.00(0.00+0.00) -100.0%   0.01(0.00+0.00) +0.0%
0100.9: fileglob((a*)^nb) against file (a^100).t; n = 3   0.00(0.00+0.00)   0.00(0.00+0.00) =         0.01(0.00+0.00) +inf
0100.10: fileglob((a*)^nb) against file (a^100).t; n = 4  0.00(0.00+0.00)   0.01(0.01+0.00) +inf      0.02(0.02+0.00) +inf
0100.11: fileglob((a*)^nb) against file (a^100).t; n = 5  0.00(0.00+0.00)   0.20(0.19+0.00) +inf      0.24(0.23+0.00) +inf
0100.12: fileglob((a*)^nb) against file (a^100).t; n = 6  0.00(0.00+0.00)   3.03(3.00+0.00) +inf      3.08(3.05+0.00) +inf

And here's a one-liner to do the same:

$ time (rm -rf test; git init -q test && (cd test && touch $(perl -e 'print 
"a" x 100').t && git add a* && git commit -q -m"test" && git ls-files 
'a*a*a*a*a*a*a*b'))

Add or remove "a*"'s to adjust the runtime. With 6 that executes in 3
seconds on my system, 40 seconds with 7 etc.

I don't think this is something we need to worry much about, if you
have a file like this and feed Git insane patterns you probably
deserve what you get.

The real concern is if we have behavior like this and ever e.g. expose
globbing over the network, e.g. in some future upload-pack protocol.

There are probably some web-based programs built on top of git that
are vulnerable to DoS attacks as a result of this, e.g. if they take
user-supplied globs and feed them to ls-files.

Ævar Arnfjörð Bjarmason (2):
  perf: add function to setup a fresh test repo
  perf: add test showing exponential growth in path globbing

 t/perf/README|  1 +
 t/perf/p0100-globbing.sh | 48 
 t/perf/perf-lib.sh   | 17 +
 3 files changed, 62 insertions(+), 4 deletions(-)
 create mode 100755 t/perf/p0100-globbing.sh

-- 
2.11.0



Re: [PATCH v3 5/5] archive-zip: support files bigger than 4GB

2017-04-24 Thread Keith Goldfarb
This set of patches works for my test case.

Thanks,

K.



Re: [PATCH v3 4/5] archive-zip: support archives bigger than 4GB

2017-04-24 Thread Johannes Sixt

On 24.04.2017 at 22:06, René Scharfe wrote:

On 24.04.2017 at 20:24, Peter Krefting wrote:

René Scharfe:


@@ -433,6 +446,11 @@ static int write_zip_entry(struct archiver_args
*args,
free(deflated);
free(buffer);

+if (offset > 0xffffffff) {
+zip64_dir_extra_payload_size += 8;
+zip_dir_extra_size += 2 + 2 + zip64_dir_extra_payload_size;
+}
+
strbuf_add_le(_dir, 4, 0x02014b50);/* magic */
strbuf_add_le(_dir, 2, creator_version);
strbuf_add_le(_dir, 2, 10);/* version */


This needs to be >=. The spec says that if the value is 0xffffffff,
there should be a zip64 record with the actual size (even if it is
0xffffffff).


Could you please cite the relevant part?

Here's how I read it: If a value doesn't fit into a 32-bit field it is
set to 0xffffffff, a zip64 extra is added and a 64-bit field stores the
actual value.  The magic value 0xffffffff indicates that a corresponding
64-bit field is present in the zip64 extra.  That means even if a value
is 0xffffffff (and thus fits) we need to add it to the zip64 extra.  If
there is no zip64 extra then we can store 0xffffffff in the 32-bit
field, though.


The reader I wrote recently interprets 0xffffffff as special if the
version is 45. Then, if there is no zip64 extra record, it is a broken
ZIP archive. You are saying that my reader is wrong in this special case...


-- Hannes




Re: [PATCH v6 3/8] convert: Split start_multi_file_filter into two separate functions

2017-04-24 Thread Ben Peart



On 4/24/2017 12:31 AM, Junio C Hamano wrote:

Ben Peart  writes:


Subject: [PATCH v6 3/8] convert: Split start_multi_file_filter into two 
separate functions

Two minor nits, because the capital after ":" looks ugly in shortlog
output, and having () there makes it easier to notice that a
function name is being discussed.  I.e.

 convert: split start_multi_file_filter() into two separate functions


I'll make it so in the next version of the patch series.


To enable future reuse of the filter..process infrastructure,
split start_multi_file_filter into two separate parts.

start_multi_file_filter will now only contain the generic logic to
manage the creation and tracking of the child process in a hashmap.

start_multi_file_filter_fn is a protocol specific initialization
function that will negotiate the multi-file-filter interface version
and capabilities.

The above fails to describe a lot more important/significant change
this patch makes.  Two mistaken check "errno == EPIPE" have been
removed as a preliminary bugfix.  I think the bugfix deserves to be
split into a separate patch by itself and hoisted much earlier in
the series.  It alone is a very good change, with or without the
remainder of the changes in this patch.


OK, I'll pull that bugfix out into a separate patch.


Thanks.


Signed-off-by: Ben Peart 
---
  convert.c | 62 --
  1 file changed, 36 insertions(+), 26 deletions(-)

diff --git a/convert.c b/convert.c
index 793c29ebfd..36401fe087 100644
--- a/convert.c
+++ b/convert.c
@@ -565,35 +565,14 @@ static void stop_multi_file_filter(struct child_process 
*process)
finish_command(process);
  }
  
-static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)

+static int start_multi_file_filter_fn(struct cmd2process *entry)
  {
int err;
-   struct cmd2process *entry;
-   struct child_process *process;
-   const char *argv[] = { cmd, NULL };
struct string_list cap_list = STRING_LIST_INIT_NODUP;
char *cap_buf;
const char *cap_name;
-
-   entry = xmalloc(sizeof(*entry));
-   entry->cmd = cmd;
-   entry->supported_capabilities = 0;
-   process = &entry->process;
-
-   child_process_init(process);
-   process->argv = argv;
-   process->use_shell = 1;
-   process->in = -1;
-   process->out = -1;
-   process->clean_on_exit = 1;
-   process->clean_on_exit_handler = stop_multi_file_filter;
-
-   if (start_command(process)) {
-   error("cannot fork to run external filter '%s'", cmd);
-   return NULL;
-   }
-
-   hashmap_entry_init(entry, strhash(cmd));
+   struct child_process *process = &entry->process;
+   const char *cmd = entry->cmd;
  
  	sigchain_push(SIGPIPE, SIG_IGN);
  
@@ -642,7 +621,38 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons

  done:
sigchain_pop(SIGPIPE);
  
-	if (err || errno == EPIPE) {

+   return err;
+}
+
+static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, 
const char *cmd)
+{
+   int err;
+   struct cmd2process *entry;
+   struct child_process *process;
+   const char *argv[] = { cmd, NULL };
+
+   entry = xmalloc(sizeof(*entry));
+   entry->cmd = cmd;
+   entry->supported_capabilities = 0;
+   process = &entry->process;
+
+   child_process_init(process);
+   process->argv = argv;
+   process->use_shell = 1;
+   process->in = -1;
+   process->out = -1;
+   process->clean_on_exit = 1;
+   process->clean_on_exit_handler = stop_multi_file_filter;
+
+   if (start_command(process)) {
+   error("cannot fork to run external filter '%s'", cmd);
+   return NULL;
+   }
+
+   hashmap_entry_init(entry, strhash(cmd));
+
+   err = start_multi_file_filter_fn(entry);
+   if (err) {
error("initialization for external filter '%s' failed", cmd);
kill_multi_file_filter(hashmap, entry);
return NULL;
@@ -733,7 +743,7 @@ static int apply_multi_file_filter(const char *path, const 
char *src, size_t len
  done:
sigchain_pop(SIGPIPE);
  
-	if (err || errno == EPIPE) {

+   if (err) {
if (!strcmp(filter_status.buf, "error")) {
/* The filter signaled a problem with the file. */
} else if (!strcmp(filter_status.buf, "abort")) {




Re: [PATCH v3 4/5] archive-zip: support archives bigger than 4GB

2017-04-24 Thread René Scharfe

On 24.04.2017 at 22:06, René Scharfe wrote:

On 24.04.2017 at 20:24, Peter Krefting wrote:

René Scharfe:
Also set the version required to 45 (4.5) for any record that has 
zip64 fields.


Ah, yes indeed.


When I tried to implement this I realized that we should set 20 for 
directories, but we use 10 for everything.  Then I compared with InfoZIP 
zip and noticed that it only ever sets 45 for the zip64 end of central 
directory record and 10 for everything else.  So I think we can/should 
keep it as it is, for compatibility's sake -- unless there is a problem 
with an old (but still relevant) extractor.


René


Re: [BUG] test suite broken with GETTEXT_POISON=YesPlease

2017-04-24 Thread Jeff King
On Mon, Apr 24, 2017 at 08:22:36PM +0700, Duy Nguyen wrote:

> Off topic, is it possible to receive mail notifications from Travis
> when a fault is found in either 'pu', 'next' or 'master'? I know how
> to do it in Jenkins, but I'm not familiar with Travis and there's no
> obvious button from the web page..

I looked into this a bit for my personal builds. Notification config has
to go into the .travis.yml file[1].  So I think the best we could do is
send a notification email to some mailing list, and then let people
subscribe to that (or it could go to git@vger; I don't know how noisy it
would be).

-Peff

[1] I wanted to set up an email just to me for my personal fork, but I
couldn't make it work. AFAICT there's no way to override upstream's
config short of adding another commit, which would mean I'd have to
base all my branches on a "fix up .travis.yml" commit.
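For what it's worth, the notification section that would have to live in .travis.yml could look roughly like this. This is a hypothetical fragment, not something currently in git's tree; the recipient address is a placeholder, and whether such a list should exist is exactly the open question above:

```yaml
# Hypothetical .travis.yml fragment (address is a placeholder):
notifications:
  email:
    recipients:
      - git-ci-breakage@example.org  # a list people could subscribe to
    on_success: never                # stay quiet when the build passes
    on_failure: always               # mail whenever pu/next/master breaks
```

Since this lives in the repository itself, per-fork overrides indeed require committing a change on top, as noted in the footnote.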


Re: [PATCH v6 1/8] pkt-line: add packet_read_line_gently()

2017-04-24 Thread Ben Peart



On 4/24/2017 12:21 AM, Junio C Hamano wrote:

Ben Peart  writes:


Add packet_read_line_gently() to enable reading a line without dying on
EOF.

Signed-off-by: Ben Peart 
---
  pkt-line.c | 14 +-
  pkt-line.h | 10 ++
  2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/pkt-line.c b/pkt-line.c
index d4b6bfe076..bfdb177b34 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -315,7 +315,7 @@ static char *packet_read_line_generic(int fd,
  PACKET_READ_CHOMP_NEWLINE);
if (dst_len)
*dst_len = len;
-   return len ? packet_buffer : NULL;
+   return (len > 0) ? packet_buffer : NULL;
  }

The log does not seem to explain what this change is about.


This change was made as the result of a request in feedback from the 
previous version of the patch series which pointed out that the call to 
packet_read can return -1 in some error cases and to keep the code more 
consistent with packet_read_line_gently below.


If len < 0 then the old code would have incorrectly returned a pointer 
to a buffer with garbage while the new code correctly returns NULL 
(fixes potential bug)
if len == 0 then the code will return NULL before and after this change 
(no change in behavior)
if len > 0 then the code will return packet_buffer before and after this 
change (no change in behavior)




Is this supposed to be a preliminary bugfix where this helper used
to return a non-NULL buffer when underlying packet_read() signaled
an error by returning a negative len?  If so, this should probably
be a separate patch early in the series.

Should len==0 be considered an error?  Especially given that
PACKET_READ_CHOMP_NEWLINE is in use, I would expect that len==0
should be treated similarly to positive length, i.e. the otherside
gave us an empty line.


  char *packet_read_line(int fd, int *len_p)
@@ -323,6 +323,18 @@ char *packet_read_line(int fd, int *len_p)
return packet_read_line_generic(fd, NULL, NULL, len_p);
  }
  
+int packet_read_line_gently(int fd, int *dst_len, char** dst_line)

ERROR: "foo** bar" should be "foo **bar"
#29: FILE: pkt-line.c:326:
+int packet_read_line_gently(int fd, int *dst_len, char** dst_line)


Sorry, missed that somehow.  I'll move the space before the ** in the 
next version of the patch series.





+{
+   int len = packet_read(fd, NULL, NULL,
+ packet_buffer, sizeof(packet_buffer),
+ 
PACKET_READ_CHOMP_NEWLINE|PACKET_READ_GENTLE_ON_EOF);
+   if (dst_len)
+   *dst_len = len;
+   if (dst_line)
+   *dst_line = (len > 0) ? packet_buffer : NULL;

I have the same doubt as above for len == 0 case.


packet_read() returns -1 when PACKET_READ_GENTLE_ON_EOF is passed and it 
hits truncated output from the remote process.  This occurs when the 
remote process exits unexpectedly which is the exact case I was fixing 
with this new function.  This requires testing len for this error 
condition so that it can correctly handle this error case, otherwise it 
would incorrectly return an invalid buffer. Since this is a new 
function, there isn't any before/after behavior comparisons but it is 
consistent with the similar packet_read_line() function.




Re: [PATCH v3 4/5] archive-zip: support archives bigger than 4GB

2017-04-24 Thread René Scharfe

On 24.04.2017 at 20:24, Peter Krefting wrote:

René Scharfe:

@@ -433,6 +446,11 @@ static int write_zip_entry(struct archiver_args 
*args,

free(deflated);
free(buffer);

+if (offset > 0x) {
+zip64_dir_extra_payload_size += 8;
+zip_dir_extra_size += 2 + 2 + zip64_dir_extra_payload_size;
+}
+
strbuf_add_le(_dir, 4, 0x02014b50);/* magic */
strbuf_add_le(_dir, 2, creator_version);
strbuf_add_le(_dir, 2, 10);/* version */


This needs to be >=. The spec says that if the value is 0x, 
there should be a zip64 record with the actual size (even if it is 
0x).


Could you please cite the relevant part?

Here's how I read it: If a value doesn't fit into a 32-bit field it is 
set to 0x, a zip64 extra is added and a 64-bit field stores the 
actual value.  The magic value 0x indicates that a corresponding 
64-bit field is present in the zip64 extra.  That means even if a value 
is 0x (and thus fits) we need to add it to the zip64 extra.  If 
there is no zip64 extra then we can store 0x in the 32-bit 
field, though.


Also set the version required to 45 (4.5) for any record that has zip64 
fields.


Ah, yes indeed.

René


Re: [GSoC][RFC/PATCH v2] submodule: port subcommand foreach from shell to C

2017-04-24 Thread Stefan Beller
> The First function module_foreach first parses the options present in

s/First/first/

> The second function also takes care of the recursive flag, by creating
> a saperate child_process structure and prepending "--super-prefix 
> displaypath",

 s/saperate/separate

> Additional env variable $sm_path was added, since it was used in
> test '"submodule foreach" from subdirectory' in t7407.
> I preferred adding this, instead of changing the test case, since
> in the case of git-submodule.sh, this env variable was accessible.

Makes sense. Though this explanation could go into the commit message as well.

> I checked out the commit 1c4fb136db (submodule foreach: skip eval for
> more than one argument, 2013-09-27), which explains why, for
> the case when argc>1, we do not use eval. But since, we are calling the
> command in a separate shell itself for all values of argc, hence IMO,
> this case need not be considered separately.

Makes sense. Though this explanation could go into the commit message as well.

> I have observed that when we recursively run a command foreach
> submodule from a subdirectory, the $path variable as finally obtained
> by this patch differs with the $path variable as observed by the
> existing git-submodule code for a nested submodule.

Uh, then the recuring is still a bit broken.

> I'll again mention that I have based my branch on
> gitster/jk/ls-files-recurse-submodules-fix, since while
> using --super-prefix in recursively calling the foreach

Oops, missed that part. (Later in the reply I hint at the patch not
applying correctly, disregard that part.)

> Also, in the function foreach_submodule, we call gitmodules_config()
> to read values from the worktree .gitmodules and then look up
> the information (in this case only the sub->name) by using
> the submodule_from_path function. Since we don't want to
> overwrite the null_sha1 entry, we only load from .gitmodules
> and avoid overlaying with .git/config.

The null_sha1 entry is there nevertheless.

> (also, since this whole process is required only to get the value
> of the submodule's name, is there some other way by which we may obtain
> the value so as to avoid this step?)

Unfortunately this is the only way to get the name. But that's right, for
just getting the name we do not need the .git/config overlay as that
is not supposed to overwrite name/path mappings.

> As currently finally exams are going on in my college, I was unable to
> work on it much and the submission got delayed.

Good luck with the finals. :)

+
+ /* Only loads from .gitmodules, no overlay with .git/config */
+ gitmodules_config();

We should do the config loading/parsing not
in the inner looping function, but just once outside.

+
+ argv_array_pushf(_array, "name=%s", sub->name);
+ argv_array_pushf(_array, "path=%s", displaypath);
+ argv_array_pushf(_array, "sm_path=%s", displaypath);

You mention keeping 'sm_path' in the notes after the commit message. I would
add that part to the commit message, to explain why we have multiple variables
that have the same value. Maybe even a comment in the code:

/* Keep sm_path for historic reasons, see tests in 091a6eb0fee. */
.. sm_path ..

+ argv_array_pushf(_array, "sha1=%s", sub_sha1.buf);
+ argv_array_pushf(_array, "toplevel=%s", toplevel);
+
+ cp.use_shell = 1;
+ cp.dir = path;
+
+ for (i = 0; i < argc; i++)
+ argv_array_push(, argv[i]);
+
+ strbuf_addstr(, path);
+ strbuf_addstr(, "/.git");

This could be oneline as:
strbuf_addf(, "%s/.git", path);
+
+ if (!quiet && !access_or_warn(file.buf, R_OK, 0))
+ printf(_("Entering '%s'\n"), displaypath);
+
+ if (!access_or_warn(file.buf, R_OK, 0))

I do not think we'd want to warn about a missing .git file (which is what the
or_warn part does).

Instead I'd rather use the abstract function 'is_submodule_populated_gently'.

+ run_command();
+
+ if(recursive) {
+ struct child_process cpr = CHILD_PROCESS_INIT;
+
+ cpr.use_shell = 1;
+ cpr.dir = path;
+
+ argv_array_pushf(, "git");
+ argv_array_pushf(, "--super-prefix");
+ argv_array_push(, displaypath);
+ argv_array_pushf(, "submodule--helper");

No need for the 'f'ormat version of pushing strings if there is no formatting
going on. An alternative would be to push them all at once:

argv_array_pushl(, "git", "--super-prefix", displaypath,
 "submodule--helper", NULL);

(The 'l' version of pushing accepts an arbitrary number of arguments until the
final NULL)


+ if (quiet)
+ argv_array_pushf(, "--quiet");
+
+ argv_array_pushf(, "foreach");
+ argv_array_pushf(, "--recursive");


+
+ for (i = 0; i < argc; i++)
+ argv_array_push(, argv[i]);
+
+ run_command();

I'd think we'd need to react to the exit code here (and above), and even
stop early just as the shell version does. (To find out how it does so,
I used gitk, and followed the error line
  die "$(eval_gettext "Stopping at '\$displaypath'; script returned
non-zero status.")"
and right clicked on that line "Show origin of that line". There 

Re: t0025 flaky on OSX

2017-04-24 Thread Torsten Bögershausen
On 2017-04-24 19:00, Torsten Bögershausen wrote:
> On 2017-04-24 18:45, Lars Schneider wrote:
>> Hi,
>>
>> "t0025.3 - crlf=true causes a CRLF file to be normalized" failed 
>> sporadically on next and master recently: 
>> https://travis-ci.org/git/git/jobs/225084459#L2382
>> https://travis-ci.org/git/git/jobs/223830505#L2342
>>
>> Are you aware of a race condition in the code
>> or in the test?
> Not yet - I'll have a look
> 

So,
The test failed under Linux & pu of the day, using Peff's
stress script.

not ok 3 - crlf=true causes a CRLF file to be normalized

The good case (simplified):
$ git status
   modified:   CRLFonly

Untracked files:
.gitattributes


$ git diff | tr "\015" Q
warning: CRLF will be replaced by LF in CRLFonly.
The file will have its original line endings in your working directory.
diff --git a/CRLFonly b/CRLFonly
index 44fc21c..666dbf4 100644
--- a/CRLFonly
+++ b/CRLFonly
@@ -1,7 +1,7 @@
-IQ
-amQ
-veryQ
-veryQ
-fineQ
-thankQ
-youQ
+I
+am
+very
+very
+fine
+thank
+you


The failed case:
$ git status
Untracked files:
.gitattributes

---
$ ls -al -i
total 28
3430195 drwxr-xr-x 3 tb tb 4096 Apr 24 21:19 .
3427617 drwxr-xr-x 3 tb tb 4096 Apr 24 21:19 ..
3429958 -rw-r--r-- 1 tb tb   37 Apr 24 21:19 CRLFonly
3432574 drwxr-xr-x 8 tb tb 4096 Apr 24 21:27 .git
3425599 -rw-r--r-- 1 tb tb   14 Apr 24 21:19 .gitattributes
3430089 -rw-r--r-- 1 tb tb   24 Apr 24 21:19 LFonly
3430174 -rw-r--r-- 1 tb tb   36 Apr 24 21:19 LFwithNUL

-
#After
$ mv CRLFonly tmp
$ cp tmp CRLFonly
$ ls -al -i
3430195 drwxr-xr-x 3 tb tb 4096 Apr 24 21:36 .
3427617 drwxr-xr-x 3 tb tb 4096 Apr 24 21:19 ..
3401599 -rw-r--r-- 1 tb tb   37 Apr 24 21:36 CRLFonly
3432574 drwxr-xr-x 8 tb tb 4096 Apr 24 21:36 .git
3425599 -rw-r--r-- 1 tb tb   14 Apr 24 21:19 .gitattributes
3430089 -rw-r--r-- 1 tb tb   24 Apr 24 21:19 LFonly
3430174 -rw-r--r-- 1 tb tb   36 Apr 24 21:19 LFwithNUL
3429958 -rw-r--r-- 1 tb tb   37 Apr 24 21:19 tmp

$ git status
modified:   CRLFonly
Untracked files:
.gitattributes
tmp


So all in all it seems as if there is a very old race condition here,
which we have "never" seen before.
Moving the file to a different inode number fixes the test case:
Git no longer treats it as unchanged.






Re: [PATCH v7] read-cache: force_verify_index_checksum

2017-04-24 Thread Jeff Hostetler



On 4/24/2017 1:26 PM, Johannes Sixt wrote:

Am 14.04.2017 um 22:32 schrieb g...@jeffhostetler.com:

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 33a51c9..677e15a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,4 +689,17 @@ test_expect_success 'bogus head does not 
fallback to all heads' '

 ! grep $blob out
 '

+test_expect_success 'detect corrupt index file in fsck' '
+cp .git/index .git/index.backup &&
+test_when_finished "mv .git/index.backup .git/index" &&
+echo zzzzzzzz >zzzzzzzz &&
+git add zzzzzzzz &&
+sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&


sed on a binary file? Sooner or later we are going to run into 
portability issues.


In v5 of this patch series I used "perl" and it was suggested that I use
"sed" instead.  It doesn't matter to me which we use.  My testing showed
that it was safe, but that was only on Linux.

Does the mailing list have a preference for this?





+mv .git/index.yyy .git/index &&
+# Confirm that fsck detects invalid checksum
+test_must_fail git fsck --cache &&


You should ensure that this failure is really because of an invalid 
checksum. The failure could also be due to an extra LF at the end that 
sed inserted, no?


I suppose we could, but I'm tempted to wait on that for now.

Jeff



Re: [PATCH v3 4/5] archive-zip: support archives bigger than 4GB

2017-04-24 Thread Peter Krefting

René Scharfe:


@@ -433,6 +446,11 @@ static int write_zip_entry(struct archiver_args *args,
free(deflated);
free(buffer);

+   if (offset > 0xffffffff) {
+   zip64_dir_extra_payload_size += 8;
+   zip_dir_extra_size += 2 + 2 + zip64_dir_extra_payload_size;
+   }
+
strbuf_add_le(&zip_dir, 4, 0x02014b50); /* magic */
strbuf_add_le(&zip_dir, 2, creator_version);
strbuf_add_le(&zip_dir, 2, 10); /* version */


This needs to be >=. The spec says that if the value is 0xffffffff,
there should be a zip64 record with the actual size (even if it is
0xffffffff).


Also set the version required to 45 (4.5) for any record that has zip64 
fields.


--
\\// Peter - http://www.softwolves.pp.se/


Re: [PATCH v3 05/12] refs: move submodule slash stripping code to get_submodule_ref_store

2017-04-24 Thread Stefan Beller
On Fri, Apr 21, 2017 at 10:27 PM, Michael Haggerty  wrote:
> On 04/19/2017 01:01 PM, Nguyễn Thái Ngọc Duy wrote:
>> This is a better place that will benefit all submodule callers instead
>> of just resolve_gitlink_ref()
>>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>>  refs.c | 33 +
>>  1 file changed, 17 insertions(+), 16 deletions(-)
>>
>> diff --git a/refs.c b/refs.c
>> index 5902a3d9e5..26474cb62a 100644
>> --- a/refs.c
>> +++ b/refs.c
>> @@ -1422,25 +1422,10 @@ const char *resolve_ref_unsafe(const char *refname, 
>> int resolve_flags,
>>  int resolve_gitlink_ref(const char *submodule, const char *refname,
>>   unsigned char *sha1)
>>  {
>> - size_t len = strlen(submodule);
>>   struct ref_store *refs;
>>   int flags;
>>
>> - while (len && submodule[len - 1] == '/')
>> - len--;
>> -
>> - if (!len)
>> - return -1;
>> -
>> - if (submodule[len]) {
>> - /* We need to strip off one or more trailing slashes */
>> - char *stripped = xmemdupz(submodule, len);
>> -
>> - refs = get_submodule_ref_store(stripped);
>> - free(stripped);
>> - } else {
>> - refs = get_submodule_ref_store(submodule);
>> - }
>> + refs = get_submodule_ref_store(submodule);
>>
>>   if (!refs)
>>   return -1;
>> @@ -1558,7 +1543,17 @@ struct ref_store *get_submodule_ref_store(const char 
>> *submodule)
>>  {
>>   struct strbuf submodule_sb = STRBUF_INIT;
>>   struct ref_store *refs;
>> + char *to_free = NULL;
>>   int ret;
>> + size_t len;
>> +
>> + if (submodule) {
>> + len = strlen(submodule);
>> + while (len && submodule[len - 1] == '/')
>> + len--;
>> + if (!len)
>> + submodule = NULL;
>> + }
>
> Ugh. Should a submodule named "///" *really* be considered to refer to
> the main ref_store? I understand that's what the code did before this
> patch, but it seems to me more like an accident of the old design rather
> than something worth supporting. In other words, if a caller would
> really pass us such a string, it seems like we could declare the caller
> buggy, no?

In a nearby thread we discussed whether we want to tighten the
submodule names and paths as some of the code path do funny
things on funny input. As 'submodule' here refers to a path (and not
a name), I would think '///' is not possible/buggy, rather we'd want to
discuss if we want to extend the check to is_dir_sep, which would
include '\' on Windows.

Thanks,
Stefan


Re: Submodule/contents conflict

2017-04-24 Thread Stefan Beller
On Mon, Apr 24, 2017 at 1:06 AM, Orgad Shaneh  wrote:
> Hi,
>
> I've noticed a strange behavior with submodule/content conflict. My
> current Git version is 2.12.2, but the problem exists since I
> remember.
>
> Branch A has a submodule.
> In branch B which diverged from A, I replaced the submodule with its contents.
>
> Now, every time I merge A into B, and A had changed the submodule
> reference, all the files inside the ex-submodule directory in B are
> being "re-added".
>
> Moreover, aborting the merge prints an error, but seems to work
> nevertheless, and if I run git reset --hard all the files in that
> directory are actually written to the disk, even though they haven't
> changed at all.
>
> When the submodule is small, it might be ok. But in my project we have
> a huge submodule with ~16K files, and on each merge all the files are
> listed, and even mixed reset takes several minutes.
>

A similar bug report
https://public-inbox.org/git/CAG0BQX=wvpkJ=pqwv-nbmhupv8yzvd_kykzjmsfwq9xstz2...@mail.gmail.com/

"checkout --recurse-submodules" (as mentioned in that report)
made it into Git by now, but this bug goes unfixed, still.

> The following script demonstrates this:
> #!/bin/sh
>
> rm -rf super sub
> mkdir sub
> cd sub
> git init
> touch foo
> git add foo
> git commit -m 'Initial commit'
> mkdir ../super; cd ../super
> git init
> git submodule add ../sub
> touch foo; git add foo sub
> git commit -m 'Initial commit'
> git checkout -b update-sub
> git update-index --cacheinfo 
> 16,,sub
> git commit -m 'Update submodule'
> git checkout -b remove-sub HEAD^
> git rm sub
> mkdir sub
> touch sub/foo sub/bar
> git add sub
> git commit -m 'Replaced submodule with contents'
> git checkout -b remove-2 HEAD^
> git merge --no-ff remove-sub
> git merge update-sub
> # Adding sub/foo
> # Adding sub/bar
> # CONFLICT (modify/delete): sub deleted in HEAD and modified in
> update-sub. Version update-sub of sub left in tree at sub~update-sub.
> # Automatic merge failed; fix conflicts and then commit the result.
> git merge --abort
> # error: 'sub' appears as both a file and as a directory
> # error: sub: cannot drop to stage #0
>
> - Orgad


[PATCH v3 5/5] archive-zip: support files bigger than 4GB

2017-04-24 Thread René Scharfe
Write a zip64 extended information extra field for big files as part of
their local headers and as part of their central directory headers.
Also write a zip64 version of the data descriptor in that case.

If we're streaming then we don't know the compressed size at the time we
write the header.  Deflate can end up making a file bigger instead of
smaller if we're unlucky.  Write a local zip64 header already for files
with a size of 2GB or more in this case to be on the safe side.

Both sizes need to be included in the local zip64 header, but the extra
field for the directory must only contain 64-bit equivalents for 32-bit
values of 0xffffffff.

Signed-off-by: Rene Scharfe 
---
 archive-zip.c   | 90 ++---
 t/t5004-archive-corner-cases.sh |  2 +-
 2 files changed, 76 insertions(+), 16 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index 7d6f2a85d0..44ed78f163 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -45,6 +45,14 @@ struct zip_data_desc {
unsigned char _end[1];
 };
 
+struct zip64_data_desc {
+   unsigned char magic[4];
+   unsigned char crc32[4];
+   unsigned char compressed_size[8];
+   unsigned char size[8];
+   unsigned char _end[1];
+};
+
 struct zip_dir_trailer {
unsigned char magic[4];
unsigned char disk[2];
@@ -65,6 +73,14 @@ struct zip_extra_mtime {
unsigned char _end[1];
 };
 
+struct zip64_extra {
+   unsigned char magic[2];
+   unsigned char extra_size[2];
+   unsigned char size[8];
+   unsigned char compressed_size[8];
+   unsigned char _end[1];
+};
+
 struct zip64_dir_trailer {
unsigned char magic[4];
unsigned char record_size[8];
@@ -94,11 +110,15 @@ struct zip64_dir_trailer_locator {
  */
 #define ZIP_LOCAL_HEADER_SIZE  offsetof(struct zip_local_header, _end)
 #define ZIP_DATA_DESC_SIZE offsetof(struct zip_data_desc, _end)
+#define ZIP64_DATA_DESC_SIZE   offsetof(struct zip64_data_desc, _end)
 #define ZIP_DIR_HEADER_SIZEoffsetof(struct zip_dir_header, _end)
 #define ZIP_DIR_TRAILER_SIZE   offsetof(struct zip_dir_trailer, _end)
 #define ZIP_EXTRA_MTIME_SIZE   offsetof(struct zip_extra_mtime, _end)
 #define ZIP_EXTRA_MTIME_PAYLOAD_SIZE \
(ZIP_EXTRA_MTIME_SIZE - offsetof(struct zip_extra_mtime, flags))
+#define ZIP64_EXTRA_SIZE   offsetof(struct zip64_extra, _end)
+#define ZIP64_EXTRA_PAYLOAD_SIZE \
+   (ZIP64_EXTRA_SIZE - offsetof(struct zip64_extra, size))
 #define ZIP64_DIR_TRAILER_SIZE offsetof(struct zip64_dir_trailer, _end)
 #define ZIP64_DIR_TRAILER_RECORD_SIZE \
(ZIP64_DIR_TRAILER_SIZE - \
@@ -202,13 +222,23 @@ static void write_zip_data_desc(unsigned long size,
unsigned long compressed_size,
unsigned long crc)
 {
-   struct zip_data_desc trailer;
-
-   copy_le32(trailer.magic, 0x08074b50);
-   copy_le32(trailer.crc32, crc);
-   copy_le32(trailer.compressed_size, compressed_size);
-   copy_le32(trailer.size, size);
-   write_or_die(1, &trailer, ZIP_DATA_DESC_SIZE);
+   if (size >= 0xffffffff || compressed_size >= 0xffffffff) {
+   struct zip64_data_desc trailer;
+   copy_le32(trailer.magic, 0x08074b50);
+   copy_le32(trailer.crc32, crc);
+   copy_le64(trailer.compressed_size, compressed_size);
+   copy_le64(trailer.size, size);
+   write_or_die(1, &trailer, ZIP64_DATA_DESC_SIZE);
+   zip_offset += ZIP64_DATA_DESC_SIZE;
+   } else {
+   struct zip_data_desc trailer;
+   copy_le32(trailer.magic, 0x08074b50);
+   copy_le32(trailer.crc32, crc);
+   copy_le32(trailer.compressed_size, compressed_size);
+   copy_le32(trailer.size, size);
+   write_or_die(1, &trailer, ZIP_DATA_DESC_SIZE);
+   zip_offset += ZIP_DATA_DESC_SIZE;
+   }
 }
 
 static void set_zip_header_data_desc(struct zip_local_header *header,
@@ -252,6 +282,9 @@ static int write_zip_entry(struct archiver_args *args,
struct zip_local_header header;
uintmax_t offset = zip_offset;
struct zip_extra_mtime extra;
+   struct zip64_extra extra64;
+   size_t header_extra_size = ZIP_EXTRA_MTIME_SIZE;
+   int need_zip64_extra = 0;
unsigned long attr2;
unsigned long compressed_size;
unsigned long crc;
@@ -344,21 +377,40 @@ static int write_zip_entry(struct archiver_args *args,
extra.flags[0] = 1; /* just mtime */
copy_le32(extra.mtime, args->time);
 
+   if (size > 0xffffffff || compressed_size > 0xffffffff)
+   need_zip64_extra = 1;
+   if (stream && size > 0x7fffffff)
+   need_zip64_extra = 1;
+
copy_le32(header.magic, 0x04034b50);
copy_le16(header.version, 10);
copy_le16(header.flags, flags);
copy_le16(header.compression_method, method);

[PATCH v3 4/5] archive-zip: support archives bigger than 4GB

2017-04-24 Thread René Scharfe
Add a zip64 extended information extra field to the central directory
and emit the zip64 end of central directory records as well as locator
if the offset of an entry within the archive exceeds 4GB.

Signed-off-by: Rene Scharfe 
---
 archive-zip.c   | 32 
 t/t5004-archive-corner-cases.sh |  2 +-
 2 files changed, 29 insertions(+), 5 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index 2d52bb3ade..7d6f2a85d0 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -14,7 +14,7 @@ static int zip_time;
 /* We only care about the "buf" part here. */
 static struct strbuf zip_dir;
 
-static unsigned int zip_offset;
+static uintmax_t zip_offset;
 static uint64_t zip_dir_entries;
 
 static unsigned int max_creator_version;
@@ -145,6 +145,11 @@ static void copy_le16_clamp(unsigned char *dest, uint64_t 
n, int *clamped)
copy_le16(dest, clamp_max(n, 0xffff, clamped));
 }
 
+static void copy_le32_clamp(unsigned char *dest, uint64_t n, int *clamped)
+{
+   copy_le32(dest, clamp_max(n, 0xffffffff, clamped));
+}
+
 static int strbuf_add_le(struct strbuf *sb, size_t size, uintmax_t n)
 {
while (size-- > 0) {
@@ -154,6 +159,12 @@ static int strbuf_add_le(struct strbuf *sb, size_t size, 
uintmax_t n)
return -!!n;
 }
 
+static uint32_t clamp32(uintmax_t n)
+{
+   const uintmax_t max = 0xffffffff;
+   return (n < max) ? n : max;
+}
+
 static void *zlib_deflate_raw(void *data, unsigned long size,
  int compression_level,
  unsigned long *compressed_size)
@@ -254,6 +265,8 @@ static int write_zip_entry(struct archiver_args *args,
int is_binary = -1;
const char *path_without_prefix = path + args->baselen;
unsigned int creator_version = 0;
+   size_t zip_dir_extra_size = ZIP_EXTRA_MTIME_SIZE;
+   size_t zip64_dir_extra_payload_size = 0;
 
crc = crc32(0, NULL, 0);
 
@@ -433,6 +446,11 @@ static int write_zip_entry(struct archiver_args *args,
free(deflated);
free(buffer);
 
+   if (offset > 0xffffffff) {
+   zip64_dir_extra_payload_size += 8;
+   zip_dir_extra_size += 2 + 2 + zip64_dir_extra_payload_size;
+   }
+
strbuf_add_le(&zip_dir, 4, 0x02014b50); /* magic */
strbuf_add_le(&zip_dir, 2, creator_version);
strbuf_add_le(&zip_dir, 2, 10); /* version */
@@ -444,14 +462,20 @@ static int write_zip_entry(struct archiver_args *args,
strbuf_add_le(&zip_dir, 4, compressed_size);
strbuf_add_le(&zip_dir, 4, size);
strbuf_add_le(&zip_dir, 2, pathlen);
-   strbuf_add_le(&zip_dir, 2, ZIP_EXTRA_MTIME_SIZE);
+   strbuf_add_le(&zip_dir, 2, zip_dir_extra_size);
strbuf_add_le(&zip_dir, 2, 0);  /* comment length */
strbuf_add_le(&zip_dir, 2, 0);  /* disk */
strbuf_add_le(&zip_dir, 2, !is_binary);
strbuf_add_le(&zip_dir, 4, attr2);
-   strbuf_add_le(&zip_dir, 4, offset);
+   strbuf_add_le(&zip_dir, 4, clamp32(offset));
strbuf_add(&zip_dir, path, pathlen);
strbuf_add(&zip_dir, &extra, ZIP_EXTRA_MTIME_SIZE);
+   if (zip64_dir_extra_payload_size) {
+   strbuf_add_le(&zip_dir, 2, 0x0001); /* magic */
+   strbuf_add_le(&zip_dir, 2, zip64_dir_extra_payload_size);
+   if (offset >= 0xffffffff)
+   strbuf_add_le(&zip_dir, 8, offset);
+   }
zip_dir_entries++;
 
return 0;
@@ -494,7 +518,7 @@ static void write_zip_trailer(const unsigned char *sha1)
copy_le16_clamp(trailer.entries_on_this_disk, zip_dir_entries,
&clamped);
copy_le16_clamp(trailer.entries, zip_dir_entries, &clamped);
copy_le32(trailer.size, zip_dir.len);
-   copy_le32(trailer.offset, zip_offset);
+   copy_le32_clamp(trailer.offset, zip_offset, &clamped);
copy_le16(trailer.comment_length, sha1 ? GIT_SHA1_HEXSZ : 0);
 
write_or_die(1, zip_dir.buf, zip_dir.len);
diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh
index bc052c803a..0ac94b5cc9 100755
--- a/t/t5004-archive-corner-cases.sh
+++ b/t/t5004-archive-corner-cases.sh
@@ -155,7 +155,7 @@ test_expect_success ZIPINFO 'zip archive with many entries' 
'
test_cmp expect actual
 '
 
-test_expect_failure EXPENSIVE,UNZIP 'zip archive bigger than 4GB' '
+test_expect_success EXPENSIVE,UNZIP 'zip archive bigger than 4GB' '
# build string containing 65536 characters
s=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef &&
s=$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s &&
-- 
2.12.2



[PATCH v3 3/5] archive-zip: write ZIP dir entry directly to strbuf

2017-04-24 Thread René Scharfe
Write all fields of the ZIP directory record for an archive entry
in the right order directly into the strbuf instead of taking a detour
through a struct.  Do that at end, when we have all necessary data like
checksum and compressed size.  The fields are documented just as well,
the code becomes shorter and we save an extra copy.

Signed-off-by: Rene Scharfe 
---
 archive-zip.c | 81 ---
 1 file changed, 27 insertions(+), 54 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index a6fac59602..2d52bb3ade 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -45,27 +45,6 @@ struct zip_data_desc {
unsigned char _end[1];
 };
 
-struct zip_dir_header {
-   unsigned char magic[4];
-   unsigned char creator_version[2];
-   unsigned char version[2];
-   unsigned char flags[2];
-   unsigned char compression_method[2];
-   unsigned char mtime[2];
-   unsigned char mdate[2];
-   unsigned char crc32[4];
-   unsigned char compressed_size[4];
-   unsigned char size[4];
-   unsigned char filename_length[2];
-   unsigned char extra_length[2];
-   unsigned char comment_length[2];
-   unsigned char disk[2];
-   unsigned char attr1[2];
-   unsigned char attr2[4];
-   unsigned char offset[4];
-   unsigned char _end[1];
-};
-
 struct zip_dir_trailer {
unsigned char magic[4];
unsigned char disk[2];
@@ -166,6 +145,15 @@ static void copy_le16_clamp(unsigned char *dest, uint64_t 
n, int *clamped)
copy_le16(dest, clamp_max(n, 0x, clamped));
 }
 
+static int strbuf_add_le(struct strbuf *sb, size_t size, uintmax_t n)
+{
+   while (size-- > 0) {
+   strbuf_addch(sb, n & 0xff);
+   n >>= 8;
+   }
+   return -!!n;
+}
+
 static void *zlib_deflate_raw(void *data, unsigned long size,
  int compression_level,
  unsigned long *compressed_size)
@@ -212,16 +200,6 @@ static void write_zip_data_desc(unsigned long size,
write_or_die(1, &trailer, ZIP_DATA_DESC_SIZE);
 }
 
-static void set_zip_dir_data_desc(struct zip_dir_header *header,
- unsigned long size,
- unsigned long compressed_size,
- unsigned long crc)
-{
-   copy_le32(header->crc32, crc);
-   copy_le32(header->compressed_size, compressed_size);
-   copy_le32(header->size, size);
-}
-
 static void set_zip_header_data_desc(struct zip_local_header *header,
 unsigned long size,
 unsigned long compressed_size,
@@ -261,7 +239,7 @@ static int write_zip_entry(struct archiver_args *args,
   unsigned int mode)
 {
struct zip_local_header header;
-   struct zip_dir_header dirent;
+   uintmax_t offset = zip_offset;
struct zip_extra_mtime extra;
unsigned long attr2;
unsigned long compressed_size;
@@ -353,21 +331,6 @@ static int write_zip_entry(struct archiver_args *args,
extra.flags[0] = 1; /* just mtime */
copy_le32(extra.mtime, args->time);
 
-   copy_le32(dirent.magic, 0x02014b50);
-   copy_le16(dirent.creator_version, creator_version);
-   copy_le16(dirent.version, 10);
-   copy_le16(dirent.flags, flags);
-   copy_le16(dirent.compression_method, method);
-   copy_le16(dirent.mtime, zip_time);
-   copy_le16(dirent.mdate, zip_date);
-   set_zip_dir_data_desc(&dirent, size, compressed_size, crc);
-   copy_le16(dirent.filename_length, pathlen);
-   copy_le16(dirent.extra_length, ZIP_EXTRA_MTIME_SIZE);
-   copy_le16(dirent.comment_length, 0);
-   copy_le16(dirent.disk, 0);
-   copy_le32(dirent.attr2, attr2);
-   copy_le32(dirent.offset, zip_offset);
-
copy_le32(header.magic, 0x04034b50);
copy_le16(header.version, 10);
copy_le16(header.flags, flags);
@@ -406,8 +369,6 @@ static int write_zip_entry(struct archiver_args *args,
 
write_zip_data_desc(size, compressed_size, crc);
zip_offset += ZIP_DATA_DESC_SIZE;
-
-   set_zip_dir_data_desc(&dirent, size, compressed_size, crc);
} else if (stream && method == 8) {
unsigned char buf[STREAM_BUFFER_SIZE];
ssize_t readlen;
@@ -464,8 +425,6 @@ static int write_zip_entry(struct archiver_args *args,
 
write_zip_data_desc(size, compressed_size, crc);
zip_offset += ZIP_DATA_DESC_SIZE;
-
-   set_zip_dir_data_desc(&dirent, size, compressed_size, crc);
} else if (compressed_size > 0) {
write_or_die(1, out, compressed_size);
zip_offset += compressed_size;
@@ -474,9 +433,23 @@ static int write_zip_entry(struct archiver_args *args,
free(deflated);
free(buffer);
 
-   copy_le16(dirent.attr1, !is_binary);
-
- 

[PATCH v3 2/5] archive-zip: use strbuf for ZIP directory

2017-04-24 Thread René Scharfe
Keep the ZIP central director, which is written after all archive
entries, in a strbuf instead of a custom-managed buffer.  It contains
binary data, so we can't (and don't want to) use the full range of
strbuf functions and we don't need the terminating NUL, but the result
is shorter and simpler code.

Signed-off-by: Rene Scharfe 
---
 archive-zip.c | 36 +++-
 1 file changed, 11 insertions(+), 25 deletions(-)

diff --git a/archive-zip.c b/archive-zip.c
index b429a8d974..a6fac59602 100644
--- a/archive-zip.c
+++ b/archive-zip.c
@@ -11,16 +11,14 @@
 static int zip_date;
 static int zip_time;
 
-static unsigned char *zip_dir;
-static unsigned int zip_dir_size;
+/* We only care about the "buf" part here. */
+static struct strbuf zip_dir;
 
 static unsigned int zip_offset;
-static unsigned int zip_dir_offset;
 static uint64_t zip_dir_entries;
 
 static unsigned int max_creator_version;
 
-#define ZIP_DIRECTORY_MIN_SIZE (1024 * 1024)
 #define ZIP_STREAM (1 <<  3)
 #define ZIP_UTF8   (1 << 11)
 
@@ -268,7 +266,6 @@ static int write_zip_entry(struct archiver_args *args,
unsigned long attr2;
unsigned long compressed_size;
unsigned long crc;
-   unsigned long direntsize;
int method;
unsigned char *out;
void *deflated = NULL;
@@ -356,13 +353,6 @@ static int write_zip_entry(struct archiver_args *args,
extra.flags[0] = 1; /* just mtime */
copy_le32(extra.mtime, args->time);
 
-   /* make sure we have enough free space in the dictionary */
-   direntsize = ZIP_DIR_HEADER_SIZE + pathlen + ZIP_EXTRA_MTIME_SIZE;
-   while (zip_dir_size < zip_dir_offset + direntsize) {
-   zip_dir_size += ZIP_DIRECTORY_MIN_SIZE;
-   zip_dir = xrealloc(zip_dir, zip_dir_size);
-   }
-
copy_le32(dirent.magic, 0x02014b50);
copy_le16(dirent.creator_version, creator_version);
copy_le16(dirent.version, 10);
@@ -486,12 +476,9 @@ static int write_zip_entry(struct archiver_args *args,
 
copy_le16(dirent.attr1, !is_binary);
 
-   memcpy(zip_dir + zip_dir_offset, &dirent, ZIP_DIR_HEADER_SIZE);
-   zip_dir_offset += ZIP_DIR_HEADER_SIZE;
-   memcpy(zip_dir + zip_dir_offset, path, pathlen);
-   zip_dir_offset += pathlen;
-   memcpy(zip_dir + zip_dir_offset, &extra, ZIP_EXTRA_MTIME_SIZE);
-   zip_dir_offset += ZIP_EXTRA_MTIME_SIZE;
+   strbuf_add(&zip_dir, &dirent, ZIP_DIR_HEADER_SIZE);
+   strbuf_add(&zip_dir, path, pathlen);
+   strbuf_add(&zip_dir, &extra, ZIP_EXTRA_MTIME_SIZE);
zip_dir_entries++;
 
return 0;
@@ -510,12 +497,12 @@ static void write_zip64_trailer(void)
copy_le32(trailer64.directory_start_disk, 0);
copy_le64(trailer64.entries_on_this_disk, zip_dir_entries);
copy_le64(trailer64.entries, zip_dir_entries);
-   copy_le64(trailer64.size, zip_dir_offset);
+   copy_le64(trailer64.size, zip_dir.len);
copy_le64(trailer64.offset, zip_offset);
 
copy_le32(locator64.magic, 0x07064b50);
copy_le32(locator64.disk, 0);
-   copy_le64(locator64.offset, zip_offset + zip_dir_offset);
+   copy_le64(locator64.offset, zip_offset + zip_dir.len);
copy_le32(locator64.number_of_disks, 1);
 
write_or_die(1, &trailer64, ZIP64_DIR_TRAILER_SIZE);
@@ -533,11 +520,11 @@ static void write_zip_trailer(const unsigned char *sha1)
copy_le16_clamp(trailer.entries_on_this_disk, zip_dir_entries,
&clamped);
copy_le16_clamp(trailer.entries, zip_dir_entries, &clamped);
-   copy_le32(trailer.size, zip_dir_offset);
+   copy_le32(trailer.size, zip_dir.len);
copy_le32(trailer.offset, zip_offset);
copy_le16(trailer.comment_length, sha1 ? GIT_SHA1_HEXSZ : 0);
 
-   write_or_die(1, zip_dir, zip_dir_offset);
+   write_or_die(1, zip_dir.buf, zip_dir.len);
if (clamped)
write_zip64_trailer();
write_or_die(1, , ZIP_DIR_TRAILER_SIZE);
@@ -568,14 +555,13 @@ static int write_zip_archive(const struct archiver *ar,
 
dos_time(>time, _date, _time);
 
-   zip_dir = xmalloc(ZIP_DIRECTORY_MIN_SIZE);
-   zip_dir_size = ZIP_DIRECTORY_MIN_SIZE;
+   strbuf_init(&zip_dir, 0);
 
err = write_archive_entries(args, write_zip_entry);
if (!err)
write_zip_trailer(args->commit_sha1);
 
-   free(zip_dir);
+   strbuf_release(&zip_dir);
 
return err;
 }
-- 
2.12.2



[PATCH v3 1/5] archive-zip: add tests for big ZIP archives

2017-04-24 Thread René Scharfe
Test the creation of ZIP archives bigger than 4GB and containing files
bigger than 4GB.  They are marked as EXPENSIVE because they take quite a
while and because the first one needs a bit more than 4GB of disk space
to store the resulting archive.

The big archive in the first test is made up of a tree containing
thousands of copies of a small file.  Yet the test has to write out the
full archive because unzip doesn't offer a way to read from stdin.

The big file in the second test is provided as a zipped pack file to
avoid writing another 4GB file to disk and then adding it.

Signed-off-by: Rene Scharfe 
---
 t/t5004-archive-corner-cases.sh |  45 
 t/t5004/big-pack.zip| Bin 0 -> 7373 bytes
 2 files changed, 45 insertions(+)
 create mode 100644 t/t5004/big-pack.zip

diff --git a/t/t5004-archive-corner-cases.sh b/t/t5004-archive-corner-cases.sh
index cca23383c5..bc052c803a 100755
--- a/t/t5004-archive-corner-cases.sh
+++ b/t/t5004-archive-corner-cases.sh
@@ -155,4 +155,49 @@ test_expect_success ZIPINFO 'zip archive with many 
entries' '
test_cmp expect actual
 '
 
+test_expect_failure EXPENSIVE,UNZIP 'zip archive bigger than 4GB' '
+   # build string containing 65536 characters
+   s=0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef &&
+   s=$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s &&
+   s=$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s$s &&
+
+   # create blob with a length of 65536 + 1 bytes
+   blob=$(echo $s | git hash-object -w --stdin) &&
+
+   # create tree containing 65500 entries of that blob
+   for i in $(test_seq 1 65500)
+   do
+   echo "100644 blob $blob $i"
+   done >tree &&
+   tree=$(git mktree big.lst &&
+   grep $size big.lst
+'
+
 test_done
diff --git a/t/t5004/big-pack.zip b/t/t5004/big-pack.zip
new file mode 100644
index 
..caaf614eeece6f818c525e433561e37560a75b05
GIT binary patch
literal 7373
zcmWIWW@Zs#U}E54xLY#A%SmVLl4u471|Jp%215oJhJwW8Y+ZvC^HlQ`3(KT5gS6zd
z#6$y=#3VyYlN6KGlw?bTG()50MB`LTb5p&{l#0+0P6p-9-y6#!_6|EcGQyxED
zyZY^&^YwM2r?Wic_gf3wz0#?R{8hB^$gt{iHiUEJN{iALjZ~|R(VzA
zOp{_@IDE*S!H8sEfc%Wl8*lHd?-DJPX@5BLD~n)>se}uQ{sHa{9CBhhi*hb3*-*jg
zc5o4&4%_XW4Bab?Enk9jiu`<7qbA8y|-Ck>2+hMBYN-pgCcFt>ev-5*|-`JE>Hh`B*~cT
z=?C44}E8GDaU*PD0ekK%{l7w@(f14RzJnfvea+fQlS75nRoU$s?(g=(fOLb%
zK_JPHAqeJ(fjJ$>oKcz4&>2l3kc=^!7e@2KXkHl23k(gTVK5p7z>;7z9gKznQp0()
zeK6WS7;PVn){Ud}!f4$%I)=h9I*!CJ8V10UU^E?!h5@KqG@1@Z!(cQWK$^#7<%OS{
z_HQoT!K!n;di%NDON?i(ygj-((dOTWgOj)4mXFhko42#<@9S6BTc6*5|Cc?$n~_P5
z8P`mn1SleafRRC^5k!+Qug40R*F&4rL$?-n>J8c2VFDo0!BTPW}9!Q@6
I&6hC%0B76lcmMzZ

literal 0
HcmV?d1

-- 
2.12.2



Re: [PATCH v7] read-cache: force_verify_index_checksum

2017-04-24 Thread Johannes Sixt

Am 14.04.2017 um 22:32 schrieb g...@jeffhostetler.com:

diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 33a51c9..677e15a 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -689,4 +689,17 @@ test_expect_success 'bogus head does not fallback to all 
heads' '
! grep $blob out
 '

+test_expect_success 'detect corrupt index file in fsck' '
+   cp .git/index .git/index.backup &&
+   test_when_finished "mv .git/index.backup .git/index" &&
+   echo zzzzzzzz >zzzzzzzz &&
+   git add zzzzzzzz &&
+   sed -e "s/zzzzzzzz/yyyyyyyy/" .git/index >.git/index.yyy &&


sed on a binary file? Sooner or later we are going to run into 
portability issues.



+   mv .git/index.yyy .git/index &&
+   # Confirm that fsck detects invalid checksum
+   test_must_fail git fsck --cache &&


You should ensure that this failure is really because of an invalid 
checksum. The failure could also be due to an extra LF at the end that 
sed inserted, no?



+   # Confirm that status no longer complains about invalid checksum
+   git status
+'
+
 test_done


-- Hannes



[PATCH v3 0/5] archive-zip: support files and archives bigger than 4GB

2017-04-24 Thread René Scharfe
The first patch adds (expensive) tests, the next two are cleanups which
set the stage for the remaining two to actually implement zip64 support
for offsets and file sizes.

Half of the series had been laying around for months, half-finished and
forgotten because I got distracted by the holiday season. :-/

  archive-zip: add tests for big ZIP archives
  archive-zip: use strbuf for ZIP directory
  archive-zip: write ZIP dir entry directly to strbuf
  archive-zip: support archives bigger than 4GB
  archive-zip: support files bigger than 4GB

 archive-zip.c   | 211 
 t/t5004-archive-corner-cases.sh |  45 +
 t/t5004/big-pack.zip| Bin 0 -> 7373 bytes
 3 files changed, 172 insertions(+), 84 deletions(-)
 create mode 100644 t/t5004/big-pack.zip

-- 
2.12.2



Re: [PATCH 11/53] fast-import: convert to struct object_id

2017-04-24 Thread Stefan Beller
On Sun, Apr 23, 2017 at 2:34 PM, brian m. carlson
 wrote:
> Convert the remaining parts of fast-import.c to use struct object_id.
> Convert several instances of get_sha1_hex to parse_oid_hex to avoid
> needing to specify constants.  Convert other hardcoded values to named
> constants.  Finally, use the is_empty_tree_oid function instead of a
> direct comparison against a fixed string.
>
> Note that the odd computation with GIT_MAX_HEXSZ is due to the insertion
> of a slash between every two hex digits in the path, plus one for the
> terminating NUL.

Up to and including this patch are
Reviewed-by: Stefan Beller 

I may continue reviewing this series once my eyes are refreshed.

>
> Signed-off-by: brian m. carlson 
> ---

> @@ -2823,12 +2821,10 @@ static void parse_new_commit(const char *arg)
> strbuf_addf(_data, "tree %s\n",
> oid_to_hex(>branch_tree.versions[1].oid));
> if (!is_null_oid(>oid))
> -   strbuf_addf(_data, "parent %s\n",
> -   oid_to_hex(>oid));
> +   strbuf_addf(_data, "parent %s\n", oid_to_hex(>oid));
> while (merge_list) {
> struct hash_list *next = merge_list->next;
> -   strbuf_addf(_data, "parent %s\n",
> -   oid_to_hex(_list->oid));
> +   strbuf_addf(_data, "parent %s\n", 
> oid_to_hex(_list->oid));
> free(merge_list);
> merge_list = next;
> }

This is a funny one. The only change is line rewrapping, as it fits
into 80 cols easily.
I was reviewing this series using colored --word-diff output, and this hunk does
not produce any red or green. I wonder if this is the intended
behavior of the word diffing
or if we rather want to insert a  - \n - .

Thanks,
Stefan


Re: t0025 flaky on OSX

2017-04-24 Thread Torsten Bögershausen
On 2017-04-24 18:45, Lars Schneider wrote:
> Hi,
> 
> "t0025.3 - crlf=true causes a CRLF file to be normalized" failed 
> sporadically on next and master recently: 
> https://travis-ci.org/git/git/jobs/225084459#L2382
> https://travis-ci.org/git/git/jobs/223830505#L2342
> 
> Are you aware of a race condition in the code
> or in the test?
Not yet - I'll have a look




Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-24 Thread Philip Oakley

From: "Johannes Schindelin" 

Hi Hannes,

On Sat, 22 Apr 2017, Johannes Sixt wrote:


Am 21.04.2017 um 14:29 schrieb Christian Couder:
> First bisect should ask you to test merge bases only if there are
> "good" commits that are not ancestors of the "bad" commit.

That's a tangent, but I have never understood why this needs to be so.
Consider this:

   o--o--o--o--o--o--o--o--B
   /   /
 -o--o--o--o--g--o--o--o--o--G

When I mark B as bad and G as good, why would g have to be tested first?
This is exactly what I do when I bisect in Git history: I mark the latest
commits on git-gui and gitk sub-histories as good, because I know they
can't possibly be bad. (In my setup, these two histories are ahead of pu
and next.)


I guess the idea behind bisect's reasoning is that you could have merged
the "wrong" branch first.

Ciao,
Dscho



Sorry if I'm bikeshedding here, but it does look like adding an alternate 
'bisect' strategy may be useful for this style of integration testing.


To me, given the multiplicity of independent branches being brought 
together, it should be possible to do a check on each of the branches 
separately, before the tests along the line of integration. The tests would 
not be a true 'bisection' as they are separate single point tests, but they 
do establish good commits at the tips of those branches.


Thus, for each of the merges in the --first-parent traversal, the option 
could test (in the OS of choice) the *second parent* commit of the merge. 
This sets the known good points. The breakages during the integration then 
become easier to bisect, as it is only looking for the integration of a good 
series into mainline that breaks. [1]


In addition, an aside that dscho made earlier about the merge-base of some 
branches relative to Windows may have been missed. The normal bisect process 
assumes that we start from a set of good merge bases. However, dscho noticed 
that occasionally some series may choose an older point on maint (etc.) that 
itself would _not_ be classed as good when tested on Windows (or on other 
OS's). Those older merge bases can make the classic bisect across all the 
commits in the DAG between here and there a tortuous process, especially if 
the local OS implementation perceives the base points as bad! (which breaks 
expectations).


--
Philip


[1] It may be that this can be approached via an alternate DAG, which could 
be 'faked' up as if each of the topic branches had been squashed from being 
a long series down to a single commit (the final tree of the series), and 
then likewise (same tree) for the integration merges, such that all the 
singleton commits are not tested, only the pre and post merge commits on the 
first-parent traverse. 



Re: [PATCH v2] completion: optionally disable checkout DWIM

2017-04-24 Thread Brandon Williams
On 04/21, Jeff King wrote:
> 
> When we complete branch names for "git checkout", we also
> complete remote branch names that could trigger the DWIM
> behavior. Depending on your workflow and project, this can
> be either convenient or annoying.
> 
> For instance, my clone of gitster.git contains 74 local
> "jk/*" branches, but origin contains another 147. When I
> want to checkout a local branch but can't quite remember the
> name, tab completion shows me 251 entries. And worse, for a
> topic that has been picked up for pu, the upstream branch
> name is likely to be similar to mine, leading to a high
> probability that I pick the wrong one and accidentally
> create a new branch.
> 
> This patch adds a way for the user to tell the completion
> code not to include DWIM suggestions for checkout. This can
> already be done by typing:
> 
>   git checkout --no-guess jk/
> 
> but that's rather cumbersome. The downside, of course, is
> that you no longer get completion support when you _do_ want
> to invoke the DWIM behavior. But depending on your workflow,
> that may not be a big loss (for instance, in git.git I am
> much more likely to want to detach, so I'd type "git
> checkout origin/jk/" anyway).
> 
> Signed-off-by: Jeff King 

May not be relevant to this patch per se, but is there a way to have
the completion for checkout only complete up to a part of
remotes/branches?  Say, up to a forward-slash '/'.  For example git.git has
lots of branches which have the form: origin//branch-name.  It
would be convenient if when I type:

git checkout  

Instead of getting a long list of 2k or so branch names, I would instead
see only the remote's name, e.g. 'origin/', kind of like vanilla
directory completion.  This way, unless I'm actually interested in the
remote, I don't see the thousands of branches related to it.

-- 
Brandon Williams


Re: [PATCH 00/53] object_id part 8

2017-04-24 Thread Brandon Williams
On 04/23, brian m. carlson wrote:
> This is the eighth series of patches to convert unsigned char [20] to
> struct object_id.  This series converts lookup_commit, lookup_blob,
> lookup_tree, lookup_tag, and finally parse_object to struct object_id.
> 
> A small number of functions have temporaries inserted during the
> conversion in order to allow conversion of functions that still need to
> take unsigned char *; they are removed either later in the series or
> will be in a future series.
> 
> This series can be fetched from the object-id-part8 branch from either
> of the following:
> 
> https://github.com/bk2204/git
> https://git.crustytoothpaste.net/git/bmc/git.git

I still have a series to convert the diff logic to using object_id.
I'll rebase it against this change and wait to send it out once this
topic has been reviewed and is stable (hopefully it won't require many
iterations).

-- 
Brandon Williams


Re: minor typos in documentation

2017-04-24 Thread Stefan Beller
On Sun, Apr 23, 2017 at 9:22 AM, René Genz  wrote:
> Hi Stefan,
>
> I submitted the patch to the mailing list.

Thanks for following through. :)

> Based on my experience I submitted another patch to improve the
> documentation.

Thanks for improving the documentation how to get started. :)

Cheers,
Stefan


Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-24 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 24, 2017 at 4:19 PM, Johannes Schindelin
 wrote:
> Hi Junio,
>
> On Sun, 23 Apr 2017, Junio C Hamano wrote:
>
>> Johannes Schindelin  writes:
>>
>> > Part of the reason is that you push out all of the branches in one go,
>> > typically at the very end of your work day. The idea of Continuous
>> > Integration is a little orthogonal to that style, suggesting to build
>> > & test whenever new changes come into the integration branch.
>> >
>> > As a consequence, my original setup was a little overloaded: the VM
>> > sat idle most of the time, and when you pushed, it was overloaded.
>>
>> I do not see pushing out all them in one go is making the problem worse
>> for you, though.
>
> Oh no, you don't see that? Then let me spell it out a little more
> clearly: when you push out four branches at the same time, the same
> Virtual Machine that hosts all of the build agents has to build each and
> every one of them, then run the entire test suite.
>
> As I have pointed out at several occasions (but I was probably complaining
> too much about it, so you probably ignored it), the test suite uses shell
> scripting a lot, and as a consequence it is really, really slow on
> Windows. Meaning that even on a high-end VM, it typically takes 1.5 hours
> to run the test suite. That's without SVN tests.
>
> So now we have up to four build agents banging at the same CPU and RAM,
> competing for resources. Now it takes more like 2-3 hours to run the
> entire build & test.
>
> The situation usually gets a little worse, even: you sometimes push out
> several iterations of `pu` in relatively rapid succession, "rapid" being
> relative to the time taken by the builds.
>
> That means that there are sometimes four jobs still hogging the VM when
> the next request to build & test `pu` arrives, and sometimes there is
> another one queued before the first job finishes.
>
> Naturally, the last two jobs will have started barely before Travis
> decides that it waited long enough (3 hours) to call it quits.
>
> To answer your implied question: the situation would be much, much better
> if the branches were pushed out with more time in-between.
>
> But as I said, I understand that it would be asking you way too much to
> change your process that seems to work well for you.

Is getting the results of these builds time-critical? If not perhaps
an acceptable solution would be to use a source repo that's
time-delayed, e.g. 24hrs behind on average from Junio's git.git, and
where commits are pushed in at some configurable trickle.

>> As of this writing, master..pu counts 60+ first-parent merges.
>> Instead of pushing out the final one at the end of the day, I could
>> push out after every merge.  Behind the scenes, because some topics
>> are extended or tweaked while I read the list discussion, the number
>> of merges I am doing during a day is about twice or more than that
>> before I reach the final version for the day.
>>
>> Many issues can be noticed locally even before the patches hit a
>> topic, before the topic gets merged to 'pu', or before the tentative
>> 'pu' is pushed out, and breakage at each of these points can be
>> locally corrected without bothering external test setups.  I've been
>> assuming that pushing out all in one go at the end will help
>> reducing the load at external test setups.
>
> Pushing out only four updates at the end of the day is probably better
> than pushing after every merge, for sure.
>
> Ciao,
> Dscho


Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-24 Thread Johannes Schindelin
Hi Hannes,

On Sat, 22 Apr 2017, Johannes Sixt wrote:

> Am 21.04.2017 um 14:29 schrieb Christian Couder:
> > First bisect should ask you to test merge bases only if there are
> > "good" commits that are not ancestors of the "bad" commit.
> 
> That's a tangent, but I have never understood why this needs to be so.
> Consider this:
> 
>    o--o--o--o--o--o--o--o--B
>    /   /
>  -o--o--o--o--g--o--o--o--o--G
> 
> When I mark B as bad and G as good, why would g have to be tested first? This
> is exactly what I do when I bisect in Git history: I mark the latest commits
> on git-gui and gitk sub-histories as good, because I know they can't possibly
> be bad. (In my setup, these two histories are ahead of pu and next.)

I guess the idea behind bisect's reasoning is that you could have merged
the "wrong" branch first.

Ciao,
Dscho


Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-24 Thread Johannes Schindelin
Hi Junio,

On Sun, 23 Apr 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > Part of the reason is that you push out all of the branches in one go,
> > typically at the very end of your work day. The idea of Continuous
> > Integration is a little orthogonal to that style, suggesting to build
> > & test whenever new changes come into the integration branch.
> >
> > As a consequence, my original setup was a little overloaded: the VM
> > sat idle most of the time, and when you pushed, it was overloaded.
> 
> I do not see pushing out all them in one go is making the problem worse
> for you, though.

Oh no, you don't see that? Then let me spell it out a little more
clearly: when you push out four branches at the same time, the same
Virtual Machine that hosts all of the build agents has to build each and
every one of them, then run the entire test suite.

As I have pointed out at several occasions (but I was probably complaining
too much about it, so you probably ignored it), the test suite uses shell
scripting a lot, and as a consequence it is really, really slow on
Windows. Meaning that even on a high-end VM, it typically takes 1.5 hours
to run the test suite. That's without SVN tests.

So now we have up to four build agents banging at the same CPU and RAM,
competing for resources. Now it takes more like 2-3 hours to run the
entire build & test.

The situation usually gets a little worse, even: you sometimes push out
several iterations of `pu` in relatively rapid succession, "rapid" being
relative to the time taken by the builds.

That means that there are sometimes four jobs still hogging the VM when
the next request to build & test `pu` arrives, and sometimes there is
another one queued before the first job finishes.

Naturally, the last two jobs will have started barely before Travis
decides that it waited long enough (3 hours) to call it quits.

To answer your implied question: the situation would be much, much better
if the branches were pushed out with more time in-between.

But as I said, I understand that it would be asking you way too much to
change your process that seems to work well for you.

> As of this writing, master..pu counts 60+ first-parent merges.
> Instead of pushing out the final one at the end of the day, I could
> push out after every merge.  Behind the scenes, because some topics
> are extended or tweaked while I read the list discussion, the number
> of merges I am doing during a day is about twice or more than that
> before I reach the final version for the day.  
> 
> Many issues can be noticed locally even before the patches hit a
> topic, before the topic gets merged to 'pu', or before the tentative
> 'pu' is pushed out, and breakage at each of these points can be
> locally corrected without bothering external test setups.  I've been
> assuming that pushing out all in one go at the end will help
> reducing the load at external test setups.

Pushing out only four updates at the end of the day is probably better
than pushing after every merge, for sure.

Ciao,
Dscho


RE: [PATCH] worktree add: add --lock option

2017-04-24 Thread taylor, david

> From: Duy Nguyen [mailto:pclo...@gmail.com]
> Sent: Friday, April 14, 2017 9:01 AM
> To: Junio C Hamano
> Cc: Git Mailing List; taylor, david
> Subject: Re: [PATCH] worktree add: add --lock option
> 
> On Fri, Apr 14, 2017 at 5:50 AM, Junio C Hamano 
> wrote:
> > Nguyễn Thái Ngọc Duy   writes:
> >
> >> As explained in the document. This option has an advantage over the
> >> command sequence "git worktree add && git worktree lock": there will be
> >> no gap that somebody can accidentally "prune" the new worktree (or
> soon,
> >> explicitly "worktree remove" it).
> >>
> >> "worktree add" does keep a lock on while it's preparing the worktree.
> >> If --lock is specified, this lock remains after the worktree is created.
> >>
> >> Suggested-by: David Taylor 
> >> Signed-off-by: Nguyễn Thái Ngọc Duy 
> >> ---
> >>  A patch that adds --lock may look like this.
> >
> > This looks more like "I do believe the idea by David is a useful
> > addition and here is how I did it to the best of my ability---let's
> > make sure we polish it for eventual inclusion" than a mere "it may
> > look like so---do whatever you want with it" patch.
> 
> It is a good addition, which is why I added tests and documents, so it
> may have a chance for inclusion. I would not strongly defend it though
> if there's objection.
> 
> > To me "git worktree add --lock" somehow sounds less correct than
> > "git worktree add --locked", but I'd appreciate if natives can
> > correct me.
> 
> That was my first choice too. Then I saw --detach (instead of
> --detached). I didn't care much and went with a verb like the rest.

While I personally would prefer --locked, I also prefer keeping it 'parallel
construction' with --detach.  That is, either [--detach][--lock] or
[--detached][--locked].  But ultimately, my intended use is within a script,
so even if it were --totally-non-mnemonic-option I would cope.

A stronger desire, regardless of whether it's Duy's implementation, mine, or
someone else's, is to have something acceptable to the community so that
we are not maintaining a fork with the attendant need to merge changes
in each time we upgrade.

> --
> Duy


Re: What's cooking in git.git (Apr 2017, #04; Wed, 19)

2017-04-24 Thread Johannes Schindelin
Hi Christian,

On Sat, 22 Apr 2017, Christian Couder wrote:

> On Sat, Apr 22, 2017 at 1:48 PM, Johannes Schindelin
>  wrote:
> >>
> >> First bisect should ask you to test merge bases only if there are
> >> "good" commits that are not ancestors of the "bad" commit.
> >
> > Please note that this is a stateless job. The only "state" I have is
> > the branch name.
> >
> > So when something goes wrong, I have *no* indicator what is a known
> > good state.
> 
> Maybe we could consider the last release a known good state?

You mean the latest release. I would expect that we won't have a last
release in a long time...

Using the latest release as a 'known good' would not improve on using the
strategy I chose, as the latest release is at most up-to-date with
maint/master. In the case of `pu`, for example, it is much better to test
`next` and use it as a known good state if it passes.

If we had a Pull Request centric workflow with one integration branch (as
opposed to four), it would be relatively easy, as `master` would always be
expected to serve as a "known good" state, and it would be the policy that
the build has to pass before Pull Requests would be accepted.

But we don't, and that's that.

Ciao,
Johannes


Re: [PATCH v4 0/9] Introduce timestamp_t for timestamps

2017-04-24 Thread Johannes Schindelin
Hi Jacob,

On Sun, 23 Apr 2017, Jacob Keller wrote:

> On Sun, Apr 23, 2017 at 8:29 PM, Junio C Hamano  wrote:
> > Johannes Schindelin  writes:
> >
> >> Changes since v3:
> >>
> >> - fixed the fix in archive-zip.c that tried to report a too large
> >>   timestamp (and would have reported the uninitialized time_t instead)
> >>
> >> - adjusted the so-far forgotten each_reflog() function (that was
> >>   introduced after v1, in 80f2a6097c4 (t/helper: add test-ref-store to
> >>   test ref-store functions, 2017-03-26)) to use timestamp_t and PRItime,
> >>   too
> >>
> >> - removed the date_overflows() check from time_to_tm(), as it calls
> >>   gm_time_t() which already performs that check
> >>
> >> - the date_overflows() check in show_ident_date() was removed, as we do
> >>   not know at that point yet whether we use the system functions to
> >>   render the date or not (and there would not be a problem in the latter
> >>   case)
> >
> > Assuming that the list consensus is to go with a separate
> > timestamp_t (for that added Cc for those whose comments I saw in an
> > earlier round), the patches looked mostly good (I didn't read with
> > fine toothed comb the largest one 6/8 to see if there were
> > inadvertent or missed conversions from ulong to timestamp_t,
> > though), modulo a few minor "huh?" comments I sent separately.
> >
> > Will queue; thanks.
> 
> I think that this timestamp_t makes sense. I didn't get a chance to
> review the code to make sure nothing was forgotten, but I think the
> direction makes sense to resolve the problems with current time_t and
> ulong assumptions.

TBH I rely a bit on the combination of compiling on Windows and on 32-bit
Linux to make sure that all the callers are converted. Originally, I was
more diligent, of course, but it took a while to get this patch series
into `pu` and there are so many possible new callers...

Ciao,
Dscho


Re: [PATCH v4 0/9] Introduce timestamp_t for timestamps

2017-04-24 Thread Johannes Schindelin
Hi Junio,

On Sun, 23 Apr 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > Changes since v3:
> >
> > - fixed the fix in archive-zip.c that tried to report a too large
> >   timestamp (and would have reported the uninitialized time_t instead)
> >
> > - adjusted the so-far forgotten each_reflog() function (that was
> >   introduced after v1, in 80f2a6097c4 (t/helper: add test-ref-store to
> >   test ref-store functions, 2017-03-26)) to use timestamp_t and PRItime,
> >   too
> >
> > - removed the date_overflows() check from time_to_tm(), as it calls
> >   gm_time_t() which already performs that check
> >
> > - the date_overflows() check in show_ident_date() was removed, as we do
> >   not know at that point yet whether we use the system functions to
> >   render the date or not (and there would not be a problem in the latter
> >   case)
> 
> Assuming that the list consensus is to go with a separate
> timestamp_t (for that added Cc for those whose comments I saw in an
> earlier round), the patches looked mostly good (I didn't read with
> fine toothed comb the largest one 6/8 to see if there were
> inadvertent or missed conversions from ulong to timestamp_t,
> though), modulo a few minor "huh?" comments I sent separately.

Dang, I forgot to Cc: Peff and René... And I sent out v5 before adding
them, sorry!

Ciao,
Dscho

[PATCH v5 8/8] Use uintmax_t for timestamps

2017-04-24 Thread Johannes Schindelin
Previously, we used `unsigned long` for timestamps. This was only a good
choice on Linux, where we know implicitly that `unsigned long` is what is
used for `time_t`.

However, we want to use a different data type for timestamps for two
reasons:

- there is nothing that says that `unsigned long` should be the same data
  type as `time_t`, and indeed, on 64-bit Windows for example, it is not:
  `unsigned long` is 32-bit but `time_t` is 64-bit.

- even on 32-bit Linux, where `unsigned long` (and thereby `time_t`) is
  32-bit, we *want* to be able to encode timestamps in Git that are
  currently absurdly far in the future, *even if* the system library is
  not able to format those timestamps into date strings.

So let's just switch to the maximal integer type available, which should
be at least 64-bit for all practical purposes these days. It certainly
cannot be worse than `unsigned long`, so...

Signed-off-by: Johannes Schindelin 
---
 git-compat-util.h | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 594100e7652..b66995685af 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -319,10 +319,14 @@ extern char *gitdirname(char *);
 #define PRIo32 "o"
 #endif
 
-typedef unsigned long timestamp_t;
-#define PRItime "lu"
-#define parse_timestamp strtoul
+typedef uintmax_t timestamp_t;
+#define PRItime PRIuMAX
+#define parse_timestamp strtoumax
+#ifdef ULLONG_MAX
+#define TIME_MAX ULLONG_MAX
+#else
 #define TIME_MAX ULONG_MAX
+#endif
 
 #ifndef PATH_SEP
 #define PATH_SEP ':'
-- 
2.12.2.windows.2.406.gd14a8f8640f


[PATCH v5 7/8] Abort if the system time cannot handle one of our timestamps

2017-04-24 Thread Johannes Schindelin
We are about to switch to a new data type for timestamps that is
guaranteed to be at least as large as time_t, never smaller.

So before using the system functions to process or format timestamps,
let's make extra certain that they can handle what we feed them.

Signed-off-by: Johannes Schindelin 
---
 date.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/date.c b/date.c
index 92ab31aa441..63fa99685e2 100644
--- a/date.c
+++ b/date.c
@@ -46,7 +46,17 @@ static time_t gm_time_t(timestamp_t time, int tz)
minutes = tz < 0 ? -tz : tz;
minutes = (minutes / 100)*60 + (minutes % 100);
minutes = tz < 0 ? -minutes : minutes;
-   return time + minutes * 60;
+
+   if (minutes > 0) {
+   if (unsigned_add_overflows(time, minutes * 60))
+   die("Timestamp+tz too large: %"PRItime" +%04d",
+   time, tz);
+   } else if (time < -minutes * 60)
+   die("Timestamp before Unix epoch: %"PRItime" %04d", time, tz);
+   time += minutes * 60;
+   if (date_overflows(time))
+   die("Timestamp too large for this system: %"PRItime, time);
+   return (time_t)time;
 }
 
 /*
@@ -70,7 +80,10 @@ static int local_tzoffset(timestamp_t time)
struct tm tm;
int offset, eastwest;
 
-   t = time;
+   if (date_overflows(time))
+   die("Timestamp too large for this system: %"PRItime, time);
+
+   t = (time_t)time;
localtime_r(&t, &tm);
t_local = tm_to_time_t(&tm);
 
-- 
2.12.2.windows.2.406.gd14a8f8640f




[PATCH v5 4/8] Specify explicitly where we parse timestamps

2017-04-24 Thread Johannes Schindelin
Currently, Git's source code represents all timestamps as `unsigned
long`. In preparation for using a more appropriate data type, let's
introduce a symbol `parse_timestamp` (currently being defined to
`strtoul`) where appropriate, so that we can later easily switch to,
say, use `strtoull()` instead.

Signed-off-by: Johannes Schindelin 
---
 builtin/am.c   | 2 +-
 builtin/receive-pack.c | 4 ++--
 bundle.c   | 2 +-
 commit.c   | 6 +++---
 date.c | 6 +++---
 fsck.c | 2 +-
 git-compat-util.h  | 2 ++
 pretty.c   | 2 +-
 ref-filter.c   | 2 +-
 refs/files-backend.c   | 2 +-
 t/helper/test-date.c   | 2 +-
 tag.c  | 4 ++--
 upload-pack.c  | 2 +-
 13 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 805f56cec2f..ffb7a6355fb 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -886,7 +886,7 @@ static int hg_patch_to_mail(FILE *out, FILE *in, int keep_cr)
char *end;
 
errno = 0;
-   timestamp = strtoul(str, &end, 10);
+   timestamp = parse_timestamp(str, &end, 10);
if (errno)
return error(_("invalid timestamp"));
 
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index f96834f42c9..d25a57931f8 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -534,7 +534,7 @@ static const char *check_nonce(const char *buf, size_t len)
retval = NONCE_BAD;
goto leave;
}
-   stamp = strtoul(nonce, &bohmac, 10);
+   stamp = parse_timestamp(nonce, &bohmac, 10);
if (bohmac == nonce || bohmac[0] != '-') {
retval = NONCE_BAD;
goto leave;
@@ -552,7 +552,7 @@ static const char *check_nonce(const char *buf, size_t len)
 * would mean it was issued by another server with its clock
 * skewed in the future.
 */
-   ostamp = strtoul(push_cert_nonce, NULL, 10);
+   ostamp = parse_timestamp(push_cert_nonce, NULL, 10);
nonce_stamp_slop = (long)ostamp - (long)stamp;
 
if (nonce_stamp_slop_limit &&
diff --git a/bundle.c b/bundle.c
index bbf4efa0a0a..f43bfcf5ff3 100644
--- a/bundle.c
+++ b/bundle.c
@@ -227,7 +227,7 @@ static int is_tag_in_date_range(struct object *tag, struct rev_info *revs)
line = memchr(line, '>', lineend ? lineend - line : buf + size - line);
if (!line++)
goto out;
-   date = strtoul(line, NULL, 10);
+   date = parse_timestamp(line, NULL, 10);
result = (revs->max_age == -1 || revs->max_age < date) &&
(revs->min_age == -1 || revs->min_age > date);
 out:
diff --git a/commit.c b/commit.c
index 73c78c2b80c..0d2d0fa1984 100644
--- a/commit.c
+++ b/commit.c
@@ -89,8 +89,8 @@ static unsigned long parse_commit_date(const char *buf, const char *tail)
/* nada */;
if (buf >= tail)
return 0;
-   /* dateptr < buf && buf[-1] == '\n', so strtoul will stop at buf-1 */
-   return strtoul(dateptr, NULL, 10);
+   /* dateptr < buf && buf[-1] == '\n', so parsing will stop at buf-1 */
+   return parse_timestamp(dateptr, NULL, 10);
 }
 
 static struct commit_graft **commit_graft;
@@ -607,7 +607,7 @@ static void record_author_date(struct author_date_slab *author_date,
!ident.date_begin || !ident.date_end)
goto fail_exit; /* malformed "author" line */
 
-   date = strtoul(ident.date_begin, &date_end, 10);
+   date = parse_timestamp(ident.date_begin, &date_end, 10);
if (date_end != ident.date_end)
goto fail_exit; /* malformed date */
*(author_date_slab_at(author_date, commit)) = date;
diff --git a/date.c b/date.c
index a996331f5b3..495c207c64f 100644
--- a/date.c
+++ b/date.c
@@ -510,7 +510,7 @@ static int match_digit(const char *date, struct tm *tm, int *offset, int *tm_gmt)
char *end;
unsigned long num;
 
-   num = strtoul(date, &end, 10);
+   num = parse_timestamp(date, &end, 10);
 
/*
 * Seconds since 1970? We trigger on that for any numbers with
@@ -658,7 +658,7 @@ static int match_object_header_date(const char *date, unsigned long *timestamp,
 
if (*date < '0' || '9' < *date)
return -1;
-   stamp = strtoul(date, &end, 10);
+   stamp = parse_timestamp(date, &end, 10);
if (*end != ' ' || stamp == ULONG_MAX || (end[1] != '+' && end[1] != '-'))
return -1;
date = end + 2;
@@ -1066,7 +1066,7 @@ static const char *approxidate_digit(const char *date, struct tm *tm, int *num,
 time_t now)
 {
char *end;
-   unsigned long number = strtoul(date, &end, 10);
+   unsigned long number = parse_timestamp(date, &end, 10);
 
switch (*end) {
case ':':
diff --git a/fsck.c b/fsck.c
index 

[PATCH v5 5/8] Introduce a new "printf format" for timestamps

2017-04-24 Thread Johannes Schindelin
Currently, Git's source code treats all timestamps as if they were
unsigned longs. Therefore, it is okay to write "%lu" when printing them.

There is a substantial problem with that, though: at least on Windows,
time_t is *larger* than unsigned long, and hence we will want to switch
away from the ill-specified `unsigned long` data type.

So let's introduce the pseudo format "PRItime" (currently simply being
defined to "lu") to make it easier to change the data type used for
timestamps.

Signed-off-by: Johannes Schindelin 
---
 builtin/blame.c   |  6 +++---
 builtin/fsck.c|  2 +-
 builtin/log.c |  2 +-
 builtin/receive-pack.c|  4 ++--
 builtin/rev-list.c|  2 +-
 builtin/rev-parse.c   |  2 +-
 date.c| 26 +-
 fetch-pack.c  |  2 +-
 git-compat-util.h |  1 +
 refs/files-backend.c  |  2 +-
 t/helper/test-date.c  |  2 +-
 t/helper/test-parse-options.c |  2 +-
 t/helper/test-ref-store.c |  2 +-
 upload-pack.c |  2 +-
 vcs-svn/fast_export.c |  4 ++--
 15 files changed, 31 insertions(+), 30 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 07506a3e457..e4b3c7b0ebf 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1727,11 +1727,11 @@ static int emit_one_suspect_detail(struct origin *suspect, int repeat)
get_commit_info(suspect->commit, &ci, 1);
printf("author %s\n", ci.author.buf);
printf("author-mail %s\n", ci.author_mail.buf);
-   printf("author-time %lu\n", ci.author_time);
+   printf("author-time %"PRItime"\n", ci.author_time);
printf("author-tz %s\n", ci.author_tz.buf);
printf("committer %s\n", ci.committer.buf);
printf("committer-mail %s\n", ci.committer_mail.buf);
-   printf("committer-time %lu\n", ci.committer_time);
+   printf("committer-time %"PRItime"\n", ci.committer_time);
printf("committer-tz %s\n", ci.committer_tz.buf);
printf("summary %s\n", ci.summary.buf);
if (suspect->commit->object.flags & UNINTERESTING)
@@ -1844,7 +1844,7 @@ static const char *format_time(unsigned long time, const char *tz_str,
 
strbuf_reset(&time_buf);
if (show_raw_time) {
-   strbuf_addf(&time_buf, "%lu %s", time, tz_str);
+   strbuf_addf(&time_buf, "%"PRItime" %s", time, tz_str);
}
else {
const char *time_str;
diff --git a/builtin/fsck.c b/builtin/fsck.c
index b5e13a45560..c233eda21c8 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -407,7 +407,7 @@ static void fsck_handle_reflog_oid(const char *refname, struct object_id *oid,
if (timestamp && name_objects)
add_decoration(fsck_walk_options.object_names,
obj,
-   xstrfmt("%s@{%ld}", refname, timestamp));
+   xstrfmt("%s@{%"PRItime"}", refname, timestamp));
obj->used = 1;
mark_object_reachable(obj);
} else {
diff --git a/builtin/log.c b/builtin/log.c
index b3b10cc1edb..f93ef6c7100 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -910,7 +910,7 @@ static void get_patch_ids(struct rev_info *rev, struct patch_ids *ids)
 static void gen_message_id(struct rev_info *info, char *base)
 {
struct strbuf buf = STRBUF_INIT;
-   strbuf_addf(&buf, "%s.%lu.git.%s", base,
+   strbuf_addf(&buf, "%s.%"PRItime".git.%s", base,
(unsigned long) time(NULL),
git_committer_info(IDENT_NO_NAME|IDENT_NO_DATE|IDENT_STRICT));
info->message_id = strbuf_detach(&buf, NULL);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index d25a57931f8..ab718f4402c 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -459,12 +459,12 @@ static char *prepare_push_cert_nonce(const char *path, unsigned long stamp)
struct strbuf buf = STRBUF_INIT;
unsigned char sha1[20];
 
-   strbuf_addf(&buf, "%s:%lu", path, stamp);
+   strbuf_addf(&buf, "%s:%"PRItime, path, stamp);
hmac_sha1(sha1, buf.buf, buf.len, cert_nonce_seed, strlen(cert_nonce_seed));
strbuf_release(&buf);
 
/* RFC 2104 5. HMAC-SHA1-80 */
-   strbuf_addf(, "%lu-%.*s", stamp, 20, sha1_to_hex(sha1));
+   strbuf_addf(, "%"PRItime"-%.*s", stamp, 20, sha1_to_hex(sha1));
return strbuf_detach(&buf, NULL);
 }
 
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index bcf77f0b8a2..3b292c99bda 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -80,7 +80,7 @@ static void show_commit(struct commit *commit, void *data)
}
 
if (info->show_timestamp)
-   printf("%lu ", commit->date);
+   printf("%"PRItime" ", commit->date);
if (info->header_prefix)

[PATCH v5 6/8] Introduce a new data type for timestamps

2017-04-24 Thread Johannes Schindelin
Git's source code assumes that unsigned long is at least as precise as
time_t. That assumption is incorrect, and it causes a lot of problems, in
particular where unsigned long is only 32-bit (notably on Windows, even
in 64-bit versions).

So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.

By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.

As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.

Signed-off-by: Johannes Schindelin 
---
 Documentation/technical/api-parse-options.txt |  8 ++--
 archive-tar.c |  5 +-
 archive-zip.c | 12 -
 archive.h |  2 +-
 builtin/am.c  |  2 +-
 builtin/blame.c   |  8 ++--
 builtin/fsck.c|  4 +-
 builtin/gc.c  |  2 +-
 builtin/log.c |  2 +-
 builtin/merge-base.c  |  2 +-
 builtin/name-rev.c|  6 +--
 builtin/pack-objects.c|  4 +-
 builtin/prune.c   |  4 +-
 builtin/receive-pack.c|  6 +--
 builtin/reflog.c  | 24 +-
 builtin/show-branch.c |  4 +-
 builtin/worktree.c|  4 +-
 bundle.c  |  2 +-
 cache.h   | 14 +++---
 commit.c  | 12 ++---
 commit.h  |  2 +-
 config.c  |  2 +-
 credential-cache--daemon.c| 12 ++---
 date.c| 66 +--
 fetch-pack.c  |  6 +--
 git-compat-util.h |  2 +
 http-backend.c|  4 +-
 parse-options-cb.c|  4 +-
 pretty.c  |  2 +-
 reachable.c   |  9 ++--
 reachable.h   |  4 +-
 ref-filter.c  |  4 +-
 reflog-walk.c |  8 ++--
 refs.c| 14 +++---
 refs.h|  8 ++--
 refs/files-backend.c  |  4 +-
 revision.c|  6 +--
 revision.h|  4 +-
 sha1_name.c   |  6 +--
 t/helper/test-date.c  |  8 ++--
 t/helper/test-parse-options.c |  2 +-
 t/helper/test-ref-store.c |  2 +-
 tag.c |  2 +-
 tag.h |  2 +-
 upload-pack.c |  4 +-
 vcs-svn/fast_export.c |  4 +-
 vcs-svn/fast_export.h |  4 +-
 vcs-svn/svndump.c |  2 +-
 wt-status.c   |  2 +-
 49 files changed, 169 insertions(+), 157 deletions(-)

diff --git a/Documentation/technical/api-parse-options.txt b/Documentation/technical/api-parse-options.txt
index 36768b479e1..829b5581105 100644
--- a/Documentation/technical/api-parse-options.txt
+++ b/Documentation/technical/api-parse-options.txt
@@ -183,13 +183,13 @@ There are some macros to easily define options:
scale the provided value by 1024, 1024^2 or 1024^3 respectively.
The scaled value is put into `unsigned_long_var`.
 
-`OPT_DATE(short, long, &int_var, description)`::
+`OPT_DATE(short, long, &timestamp_t_var, description)`::
Introduce an option with date argument, see `approxidate()`.
-   The timestamp is put into `int_var`.
+   The timestamp is put into `timestamp_t_var`.
 
-`OPT_EXPIRY_DATE(short, long, &int_var, description)`::
+`OPT_EXPIRY_DATE(short, long, &timestamp_t_var, description)`::
Introduce an option with expiry date argument, see `parse_expiry_date()`.
-   The timestamp is put into `int_var`.
+   The timestamp is put into `timestamp_t_var`.
 
`OPT_CALLBACK(short, long, &var, arg_str, description, func_ptr)`::
Introduce an option with argument.
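[Editor's note, an illustrative sketch rather than part of the patch: with the
macros renamed as above, a builtin would declare such an option roughly as
follows. The option name and variable are hypothetical, and the fragment
assumes git's `parse-options.h`; it is not compilable on its own.]

```c
static timestamp_t expire;	/* filled in by parse_expiry_date() */

static struct option example_options[] = {
	OPT_EXPIRY_DATE(0, "expire", &expire,
			N_("expire entries older than <time>")),
	OPT_END()
};
```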
diff --git a/archive-tar.c b/archive-tar.c
index 380e3aedd23..695339a2369 100644
--- a/archive-tar.c
+++ b/archive-tar.c
@@ -27,9 +27,12 @@ static int write_tar_filter_archive(const struct archiver *ar,
  */
#if ULONG_MAX == 0xFFFFFFFF
 #define USTAR_MAX_SIZE ULONG_MAX
-#define USTAR_MAX_MTIME ULONG_MAX
 #else
 #define 
