Re: [ANNOUNCE] Git v2.12.0

2017-02-24 Thread Willy Tarreau
Hi Junio,

On Fri, Feb 24, 2017 at 11:28:58AM -0800, Junio C Hamano wrote:
>  * Use of an empty string that is used for 'everything matches' is
>still warned and Git asks users to use a more explicit '.' for that
>instead.  The hope is that existing users will not mind this
>change, and eventually the warning can be turned into a hard error,
>upgrading the deprecation into removal of this (mis)feature.  That
>is not scheduled to happen in the upcoming release (yet).

FWIW '.' is not equivalent to '' when it comes to grep or such commands,
you should suggest '^' or '$' instead, otherwise you'll miss empty lines
and users will continue to use '' for that purpose. BTW I do use grep ''
a lot, but only on file systems, not within git (eg: to display contents
of /sys).

Cheers,
Willy


[PATCH 1/6 v5] revision.c: do not update argv with unknown option

2017-02-24 Thread Siddharth Kannan
handle_revision_opt() tries to recognize and handle the given argument. If an
option was unknown to it, it used to add the option to unkv[(*unkc)++].  This
increment of unkc causes the variable in the caller to change.

Teach handle_revision_opt to not update unknown arguments inside unkc anymore.
This is now the responsibility of the caller.

There are two callers of this function:

1. setup_revision: Changes have been made so that setup_revision will now
update the unknown option in argv

2. parse_revision_opt: No changes are required here. This function throws an
error whenever the option provided as argument was unknown to
handle_revision_opt().

Signed-off-by: Siddharth Kannan 
---
 revision.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/revision.c b/revision.c
index b37dbec..5674a9a 100644
--- a/revision.c
+++ b/revision.c
@@ -2016,8 +2016,6 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
revs->ignore_missing = 1;
} else {
int opts = diff_opt_parse(>diffopt, argv, argc, 
revs->prefix);
-   if (!opts)
-   unkv[(*unkc)++] = arg;
return opts;
}
if (revs->graph && revs->track_linear)
@@ -2234,6 +2232,8 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
}
if (opts < 0)
exit(128);
+   /* arg is an unknown option */
+   argv[left++] = arg;
continue;
}
 
-- 
2.1.4



[PATCH 1/6 v5] revision.c: do not update argv with unknown option

2017-02-24 Thread Siddharth Kannan
handle_revision_opt() tries to recognize and handle the given argument. If an
option was unknown to it, it used to add the option to unkv[(*unkc)++].  This
increment of unkc causes the variable in the caller to change.

Teach handle_revision_opt to not update unknown arguments inside unkc anymore.
This is now the responsibility of the caller.

There are two callers of this function:

1. setup_revision: Changes have been made so that setup_revision will now
update the unknown option in argv

2. parse_revision_opt: No changes are required here. This function throws an
error whenever the option provided as argument was unknown to
handle_revision_opt().

Signed-off-by: Siddharth Kannan 
---
 revision.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/revision.c b/revision.c
index b37dbec..5674a9a 100644
--- a/revision.c
+++ b/revision.c
@@ -2016,8 +2016,6 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
revs->ignore_missing = 1;
} else {
int opts = diff_opt_parse(>diffopt, argv, argc, 
revs->prefix);
-   if (!opts)
-   unkv[(*unkc)++] = arg;
return opts;
}
if (revs->graph && revs->track_linear)
@@ -2234,6 +2232,8 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
}
if (opts < 0)
exit(128);
+   /* arg is an unknown option */
+   argv[left++] = arg;
continue;
}
 
-- 
2.1.4



[PATCH 2/6 v5] revision.c: swap if/else blocks

2017-02-24 Thread Siddharth Kannan
Swap the condition and bodies of an "if (A) do_A else do_B" in
setup_revisions() to "if (!A) do_B else do A", to make the change in
the the next step easier to read.

No behaviour change is intended in this step.

Signed-off-by: Siddharth Kannan 
---
 revision.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/revision.c b/revision.c
index 5674a9a..8d4ddae 100644
--- a/revision.c
+++ b/revision.c
@@ -2238,7 +2238,9 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
}
 
 
-   if (handle_revision_arg(arg, revs, flags, revarg_opt)) {
+   if (!handle_revision_arg(arg, revs, flags, revarg_opt))
+   got_rev_arg = 1;
+   else {
int j;
if (seen_dashdash || *arg == '^')
die("bad revision '%s'", arg);
@@ -2255,8 +2257,6 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
append_prune_data(_data, argv + i);
break;
}
-   else
-   got_rev_arg = 1;
}
 
if (prune_data.nr) {
-- 
2.1.4



[PATCH 6/6 v5] revert.c: delegate handling of "-" shorthand to setup_revisions

2017-02-24 Thread Siddharth Kannan
revert.c:run_sequencer calls setup_revisions right after replacing "-" with
"@{-1}" for this shorthand. A previous patch taught setup_revisions to handle
this shorthand by doing the required replacement inside revision.c:get_sha1_1.

Hence, the code here is redundant and has been removed.

This patch also adds a test to check that revert recognizes the "-" shorthand.

Signed-off-by: Siddharth Kannan 
---
 builtin/revert.c|  2 --
 t/t3514-revert-shorthand.sh | 25 +
 2 files changed, 25 insertions(+), 2 deletions(-)
 create mode 100755 t/t3514-revert-shorthand.sh

diff --git a/builtin/revert.c b/builtin/revert.c
index 4ca5b51..0bc6657 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -155,8 +155,6 @@ static int run_sequencer(int argc, const char **argv, 
struct replay_opts *opts)
opts->revs->no_walk = REVISION_WALK_NO_WALK_UNSORTED;
if (argc < 2)
usage_with_options(usage_str, options);
-   if (!strcmp(argv[1], "-"))
-   argv[1] = "@{-1}";
memset(_r_opt, 0, sizeof(s_r_opt));
s_r_opt.assume_dashdash = 1;
argc = setup_revisions(argc, argv, opts->revs, _r_opt);
diff --git a/t/t3514-revert-shorthand.sh b/t/t3514-revert-shorthand.sh
new file mode 100755
index 000..51f8c81d
--- /dev/null
+++ b/t/t3514-revert-shorthand.sh
@@ -0,0 +1,25 @@
+#!/bin/sh
+
+test_description='log can show previous branch using shorthand - for @{-1}'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   test_commit first
+'
+
+test_expect_success 'setup branches' '
+echo "hello" >hello &&
+cat hello >expect &&
+git add hello &&
+git commit -m "hello first commit" &&
+echo "world" >>hello &&
+git commit -am "hello second commit" &&
+git checkout -b testing-1 &&
+git checkout master &&
+git revert --no-edit - &&
+cat hello >actual &&
+test_cmp expect actual
+'
+
+test_done
-- 
2.1.4



[PATCH 5/6 v5] merge.c: delegate handling of "-" shorthand to revision.c:get_sha1

2017-02-24 Thread Siddharth Kannan
The callchain for handling each argument contains the function
revision.c:get_sha1 where the shorthand for "-" ~ "@{-1}" has already been
implemented in a previous patch; the complete callchain leading to that
function is:

1. merge.c:collect_parents
2. commit.c:get_merge_parent : this function calls revision.c:get_sha1

This patch also adds a test for checking that the shorthand works properly

Signed-off-by: Siddharth Kannan 
---
 builtin/merge.c   |  2 --
 t/t3035-merge-hyphen-shorthand.sh | 33 +
 2 files changed, 33 insertions(+), 2 deletions(-)
 create mode 100755 t/t3035-merge-hyphen-shorthand.sh

diff --git a/builtin/merge.c b/builtin/merge.c
index a96d4fb..36ff420 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1228,8 +1228,6 @@ int cmd_merge(int argc, const char **argv, const char 
*prefix)
argc = setup_with_upstream();
else
die(_("No commit specified and merge.defaultToUpstream 
not set."));
-   } else if (argc == 1 && !strcmp(argv[0], "-")) {
-   argv[0] = "@{-1}";
}
 
if (!argc)
diff --git a/t/t3035-merge-hyphen-shorthand.sh 
b/t/t3035-merge-hyphen-shorthand.sh
new file mode 100755
index 000..fd37ff9
--- /dev/null
+++ b/t/t3035-merge-hyphen-shorthand.sh
@@ -0,0 +1,33 @@
+#!/bin/sh
+
+test_description='merge uses the shorthand - for @{-1}'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   test_commit first &&
+   test_commit second &&
+   test_commit third &&
+   test_commit fourth &&
+   test_commit fifth &&
+   test_commit sixth &&
+   test_commit seventh
+'
+
+test_expect_success 'setup branches' '
+git checkout master &&
+git checkout -b testing-2 &&
+git checkout -b testing-1 &&
+test_commit eigth &&
+test_commit ninth
+'
+
+test_expect_success 'merge - should work' '
+git checkout testing-2 &&
+git merge - &&
+git rev-parse HEAD HEAD^^ | sort >actual &&
+git rev-parse master testing-1 | sort >expect &&
+test_cmp expect actual
+'
+
+test_done
-- 
2.1.4



[PATCH 4/6 v5] sha1_name.c: teach get_sha1_1 "-" shorthand for "@{-1}"

2017-02-24 Thread Siddharth Kannan
This patch introduces "-" as a method to refer to a revision, and adds tests to
test that git-log works with this shorthand.

This change will touch the following commands (through
revision.c:setup_revisions):

* builtin/blame.c
* builtin/diff.c
* builtin/diff-files.c
* builtin/diff-index.c
* builtin/diff-tree.c
* builtin/log.c
* builtin/rev-list.c
* builtin/shortlog.c
* builtin/fast-export.c
* builtin/fmt-merge-msg.c
builtin/add.c
builtin/checkout.c
builtin/commit.c
builtin/merge.c
builtin/pack-objects.c
builtin/revert.c

* marked commands are information-only.

As most commands in this list are not of the rm-variety, (i.e a command that
would delete something), this change does not make it easier for people to
delete. (eg: "git branch -d -" is *not* enabled by this patch)

Suffixes like "-@{yesterday}" and "-@{2.days.ago}" are not enabled by this
patch. This is something that needs to be fixed later by making changes deeper
down the callchain.

Signed-off-by: Siddharth Kannan 
---
 sha1_name.c  |   5 +++
 t/t4214-log-shorthand.sh | 106 +++
 2 files changed, 111 insertions(+)
 create mode 100755 t/t4214-log-shorthand.sh

diff --git a/sha1_name.c b/sha1_name.c
index 73a915f..2f86bc9 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -947,6 +947,11 @@ static int get_sha1_1(const char *name, int len, unsigned 
char *sha1, unsigned l
if (!ret)
return 0;
 
+   if (*name == '-' && len == 1) {
+   name = "@{-1}";
+   len = 5;
+   }
+
ret = get_sha1_basic(name, len, sha1, lookup_flags);
if (!ret)
return 0;
diff --git a/t/t4214-log-shorthand.sh b/t/t4214-log-shorthand.sh
new file mode 100755
index 000..8be2de1
--- /dev/null
+++ b/t/t4214-log-shorthand.sh
@@ -0,0 +1,106 @@
+#!/bin/sh
+
+test_description='log can show previous branch using shorthand - for @{-1}'
+
+. ./test-lib.sh
+
+test_expect_success 'setup' '
+   test_commit first &&
+   test_commit second &&
+   test_commit third &&
+   test_commit fourth &&
+   test_commit fifth &&
+   test_commit sixth &&
+   test_commit seventh
+'
+
+test_expect_success '"log -" should not work initially' '
+   test_must_fail git log -
+'
+
+test_expect_success 'setup branches for testing' '
+   git checkout -b testing-1 master^ &&
+   git checkout -b testing-2 master~2 &&
+   git checkout master
+'
+
+test_expect_success '"log -" should work' '
+   git log testing-2 >expect &&
+   git log - >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success 'symmetric revision range should work when one end is left 
empty' '
+   git checkout testing-2 &&
+   git checkout master &&
+   git log ...@{-1} >expect.first_empty &&
+   git log @{-1}... >expect.last_empty &&
+   git log ...- >actual.first_empty &&
+   git log -... >actual.last_empty &&
+   test_cmp expect.first_empty actual.first_empty &&
+   test_cmp expect.last_empty actual.last_empty
+'
+
+test_expect_success 'asymmetric revision range should work when one end is 
left empty' '
+   git checkout testing-2 &&
+   git checkout master &&
+   git log ..@{-1} >expect.first_empty &&
+   git log @{-1}.. >expect.last_empty &&
+   git log ..- >actual.first_empty &&
+   git log -.. >actual.last_empty &&
+   test_cmp expect.first_empty actual.first_empty &&
+   test_cmp expect.last_empty actual.last_empty
+'
+
+test_expect_success 'symmetric revision range should work when both ends are 
given' '
+   git checkout testing-2 &&
+   git checkout master &&
+   git log -...testing-1 >expect &&
+   git log testing-2...testing-1 >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success 'asymmetric revision range should work when both ends are 
given' '
+   git checkout testing-2 &&
+   git checkout master &&
+   git log -..testing-1 >expect &&
+   git log testing-2..testing-1 >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success 'multiple separate arguments should be handled properly' '
+   git checkout testing-2 &&
+   git checkout master &&
+   git log - - >expect.1 &&
+   git log @{-1} @{-1} >actual.1 &&
+   git log - HEAD >expect.2 &&
+   git log @{-1} HEAD >actual.2 &&
+   test_cmp expect.1 actual.1 &&
+   test_cmp expect.2 actual.2
+'
+
+test_expect_success 'revision ranges with same start and end should be empty' '
+   git checkout testing-2 &&
+   git checkout master &&
+   test 0 -eq $(git log -...- | wc -l) &&
+   test 0 -eq $(git log -..- | wc -l)
+'
+
+test_expect_success 'suffixes to - should work' '
+   git checkout testing-2 &&
+   git checkout master &&
+   git log -~ >expect.1 &&
+   git log @{-1}~ >actual.1 &&
+   git log -~2 >expect.2 &&
+   git log @{-1}~2 >actual.2 &&
+   git log -^ >expect.3 &&
+ 

[PATCH 3/6 v5] revision.c: args starting with "-" might be a revision

2017-02-24 Thread Siddharth Kannan
setup_revisions used to consider any argument starting with "-" to be either a
valid or an unknown option.

Teach setup_revisions to check if an argument is a revision before adding it as
an unknown option (something that setup_revisions didn't understand) to argv,
and moving on to the next argument.

This patch prepares the addition of "-" as a shorthand for "previous branch".

Signed-off-by: Siddharth Kannan 
---
 revision.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index 8d4ddae..5470c33 100644
--- a/revision.c
+++ b/revision.c
@@ -2203,6 +2203,7 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
read_from_stdin = 0;
for (left = i = 1; i < argc; i++) {
const char *arg = argv[i];
+   int maybe_opt = 0;
if (*arg == '-') {
int opts;
 
@@ -2232,15 +2233,17 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
}
if (opts < 0)
exit(128);
-   /* arg is an unknown option */
-   argv[left++] = arg;
-   continue;
+   maybe_opt = 1;
}
 
 
if (!handle_revision_arg(arg, revs, flags, revarg_opt))
got_rev_arg = 1;
-   else {
+   else if (maybe_opt) {
+   /* arg is an unknown option */
+   argv[left++] = arg;
+   continue;
+   } else {
int j;
if (seen_dashdash || *arg == '^')
die("bad revision '%s'", arg);
-- 
2.1.4



[PATCH 0/6 v5] allow "-" as a shorthand for "previous branch"

2017-02-24 Thread Siddharth Kannan
An updated version of the patch [1]. Discussion here[1] has been taken into
account. The test for "-@{yesterday}" is there inside the log-shorthand test,
it is commented out for now.

I have removed the redundant pieces of code in merge.c and revert.c as mentioned
by Matthieu in [2]. As analysed by Junio[3], the same type of code inside
checkout.c and worktree.c can not be removed because the appropriate functions
inside revision.c are not called in their codepaths.

Thanks for your review of the previous versions, Junio and Matthieu!

[1]: 1487258054-32292-1-git-send-email-kannan.siddhart...@gmail.com
[2]: vpqbmu768on@anie.imag.fr
[3]: xmqq1sv1euob@gitster.mtv.corp.google.com

Siddharth Kannan (6):
  revision.c: do not update argv with unknown option
  revision.c: swap if/else blocks
  revision.c: args starting with "-" might be a revision
  sha1_name.c: teach get_sha1_1 "-" shorthand for "@{-1}"
  merge.c: delegate handling of "-" shorthand to revision.c:get_sha1
  revert.c: delegate handling of "-" shorthand to setup_revisions

 builtin/merge.c   |   2 -
 builtin/revert.c  |   2 -
 revision.c|  15 +++---
 sha1_name.c   |   5 ++
 t/t3035-merge-hyphen-shorthand.sh |  33 
 t/t3514-revert-shorthand.sh   |  25 +
 t/t4214-log-shorthand.sh  | 106 ++
 7 files changed, 178 insertions(+), 10 deletions(-)
 create mode 100755 t/t3035-merge-hyphen-shorthand.sh
 create mode 100755 t/t3514-revert-shorthand.sh
 create mode 100755 t/t4214-log-shorthand.sh

-- 
2.1.4



Re: SHA1 collisions found

2017-02-24 Thread Junio C Hamano
Linus Torvalds  writes:

> For example, what I would suggest the rules be is something like this:
>
>  - introduce new tag2/commit2/tree2/blob2 object type tags that imply
> that they were hashed using the new hash
>
>  - an old type obviously can never contain a pointer to a new type (ie
> you can't have a "tree" object that contains a tree2 object or a blob2
> object.
>
>  - but also make the rule that a *new* type can never contain a
> pointer to an old type, with the *very* specific exception that a
> commit2 can have a parent that is of type "commit".

OK, I think that is what Peff was suggesting in his message, and I
do not have problem with such a transition plan.  Or the *very*
specific exception could be that a reference to "commit" can use old
name (which would allow binding a submodule before transition to a
new project).

We probably do not need "blob2" object as they do not embed any
pointer to another thing.  A loose blob with old name can be made
available on the filesystem also under new name without much "heavy"
transition, and an in-pack blob can be pointed at with _two_ entries
in the updated pack index file under old and new names, both for the
base (just deflated) representation and also ofs-delta.  A ref-delta
based on another blob with old name may need a bit of special
handling, but the deltification would not be visible at the "struct object"
layer, so probably not such a big deal.

We may also be able to get away without "commit2" and "tag2" as
their pointers can be widened and parse_{commit,tag}_object() should
be able to deal with objects with new names transparently.  "tree2"
may be a bit tricky, though, but offhand it seems to me that nothing
is insurmountable.

> That way everything "converges" towards the new format: the only way
> you can stay on the old format is if you only have old-format objects,
> and once you have a new-format object all your objects are going to be
> new format - except for the history.

Yes.

> So you will end up with duplicate objects, and that's not good (think
> of what it does to all our full-tree "diff" optimizations, for example
> - you no longer get the "these sub-trees are identical" across a
> format change), but realistically you'll have a very limited time of
> that kind of duplication.
>
> I'd furthermore suggest that from a UI standpoint, we'd
>
>  - convert to 64-character hex numbers (32-byte hashes)
>
>  - (as mentioned earlier) default to a 40-character abbreviation
>
>  - make the old 40-character SHA1's just show up within the same
> address space (so they'd also be encoded as 32-byte hashes, just with
> the last 12 bytes zero).

Yes to all of the above.

>  - you'd see in the "object->type" whether it's a new or old-style hash.

I am not sure if this is needed.  We may need to abstract tree_entry walker
a little bit as a preparatory step, but I suspect that the hash (and
more importantly the internal format) can be kept as an internal
knowledge to the object layer (i.e. {commit,tree,tag}.c).

So,... thanks for straightening me out.  I was thinking we would
need mixed mode support for smoother transition, but it now seems to
me that the approach to stratify the history into old and new is
workable.


Re: SHA1 collisions found

2017-02-24 Thread grarpamp
Repos should address keeping / 'fixing' broken sha-1 as needed.
They also really need to create new native modes so users can
initialize and use repos with (sha-3 / sha-256 / whatever) going forward.
Backward compatibility with sha-1 or 'fixed sha-1' will be fine. Clients
can 'taste' and 'test' repos for which hash mode to use, or add it to
their configs. Make things flexible, modular, configurable, updateable.
What little point is there in 'fixing / caveating' their use of broken sha-1,
without also doing strong (sha-3 / optionals) in the first place, defaulting
new init's to whichever strong hash looks good, and letting natural
migration to that happen on its own through the default process.
Introducing new hash modes also gives good oppurtunity to incorporate
other generally 'incompatabile with the old' changes to benefit the future.
One might argue against mixed mode, after all, export and import,
as with any other repo migration, is generally possible.  And mixed
mode tends to prolong the actual endeavour to move to something
better in the init itself. Native and new makes you update to follow.
A lot of question / wrong ramble here, but the point should be
consistant... move, natively, even if only for sake of death of old
broken hashes. And attacks only get worse. Thought food is all.


Re: [PATCH v2 1/4] delete_ref: accept a reflog message argument

2017-02-24 Thread Kyle Meyer
Duy Nguyen  writes:

> You'll probably want to update the comment block above if msg can be
> NULL. We have _very_ good documentation in this file, let's keep it
> uptodate.

Looking at how other functions in refs.h document their "msg" or
"logmsg" parameter, none seem to mention it explicitly.  update_ref
refers to ref_transaction_update, and
ref_transaction_{update,create,delete,verify} refer to ref.h's
"Reference transaction updates" comment.

delete_ref's docstring already mentions that "flag" is passed to
ref_transaction_delete, so perhaps it should mention "msg" here as
well.

-- >8 --
Subject: [PATCH] delete_ref: mention "msg" parameter in docstring

delete_ref() was recently extended to take a "msg" argument.
---
 refs.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/refs.h b/refs.h
index e529f4c3a..988750218 100644
--- a/refs.h
+++ b/refs.h
@@ -274,7 +274,8 @@ int reflog_exists(const char *refname);
  * verify that the current value of the reference is old_sha1 before
  * deleting it. If old_sha1 is NULL, delete the reference if it
  * exists, regardless of its old value. It is an error for old_sha1 to
- * be NULL_SHA1. flags is passed through to ref_transaction_delete().
+ * be NULL_SHA1. msg and flags are passed through to
+ * ref_transaction_delete().
  */
 int delete_ref(const char *msg, const char *refname,
   const unsigned char *old_sha1, unsigned int flags);
-- 
2.11.1



Hello

2017-02-24 Thread university
Hello


Re: SHA1 collisions found

2017-02-24 Thread Jacob Keller
On Fri, Feb 24, 2017 at 5:39 PM, David Lang  wrote:
> On Fri, 24 Feb 2017, Jeff King wrote:
>
>>> what if they are forks of each other? (LEDE and OpenWRT, or just
>>> linux-kernel and linux-kernel-stable)
>>
>>
>> Once one flips, the other one needs to flip to, or can't interact with
>> them. I know that's harsh, and is likely to create headaches. But in the
>> long run, I think once everything has converged the resulting system is
>> less insane.
>>
>> For that reason I _wouldn't_ recommend projects like the kernel flip the
>> flag immediately. Ideally we write the code and the new versions
>> permeate the community. Then somebody (per-project) decides that it's
>> time for the community to start switching.
>
>
> can you 'un-flip' the flag? or if you have someone who is a developer flip
> their repo (because they heard that sha1 is unsafe, and they want to be
> safe), they can't contribute to the kernel. We don't want to have them loose
> all their work, so how can they convert their local repo back to somthing
> that's compatible?

I'd think one of the first things we want is a way to flip *and*
unflip by re-writing history ala git-filter-branch style. (So if you
wanted, you could also flip all your old history).

One unrelated thought I had. When an old client sees the new stuff, it
will probably fail in a lot of weird ways. I wonder what we can do so
that if we in the future have to switch to an even newer hash, how can
we make it so that the old versions give a more clean error
experience? Ideally so that it lessens the pain of transition somewhat
in the future if/when it has to happen again?

Thanks,
Jake


Re: SHA1 collisions found

2017-02-24 Thread Jacob Keller
On Fri, Feb 24, 2017 at 5:21 PM, Jeff King  wrote:
> On Fri, Feb 24, 2017 at 05:00:55PM -0800, David Lang wrote:
>
>> On Fri, 24 Feb 2017, Jeff King wrote:
>>
>> >
>> > So I'd much rather see strong rules like:
>> >
>> >  1. Once a repo has flag-day switched over to the new hash format[1],
>> > new references are _always_ done with the new hash. Even ones that
>> > point to pre-flag-day objects!
>>
>> how do you define when a repo has "switched over" to the new format in a
>> distributed environment?
>
> You don't. It's a decision for each local repo, but the rules push
> everybody towards upgrading (because you forbid them pulling from or
> pushing to people who have upgraded).
>
> So in practice, some centralized distribution point switches, and then
> it floods out from there.

This seems like the most reasonable strategy so far. I think that
trying to allow long term co-existence is a huge pain that discourages
switching, when we actually want to encourage everyone to switch
someone has switched.

I don't think it's sane to try and allow simultaneous use of both
hashes, since that creates a lot of headaches and discourages
transition somewhat.

Thanks,
Jake


Re: SHA1 collisions found

2017-02-24 Thread David Lang

On Fri, 24 Feb 2017, Jeff King wrote:


OpenWRT/LEDE have their core repo, and they pull from many other (unrelated)
projects into that repo (and then have 'feeds', which is
sort-of-like-submodules to pull in other software that's maintained
completely independently)


I think with submodules this should probably still work.  If they are
pulling in with a subtree-ish strategy, then they'd convert the incoming
trees to the newhash format as part of that.


as I understand things, they have two categories of things

1. Feeds, which are completely independent, separate maintainers

2. core, which gets pulled into one repo, I don't know if they use submodules in 
the process. I know that what downstream users see is a single repo.


I understand and agree with the idea of trying to converge rapidly. I'm just 
looking at cases where this may be hard (or where there may be holdouts for 
whatever reason)


David Lang


Re: SHA1 collisions found

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 05:39:43PM -0800, David Lang wrote:

> On Fri, 24 Feb 2017, Jeff King wrote:
> 
> > > what if they are forks of each other? (LEDE and OpenWRT, or just
> > > linux-kernel and linux-kernel-stable)
> > 
> > Once one flips, the other one needs to flip to, or can't interact with
> > them. I know that's harsh, and is likely to create headaches. But in the
> > long run, I think once everything has converged the resulting system is
> > less insane.
> > 
> > For that reason I _wouldn't_ recommend projects like the kernel flip the
> > flag immediately. Ideally we write the code and the new versions
> > permeate the community. Then somebody (per-project) decides that it's
> > time for the community to start switching.
> 
> can you 'un-flip' the flag? or if you have someone who is a developer flip
> their repo (because they heard that sha1 is unsafe, and they want to be
> safe), they can't contribute to the kernel. We don't want to have them loose
> all their work, so how can they convert their local repo back to somthing
> that's compatible?

I don't think it would be too hard to write an un-flipper (it's
basically just rewriting the newhash bit of history using sha1, and
converting your refs back to point at the sha1s).

> how would submodules work if one module flips and another (or the parent)
> doesn't?

That's a good question. It's possible that another exception should be
carved out for referring to a gitlink via sha1 (we _could_ say "no,
point to a newhash version of the submodule", but I think that creates a
lot of hardship for not much gain).

> OpenWRT/LEDE have their core repo, and they pull from many other (unrelated)
> projects into that repo (and then have 'feeds', which is
> sort-of-like-submodules to pull in other software that's maintained
> completely independently)

I think with submodules this should probably still work.  If they are
pulling in with a subtree-ish strategy, then they'd convert the incoming
trees to the newhash format as part of that.

> Microsoft has made lots of money with people being forced to upgrade Word
> because one person got a new version and everyone else needed to upgrade to
> be compatible. There's a LOT of pain during that process. Is that really the
> best way to go?

I think there's going to be a lot of pain regardless. Any attempt to
mitigate that pain and work seamlessly across old and new versions of
git is going cause _ongoing_ pain as people quietly rewrite the same
content back and forth with different hashes. The viral-convergence
strategy is painful once (when you're forced to upgrade), but after that
just works.

If you want to work on a dual-hash strategy, be my guest. I can't
promise I'll be able to find horrific corner cases in it, but I
certainly can't even try to do so until there is a concrete proposal. :)

-Peff


Re: SHA1 collisions found

2017-02-24 Thread David Lang

On Fri, 24 Feb 2017, Jeff King wrote:


what if they are forks of each other? (LEDE and OpenWRT, or just
linux-kernel and linux-kernel-stable)


Once one flips, the other one needs to flip to, or can't interact with
them. I know that's harsh, and is likely to create headaches. But in the
long run, I think once everything has converged the resulting system is
less insane.

For that reason I _wouldn't_ recommend projects like the kernel flip the
flag immediately. Ideally we write the code and the new versions
permeate the community. Then somebody (per-project) decides that it's
time for the community to start switching.


can you 'un-flip' the flag? or if you have someone who is a developer flip their 
repo (because they heard that sha1 is unsafe, and they want to be safe), they 
can't contribute to the kernel. We don't want to have them loose all their work, 
so how can they convert their local repo back to somthing that's compatible?


how would submodules work if one module flips and another (or the parent) 
doesn't?


OpenWRT/LEDE have their core repo, and they pull from many other (unrelated) 
projects into that repo (and then have 'feeds', which is sort-of-like-submodules 
to pull in other software that's maintained completely independently)


Microsoft has made lots of money with people being forced to upgrade Word 
because one person got a new version and everyone else needed to upgrade to be 
compatible. There's a LOT of pain during that process. Is that really the best 
way to go?


David Lang



[PATCH] submodule init: warn about falling back to a local path

2017-02-24 Thread Stefan Beller
When a submodule is initialized, the config variable 'submodule..url'
is set depending on the value of the same variable in the .gitmodules
file. When the URL indicates to be relative, then the url is computed
relative to its default remote. The default remote cannot be determined
accurately in all cases, such that it falls back to 'origin'.

The 'origin' remote may not exist, though. In that case we give up looking
for a suitable remote and we'll just assume it to be a local relative path.

This can be confusing to users as there is a lot of guessing involved,
which is not obvious to the user.

So in the corner case of assuming a local autoritative truth, warn the
user to lessen the confusion.

This behavior was introduced in 4d6893200 (submodule add: allow relative
repository path even when no url is set, 2011-06-06), which shared the
code with submodule-init and then ported to C in 3604242f080a (submodule:
port init from shell to C, 2016-04-15).

In case of submodule-add, this behavior makes sense in some use cases[1],
however for submodule-init there does not seem to be an immediate obvious
use case to fall back to a local submodule. However there might be, so
warn instead of die here.

While adding the warning, also clarify the behavior of relative URLs in
the documentation.

[1] e.g. 
http://stackoverflow.com/questions/8721984/git-ignore-files-for-public-repository-but-not-for-private
"store a secret locally in a submodule, with no intention to publish it"

Reported-by: Shawn Pearce 
Signed-off-by: Stefan Beller 
---

> Perhaps s/ all submodules/,&/?

done

> I read this as "copying..., resolving relative to the default remote
> (if exists)."

reworded with shorter sentences:
  Initialize the submodules recorded in the index (which were
  added and committed elsewhere) by setting `submodule.$name.url`
  in .git/config. It uses the same setting from .gitmodules as
  a template. If the URL is relative, it will be resolved using
  the default remote. If there is no default remote, the current
  repository will be assumed to be upstream.
...
   next paragraph
  

 Documentation/git-submodule.txt | 38 --
 builtin/submodule--helper.c |  8 +++-
 2 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 8acc72ebb8..e05d0cddef 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -73,13 +73,17 @@ configuration entries unless `--name` is used to specify a 
logical name.
 +
  is the URL of the new submodule's origin repository.
 This may be either an absolute URL, or (if it begins with ./
-or ../), the location relative to the superproject's origin
+or ../), the location relative to the superproject's default remote
 repository (Please note that to specify a repository 'foo.git'
 which is located right next to a superproject 'bar.git', you'll
 have to use '../foo.git' instead of './foo.git' - as one might expect
 when following the rules for relative URLs - because the evaluation
 of relative URLs in Git is identical to that of relative directories).
-If the superproject doesn't have an origin configured
++
+The default remote is the remote of the remote tracking branch
+of the current branch. If no such remote tracking branch exists or
+the HEAD is detached, "origin" is assumed to be the default remote.
+If the superproject doesn't have a default remote configured
 the superproject is its own authoritative upstream and the current
 working directory is used instead.
 +
@@ -118,18 +122,24 @@ too (and can also report changes to a submodule's work 
tree).
 
 init [--] [...]::
Initialize the submodules recorded in the index (which were
-   added and committed elsewhere) by copying submodule
-   names and urls from .gitmodules to .git/config.
-   Optional  arguments limit which submodules will be initialized.
-   It will also copy the value of `submodule.$name.update` into
-   .git/config.
-   The key used in .git/config is `submodule.$name.url`.
-   This command does not alter existing information in .git/config.
-   You can then customize the submodule clone URLs in .git/config
-   for your local setup and proceed to `git submodule update`;
-   you can also just use `git submodule update --init` without
-   the explicit 'init' step if you do not intend to customize
-   any submodule locations.
+   added and committed elsewhere) by setting `submodule.$name.url`
+   in .git/config. It uses the same setting from .gitmodules as
+   a template. If the URL is relative, it will be resolved using
+   the default remote. If there is no default remote, the current
+   repository will be assumed to be upstream.
++
+Optional  arguments limit which submodules will be initialized.
+If no path is specified, all submodules are initialized.
++
+When present, it 

Re: SHA1 collisions found

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 05:00:55PM -0800, David Lang wrote:

> On Fri, 24 Feb 2017, Jeff King wrote:
> 
> > 
> > So I'd much rather see strong rules like:
> > 
> >  1. Once a repo has flag-day switched over to the new hash format[1],
> > new references are _always_ done with the new hash. Even ones that
> > point to pre-flag-day objects!
> 
> how do you define when a repo has "switched over" to the new format in a
> distributed environment?

You don't. It's a decision for each local repo, but the rules push
everybody towards upgrading (because you forbid them pulling from or
pushing to people who have upgraded).

So in practice, some centralized distribution point switches, and then
it floods out from there.

> so you have one person working on a project that switches their version of
> git to the new one that uses the new format.

That shouldn't happen when they switch. It should happen when they
decide to move their local clone to the new format. So let's assume they
upgrade _and_ decide to switch.

> But other people they interact with still use older versions of git

Those people get forced to upgrade if they want to continue interacting.

> what happens when you have someone working on two different projects where
> one has switched and the other hasn't?

See above. You only flip the flag on for one of the projects.

> what if they are forks of each other? (LEDE and OpenWRT, or just
> linux-kernel and linux-kernel-stable)

Once one flips, the other one needs to flip to, or can't interact with
them. I know that's harsh, and is likely to create headaches. But in the
long run, I think once everything has converged the resulting system is
less insane.

For that reason I _wouldn't_ recommend projects like the kernel flip the
flag immediately. Ideally we write the code and the new versions
permeate the community. Then somebody (per-project) decides that it's
time for the community to start switching.

> > The flag-day switch would probably be a repo config flag based on
> > repositoryformatversion (so old versions would just punt if they
> > see it). Let's call this flag "newhash" for lack of a better term.
> 
> so how do you interact with someone who only expects the old commit instead
> of the commit-v2?

You ask them to upgrade.

-Peff


[PATCH 1/3] revision: unify {tree,blob}_objects in rev_info

2017-02-24 Thread Jonathan Tan
Whenever tree_objects is set to 1 in revision.h's struct rev_info,
blob_objects is likewise set, and vice versa. Combine those two fields
into one.

Some of the existing code does not handle tree_objects being different
from blob_objects properly. For example, "handle_commit" in revision.c
recurses from an UNINTERESTING tree into its subtree if tree_objects ==
1, completely ignoring blob_objects; it probably should still recurse if
tree_objects == 0 and blob_objects == 1 (to mark the blobs), and should
behave differently depending on blob_objects (controlling the
instantiation and marking of blob objects). This commit resolves the
issue by forbidding tree_objects from being different to blob_objects.

It could be argued that in the future, Git might need to distinguish
tree_objects from blob_objects - in particular, a user might want
rev-list to print the trees but not the blobs. However, this results in
a minor performance savings at best in that objects no longer need to be
instantiated (causing memory allocations and hashtable insertions) - no
disk reads are being done for objects whether blob_objects is set or
not.

Signed-off-by: Jonathan Tan 
---
 bisect.c|  2 +-
 builtin/rev-list.c  |  6 +++---
 list-objects.c  |  4 ++--
 pack-bitmap-write.c |  3 +--
 pack-bitmap.c   |  3 +--
 reachable.c |  3 +--
 revision.c  | 16 ++--
 revision.h  |  3 +--
 8 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/bisect.c b/bisect.c
index 8e63c40d2..265b32905 100644
--- a/bisect.c
+++ b/bisect.c
@@ -634,7 +634,7 @@ static void bisect_common(struct rev_info *revs)
 {
if (prepare_revision_walk(revs))
die("revision walk setup failed");
-   if (revs->tree_objects)
+   if (revs->tree_and_blob_objects)
mark_edges_uninteresting(revs, NULL);
 }
 
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 0aa93d589..6c2651b31 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -345,7 +345,7 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
revs.commit_format = CMIT_FMT_RAW;
 
if ((!revs.commits &&
-(!(revs.tag_objects || revs.tree_objects || revs.blob_objects) &&
+(!(revs.tag_objects || revs.tree_and_blob_objects) &&
  !revs.pending.nr)) ||
revs.diff)
usage(rev_list_usage);
@@ -374,7 +374,7 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
return 0;
}
} else if (revs.max_count < 0 &&
-  revs.tag_objects && revs.tree_objects && 
revs.blob_objects) {
+  revs.tag_objects && revs.tree_and_blob_objects) {
if (!prepare_bitmap_walk()) {
traverse_bitmap_commit_list(_object_fast);
return 0;
@@ -384,7 +384,7 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
 
if (prepare_revision_walk())
die("revision walk setup failed");
-   if (revs.tree_objects)
+   if (revs.tree_and_blob_objects)
mark_edges_uninteresting(, show_edge);
 
if (bisect_list) {
diff --git a/list-objects.c b/list-objects.c
index f3ca6aafb..796957105 100644
--- a/list-objects.c
+++ b/list-objects.c
@@ -18,7 +18,7 @@ static void process_blob(struct rev_info *revs,
struct object *obj = >object;
size_t pathlen;
 
-   if (!revs->blob_objects)
+   if (!revs->tree_and_blob_objects)
return;
if (!obj)
die("bad blob object");
@@ -78,7 +78,7 @@ static void process_tree(struct rev_info *revs,
all_entries_interesting: entry_not_interesting;
int baselen = base->len;
 
-   if (!revs->tree_objects)
+   if (!revs->tree_and_blob_objects)
return;
if (!obj)
die("bad tree object");
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 970559601..5080e276b 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -258,8 +258,7 @@ void bitmap_writer_build(struct packing_data *to_pack)
 
init_revisions(, NULL);
revs.tag_objects = 1;
-   revs.tree_objects = 1;
-   revs.blob_objects = 1;
+   revs.tree_and_blob_objects = 1;
revs.no_walk = 0;
 
revs.include_check = should_include;
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 39bcc1684..445a24e0d 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -951,8 +951,7 @@ void test_bitmap_walk(struct rev_info *revs)
die("Commit %s doesn't have an indexed bitmap", 
oid_to_hex(>oid));
 
revs->tag_objects = 1;
-   revs->tree_objects = 1;
-   revs->blob_objects = 1;
+   revs->tree_and_blob_objects = 1;
 
result_popcnt = bitmap_popcount(result);
 
diff --git a/reachable.c 

[PATCH 0/3] Test fetch-pack's ability to fetch arbitrary blobs

2017-02-24 Thread Jonathan Tan
As stated in a previous e-mail [1], I was trying to think a way to allow
Git to fetch arbitrary blobs from another Git server, and it turned out
that fetch-pack already can. However, there were some bugs with blob
reachability. This patch set fixes those bugs, and verifies (with tests)
that fetch-pack can fetch reachable blobs and cannot fetch unreachable
blobs.

These patches are (I think) worthwhile on their own, but may be of
special interest to people who need Git to tolerate missing objects in
the local repo (for example, the e-mail discussion "[RFC] Add support
for downloading blobs on demand" [2]) because a way for Git to download
missing objects natively is (I think) a prerequisite to that.

[1] <20170223230358.30050-1-jonathanta...@google.com>
[2] <20170113155253.1644-1-benpe...@microsoft.com>

Jonathan Tan (3):
  revision: unify {tree,blob}_objects in rev_info
  revision: exclude trees/blobs given commit
  upload-pack: compute blob reachability correctly

 bisect.c |  2 +-
 builtin/rev-list.c   |  6 ++--
 list-objects.c   |  4 +--
 pack-bitmap-write.c  |  3 +-
 pack-bitmap.c|  3 +-
 reachable.c  |  3 +-
 revision.c   | 18 +-
 revision.h   |  3 +-
 t/t5500-fetch-pack.sh| 30 +
 t/t6000-rev-list-misc.sh | 88 
 upload-pack.c| 15 +
 11 files changed, 151 insertions(+), 24 deletions(-)

-- 
2.11.0.483.g087da7b7c-goog



[PATCH 3/3] upload-pack: compute blob reachability correctly

2017-02-24 Thread Jonathan Tan
If allowreachablesha1inwant is set, upload-pack will provide a blob to a
user, provided its hash, regardless of whether the blob is reachable or
not. Teach upload-pack to compute reachability correctly by passing the
"--objects" argument when it invokes "rev-list" if necessary.

This commit only affects the case where blob/tree hashes are provided to
upload-pack; the more typical case of only commit/tag hashes being
provided is not affected. In the case where blob/tree hashes are
provided, the reachability check is now slower (since trees need to be
read) but correct. (The user may still set allowanysha1inwant instead of
allowreachablesha1inwant to opt-out of the reachability check.)

Signed-off-by: Jonathan Tan 
---
 t/t5500-fetch-pack.sh | 30 ++
 upload-pack.c | 15 +++
 2 files changed, 45 insertions(+)

diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
index 505e1b4a7..a4ae888ff 100755
--- a/t/t5500-fetch-pack.sh
+++ b/t/t5500-fetch-pack.sh
@@ -547,6 +547,36 @@ test_expect_success 'fetch-pack can fetch a raw sha1' '
git fetch-pack hidden $(git -C hidden rev-parse refs/hidden/one)
 '
 
+test_expect_success 'setup for tests that fetch blobs by hash' '
+   git init blobserver &&
+   test_commit -C blobserver 1 &&
+   test_commit -C blobserver 2 &&
+   test_commit -C blobserver 3 &&
+   blob1=$(echo 1 | git hash-object --stdin) &&
+   blob2=$(echo 2 | git hash-object --stdin) &&
+   blob3=$(echo 3 | git hash-object --stdin) &&
+
+   unreachable=$(echo 4 | git -C blobserver hash-object -w --stdin) &&
+   git -C blobserver cat-file -e "$unreachable"
+'
+
+test_expect_success 'fetch-pack can fetch reachable blobs by hash' '
+   test_config -C blobserver uploadpack.allowreachablesha1inwant 1 &&
+
+   git init reachabletest &&
+   git -C reachabletest fetch-pack ../blobserver "$blob1" "$blob2" &&
+   git -C reachabletest cat-file -e "$blob1" &&
+   git -C reachabletest cat-file -e "$blob2" &&
+   test_must_fail git -C reachabletest cat-file -e "$blob3"
+'
+
+test_expect_success 'fetch-pack cannot fetch unreachable blobs' '
+   test_config -C blobserver uploadpack.allowreachablesha1inwant 1 &&
+
+   git init unreachabletest &&
+   test_must_fail git -C unreachabletest fetch-pack ../blobserver "$blob1" 
"$unreachable"
+'
+
 check_prot_path () {
cat >expected <<-EOF &&
Diag: url=$1
diff --git a/upload-pack.c b/upload-pack.c
index 7597ba340..f05cc2b5e 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -471,6 +471,9 @@ static int do_reachable_revlist(struct child_process *cmd,
static const char *argv[] = {
"rev-list", "--stdin", NULL,
};
+   static const char *argv_with_objects[] = {
+   "rev-list", "--objects", "--stdin", NULL,
+   };
struct object *o;
char namebuf[42]; /* ^ + SHA-1 + LF */
int i;
@@ -488,6 +491,18 @@ static int do_reachable_revlist(struct child_process *cmd,
 */
sigchain_push(SIGPIPE, SIG_IGN);
 
+   /*
+* If we are testing reachability of a tree or blob, rev-list needs to
+* operate more granularly.
+*/
+   for (i = 0; i < src->nr; i++) {
+   o = src->objects[i].item;
+   if (o->type == OBJ_TREE || o->type == OBJ_BLOB) {
+   cmd->argv = argv_with_objects;
+   break;
+   }
+   }
+
if (start_command(cmd))
goto error;
 
-- 
2.11.0.483.g087da7b7c-goog



[PATCH 2/3] revision: exclude trees/blobs given commit

2017-02-24 Thread Jonathan Tan
When the --objects argument is given to rev-list, an argument of the
form "^$tree" can be given to exclude all blobs and trees reachable from
that tree, but an argument of the form "^$commit" only excludes that
commit, not any blob or tree reachable from it. Make "^$commit" behave
consistent to "^$tree".

Also, formalize this behavior in unit tests. (Some of the added tests
would already pass even before this commit, but are included
nevertheless for completeness.)

Signed-off-by: Jonathan Tan 
---
 revision.c   |  2 ++
 t/t6000-rev-list-misc.sh | 88 
 2 files changed, 90 insertions(+)

diff --git a/revision.c b/revision.c
index 5e49d9e0e..e6a62da98 100644
--- a/revision.c
+++ b/revision.c
@@ -254,6 +254,8 @@ static struct commit *handle_commit(struct rev_info *revs,
die("unable to parse commit %s", name);
if (flags & UNINTERESTING) {
mark_parents_uninteresting(commit);
+   if (revs->tree_and_blob_objects)
+   mark_tree_uninteresting(commit->tree);
revs->limited = 1;
}
if (revs->show_source && !commit->util)
diff --git a/t/t6000-rev-list-misc.sh b/t/t6000-rev-list-misc.sh
index 969e4e9e5..c04a9582b 100755
--- a/t/t6000-rev-list-misc.sh
+++ b/t/t6000-rev-list-misc.sh
@@ -114,4 +114,92 @@ test_expect_success '--header shows a NUL after each 
commit' '
test_cmp expect actual
 '
 
+test_expect_success 'setup tree-granularity and blob-granularity tests' '
+   echo 0 >file &&
+
+   mkdir subdir &&
+   echo 1 >subdir/file &&
+   git add file subdir/file &&
+   git commit -m one &&
+
+   echo 2 >subdir/file &&
+   git add subdir/file &&
+   git commit -m two &&
+
+   commit1=$(git rev-parse HEAD^1) &&
+   commit2=$(git rev-parse HEAD) &&
+   tree1=$(git rev-parse $commit1^{tree}) &&
+   tree2=$(git rev-parse $commit2^{tree}) &&
+   subtree1=$(git cat-file -p $tree1 | grep "subdir" | cut -c13-52) &&
+   subtree2=$(git cat-file -p $tree2 | grep "subdir" | cut -c13-52) &&
+   blob0=$(echo 0 | git hash-object --stdin) &&
+   blob1=$(echo 1 | git hash-object --stdin) &&
+   blob2=$(echo 2 | git hash-object --stdin)
+'
+
+test_expect_success 'include commit, exclude blob' '
+   git rev-list --objects $commit2 >out &&
+   grep "$blob1" out &&
+   grep "$blob2" out &&
+
+   git rev-list --objects $commit2 "^$blob2" >out &&
+   grep "$blob1" out &&
+   ! grep "$blob2" out
+'
+
+test_expect_success 'include commit, exclude tree (also excludes nested 
trees/blobs)' '
+   git rev-list --objects $commit2 "^$tree2" >out &&
+   grep "$tree1" out &&
+   grep "$subtree1" out &&
+   grep "$blob1" out &&
+   ! grep "$tree2" out &&
+   ! grep "$subtree2" out &&
+   ! grep "$blob2" out &&
+
+   git rev-list --objects $commit2 "^$subtree2" >out &&
+   grep "$tree1" out &&
+   grep "$subtree1" out &&
+   grep "$blob1" out &&
+   grep "$tree2" out &&
+   ! grep "$subtree2" out &&
+   ! grep "$blob2" out
+'
+
+test_expect_success 'include tree, exclude commit' '
+   git rev-list --objects "$tree1" "^$commit2" >out &&
+   ! grep "$blob0" out &&  # common to both
+   grep "$blob1" out &&# only in tree
+   ! grep "$blob2" out # only in commit
+'
+
+test_expect_success 'include tree, exclude tree' '
+   git rev-list --objects "$tree1" "^$tree2" >out &&
+   ! grep "$blob0" out &&  # common to both
+   grep "$blob1" out &&# only in tree1
+   ! grep "$blob2" out # only in tree2
+'
+
+test_expect_success 'include tree, exclude blob' '
+   git rev-list --objects "$tree1" "^$blob2" >out &&
+   grep "$blob0" out &&
+   grep "$blob1" out &&
+   ! grep "$blob2" out
+'
+
+test_expect_success 'include blob, exclude commit' '
+   git rev-list --objects "$blob2" "^$commit1" >out &&
+   grep "$blob2" out &&
+
+   git rev-list --objects "$blob2" "^$commit2" >out &&
+   ! grep "$blob2" out
+'
+
+test_expect_success 'include blob, exclude tree' '
+   git rev-list --objects "$blob2" "^$tree1" >out &&
+   grep "$blob2" out &&
+
+   git rev-list --objects "$blob2" "^$tree2" >out &&
+   ! grep "$blob2" out
+'
+
 test_done
-- 
2.11.0.483.g087da7b7c-goog



Re: SHA1 collisions found

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 04:39:45PM -0800, Linus Torvalds wrote:

> For example, what I would suggest the rules be is something like this:
> 
>  - introduce new tag2/commit2/tree2/blob2 object type tags that imply
> that they were hashed using the new hash
> 
>  - an old type obviously can never contain a pointer to a new type (ie
> you can't have a "tree" object that contains a tree2 object or a blob2
> object.
> 
>  - but also make the rule that a *new* type can never contain a
> pointer to an old type, with the *very* specific exception that a
> commit2 can have a parent that is of type "commit".

Yeah, this is exactly what I had in mind. That way everybody in
"newhash" mode has no decisions to make. They follow the same rules and
it's as if sha1 never existed, except when you follow links in
historical objects.

> [in reply...]
> Actually, I take that back. I think it might be easier to keep
> "object->type" as-is, and it would only show the current OBJ_xyz
> fields. Then writing the SHA ends up deciding whether a OBJ_COMMIT
> gets written as "commit" or "commit2".

Yeah, I think there are some data structures with limited bits for the
"type" fields (e.g., the pack format). So sticking with OBJ_COMMIT might
be nice. For commits and tags, it would be nice to have an "I'm v2"
header at the start so there's no confusion about how they are meant to
be interpreted.

Trees are more difficult, as they don't have any such field. But a valid
tree does need to start with a mode, so sticking some non-numeric flag
at the front of the object would work (it breaks backwards
compatibility, but that's kind of the point).

I dunno. Maybe we do not need those markers at all, and could get by
purely on object-length, or annotating the headers in some way (like
"parent sha256:1234abcd").

It might just be nice if we could very easily identify objects as one
type or the other without having to parse them in detail.

> So you will end up with duplicate objects, and that's not good (think
> of what it does to all our full-tree "diff" optimizations, for example
> - you no longer get the "these sub-trees are identical" across a
> format change), but realistically you'll have a very limited time of
> that kind of duplication.

Yeah, cross-flag-day diffs will be more expensive. I think that's
something we have to live with. I was thinking originally that the
sha1->newhash mapping might solve that, but it only works at the blob
level. I.e., you can compare a sha1 and a newhash like:

  if (!hashcmp(sha1_to_newhash(a), b))

without having to look at the contents. But it doesn't work recursively,
because the tree-pointing-to-newhash will have different content.

-Peff


Re: SHA1 collisions found

2017-02-24 Thread Stefan Beller
On Fri, Feb 24, 2017 at 5:00 PM, David Lang  wrote:
> On Fri, 24 Feb 2017, Jeff King wrote:
>
>>
>> So I'd much rather see strong rules like:
>>
>>  1. Once a repo has flag-day switched over to the new hash format[1],
>> new references are _always_ done with the new hash. Even ones that
>> point to pre-flag-day objects!
>
>
> how do you define when a repo has "switched over" to the new format in a
> distributed environment?
>
> so you have one person working on a project that switches their version of
> git to the new one that uses the new format.
>
> But other people they interact with still use older versions of git
>
> what happens when you have someone working on two different projects where
> one has switched and the other hasn't?

you get infected by the "new version requirement"
as soon as you pull? (GPL is cancer, anyone? ;)

If you are using an old version of git that doesn't understand the new version,
you're screwed.


Re: SHA1 collisions found

2017-02-24 Thread David Lang

On Fri, 24 Feb 2017, Jeff King wrote:



So I'd much rather see strong rules like:

 1. Once a repo has flag-day switched over to the new hash format[1],
new references are _always_ done with the new hash. Even ones that
point to pre-flag-day objects!


how do you define when a repo has "switched over" to the new format in a 
distributed environment?


so you have one person working on a project that switches their version of git 
to the new one that uses the new format.


But other people they interact with still use older versions of git

what happens when you have someone working on two different projects where one 
has switched and the other hasn't?


what if they are forks of each other? (LEDE and OpenWRT, or just linux-kernel 
and linux-kernel-stable)




So you get a "commit-v2" object instead of a "commit", and it has a
distinct hash identity from its "commit" counterpart. You can point
to a classic "commit", but you do so by its new-hash.

The flag-day switch would probably be a repo config flag based on
repositoryformatversion (so old versions would just punt if they
see it). Let's call this flag "newhash" for lack of a better term.


so how do you interact with someone who only expects the old commit instead of 
the commit-v2?


David Lang


Re: SHA1 collisions found

2017-02-24 Thread Linus Torvalds
On Fri, Feb 24, 2017 at 4:39 PM, Linus Torvalds
 wrote:
>
>  - you'd see in the "object->type" whether it's a new or old-style hash.

Actually, I take that back. I think it might be easier to keep
"object->type" as-is, and it would only show the current OBJ_xyz
fields. Then writing the SHA ends up deciding whether a OBJ_COMMIT
gets written as "commit" or "commit2".

With the reachability rules, you'd never have any ambiguity about which to use.

Linus


Re: [PATCH] mingw: use OpenSSL's SHA-1 routines

2017-02-24 Thread Linus Torvalds
On Thu, Feb 9, 2017 at 9:02 PM, Jeff King  wrote:
>
> I think this is only half the story. A heavy-sha1 workload is faster,
> which is good. But one of the original reasons to prefer blk-sha1 (at
> least on Linux) is that resolving libcrypto.so symbols takes a
> non-trivial amount of time. I just timed it again, and it seems to be
> consistently 1ms slower to run "git rev-parse --git-dir" on my machine
> (from the top-level of a repo).

Yes. It's also a horrible plain to profile those things.

Avoiding openssl was a great thing, because it avoided a lot of crazy overhead.

I suspect that most of the openssl win comes from using the actual SHA
instructions on modern CPU's. Because last I looked, the hand-coded
assembly simply wasn't that much faster.

We could easily have some x86-specific library that just does "use SHA
instructions if you have them, use blk-sha1 otherwise".

Of course, if we end up using the collision checking SHA libraries
this is all moot anyway.

 Linus


Re: SHA1 collisions found

2017-02-24 Thread Linus Torvalds
On Fri, Feb 24, 2017 at 3:39 PM, Jeff King  wrote:
>
> One thing I worry about in a mixed-hash setting is how often the two
> will be mixed.

Honestly, I think that a primary goal for a new hash implementation
absolutely needs to be to minimize mixing.

Not for security issues, but because of combinatorics. You want to
have a model that basically reads old data, but that very aggressively
approaches "new data only" in order to avoid the situation where you
have basically the exact same tree state, just _represented_
differently.

For example, what I would suggest the rules be is something like this:

 - introduce new tag2/commit2/tree2/blob2 object type tags that imply
that they were hashed using the new hash

 - an old type obviously can never contain a pointer to a new type (ie
you can't have a "tree" object that contains a tree2 object or a blob2
object.

 - but also make the rule that a *new* type can never contain a
pointer to an old type, with the *very* specific exception that a
commit2 can have a parent that is of type "commit".

That way everything "converges" towards the new format: the only way
you can stay on the old format is if you only have old-format objects,
and once you have a new-format object all your objects are going to be
new format - except for the history.

Obviously, if somebody stays in old format, you might end up still
getting some object duplication when you continue to merge from him,
but that tree can never merge back without converting to new-format,
so it will be a temporary situation.

So you will end up with duplicate objects, and that's not good (think
of what it does to all our full-tree "diff" optimizations, for example
- you no longer get the "these sub-trees are identical" across a
format change), but realistically you'll have a very limited time of
that kind of duplication.

I'd furthermore suggest that from a UI standpoint, we'd

 - convert to 64-character hex numbers (32-byte hashes)

 - (as mentioned earlier) default to a 40-character abbreviation

 - make the old 40-character SHA1's just show up within the same
address space (so they'd also be encoded as 32-byte hashes, just with
the last 12 bytes zero).

 - you'd see in the "object->type" whether it's a new or old-style hash.

I suspect it shouldn't be too painful to do it that way.

Linus


Re: [PATCH 2/5] pathspec: add PATHSPEC_FROMROOT flag

2017-02-24 Thread Stefan Beller
On Fri, Feb 24, 2017 at 3:50 PM, Brandon Williams  wrote:
> Add the `PATHSPEC_FROMROOT` flag to allow callers to instruct
> 'parse_pathspec()' that all provided pathspecs are relative to the root
> of the repository.  This allows a caller to prevent a path that may be
> outside of the repository from erroring out during the pathspec struct
> construction.
>


> +/* For callers that know all paths are relative to the root of the 
> repository */
> +#define PATHSPEC_FROMROOT (1<<9)

What is the calling convention for these submodule pathspecs?
IIRC, we'd pass on the super-prefix and the literal pathspec,
e.g. when there is a submodule "sub" inside the superproject,
The invocation on the submodule would be

git FOO --super-prefix="sub"  -- sub/pathspec/inside/...

and then the submodule process would need to "subtract" the superprefix
from the pathspec argument to see its actual pathspec, e.g. in gerrit:

$ GIT_TRACE=1 git grep --recurse-submodules -i test -- \
plugins/cookbook-plugin/
...

trace: run_command: '--super-prefix=plugins/cookbook-plugin/' 'grep' \
  '--recurse-submodules' '-i' '-etest' '--threads=4' '--'
'plugins/cookbook-plugin/'
..
but also:
...
 trace: run_command: '--super-prefix=plugins/download-commands/' 'grep' \
  '--recurse-submodules' '-i' '-etest' '--threads=4' '--'
'plugins/cookbook-plugin/'
...

So if I change into a directory:

$ cd plugins
plugins$ git grep --recurse-submodules -i test -- cookbook-plugin/
plugins$ #empty?
plugins$ git grep --recurse-submodules -i test -- plugins/cookbook-plugin/
...
Usual output, so the pathspecs are absolute path to the superprojects
root? Let's try relative path:
plugins$ git grep --recurse-submodules -i test -- ../plugins/cookbook-plugin/
fatal: ../plugins/cookbook-plugin/: '../plugins/cookbook-plugin/' is
outside repository
...
Running with GIT_TRACE=1:

trace: run_command: '--super-prefix=plugins/cookbook-plugin/' 'grep' \
'--recurse-submodules' '-i' '-etest' '--threads=4' '--'
'../plugins/cookbook-plugin/'

that seems like a mismatch of pathspec and superproject prefix, the prefix ought
to be different then? Maybe also including ../ because that is the
relative path from
cwd to the superporojects root and that is where we anchor all paths?

Easy to test that out:

plugins$ GIT_TRACE=1 git --super-prefix=../ grep --recurse-submodules \
-i test -- ../plugins/cookbook-plugin/
fatal: can't use --super-prefix from a subdirectory

ok, not as easy. :/

So another test with relative path:
(in git.git)

cd t/diff-lib
t/diff-lib$ git grep freedom
COPYING:freedom to share and change it.  By contrast, the GNU General Public
...

So the path displayed is relative to the cwd (and the search results as well)
In the submodule case we would expect to have the super prefix
to be computed to be relative to the cwd?

Checking the tests, this is handled correctly with this patch series. :)

But nevertheless, I think I know why I dislike this approach now:
The super prefix is handled "too dumb" IMHO, see the case
plugins$ git grep test  -- cookbook-plugin/
above, that doesn't correctly figure out the correct output.
Although this might be a separate bug, but it sounds like it
is the same underlying issue.

--
for the naming: How about PATHSPEC_FROMOUTSIDE
when going with the series as is here?
(the superprefix is not resolved, so the pathspecs given are
literally pathspecs that are outside this repo and we can ignore
them?

Thanks,
Stefan


Re: SHA1 collisions found

2017-02-24 Thread ankostis
On 24 February 2017 at 21:32, Junio C Hamano  wrote:
> ankostis  writes:
>
>> Let's assume that git is retroffited to always support the "default"
>> SHA-3, but support additionally more hash-funcs.
>> If in the future SHA-3 also gets defeated, it would be highly unlikely
>> that the same math would also break e.g. Blake.
>> So certain high-profile repos might choose for extra security 2 or more 
>> hashes.
>
> I think you are conflating two unrelated things.

I believe the two distinct things you refer to below are these:

  a. storing objects in filesystem and accessing them
 by name (e.g. from cmdline), and

  b. cross-referencing inside the objects (trees, tags, notes),

correct?

If not, then please ignore my answers, below.


>  * How are these "2 or more hashes" actually used?  Are you going to
>add three "parent " line to a commit with just one parent, each
>line storing the different hashes?

Yes, in all places where references are involved (tags, notes).
Based on what what the git-hackers have written so far, this might be doable.

To ensure integrity in the case of crypto-failures, all objects must
cross-reference each other with multiple hashes.
Of course this extra security would stop as soon as you reach "old"
history (unless you re-write it).


>How will such a commit object
>be named---does it have three names and do you plan to have three
>copies of .git/refs/heads/master somehow, each of which have
>SHA-1, SHA-3 and Blake, and let any one hash to identify the
>object?

Yes, based on Jason Cooper's idea, above, objects would be stored
under all names in the filesystem using hard links (although this
might not work nice on Windows).


>I suspect you are not going to do so; instead, you would use a
>very long string that is a concatenation of these three hashes as
>if it is an output from a single hash function that produces a
>long result.
>
>So I think the most natural way to do the "2 or more for extra
>security" is to allow us to use a very long hash.  It does not
>help to allow an object to be referred to with any of these 2 or
>more hashes at the same time.

If hard-linking all names is doable, then most restrictions above are
gone, correct?


>  * If employing 2 or more hashes by combining into one may enhance
>the security, that is wonderful.  But we want to discourage
>people from inventing their own combinations left and right and
>end up fragmenting the world.  If a project that begins with
>SHA-1 only naming is forked to two (or more) and each fork uses
>different hashes, merging them back will become harder than
>necessary unless you support all these hashes forks used.

Agree on discouraging people's inventions.

That is why I believe that some HASH (e.g. SHA-3) must be the blessed one.
All git >= 3.x.x must support at least this one (for naming and
cross-referencing between objects).


> Having said all that, the way to figure out the hash used in the way
> we spell the object name may not be the best place to discourage
> people from using random hashes of their choice.  But I think we
> want to avoid doing something that would actively encourage
> fragmentation.

I guess the "blessed SHA-3 will discourage people using the other
names., untill the next crypto-crack.


Re: SHA1 collisions found

2017-02-24 Thread Ian Jackson
Ian Jackson writes ("Re: SHA1 collisions found"):
> The idea of using the length is a neat trick, but it cannot support
> the dcurrent object name abbreviation approach unworkable.

Sorry, it's late here and my grammar seems to have disintegrated !

Ian.


[PATCH 4/5] ls-files: illustrate bug when recursing with relative pathspec

2017-02-24 Thread Brandon Williams
When using the --recurse-submodules flag with a relative pathspec which
includes "..", an error is produced inside the child process spawned for a
submodule.  When creating the pathspec struct in the child, the ".." is
interpreted to mean "go up a directory" which causes an error stating that the
path ".." is outside of the repository.

While it is true that ".." is outside the scope of the submodule, it is
confusing to a user who originally invoked the command where ".." was indeed
still inside the scope of the superproject.  Since the child process luanched
for the submodule has some context that it is operating underneath a
superproject, this error could be avoided.

Signed-off-by: Brandon Williams 
---
 t/t3007-ls-files-recurse-submodules.sh | 50 ++
 1 file changed, 50 insertions(+)

diff --git a/t/t3007-ls-files-recurse-submodules.sh 
b/t/t3007-ls-files-recurse-submodules.sh
index a5426171d..d8ab10866 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -188,6 +188,56 @@ test_expect_success '--recurse-submodules and pathspecs' '
test_cmp expect actual
 '
 
+test_expect_failure '--recurse-submodules and relative paths' '
+   # From top works
+   cat >expect <<-\EOF &&
+   .gitmodules
+   a
+   b/b
+   h.txt
+   sib/file
+   sub/file
+   submodule/.gitmodules
+   submodule/c
+   submodule/f.TXT
+   submodule/g.txt
+   submodule/subsub/d
+   submodule/subsub/e.txt
+   EOF
+   git ls-files --recurse-submodules >actual &&
+   test_cmp expect actual &&
+
+   # Relative path to top errors out
+   cat >expect <<-\EOF &&
+   ../.gitmodules
+   ../a
+   b
+   ../h.txt
+   ../sib/file
+   ../sub/file
+   ../submodule/.gitmodules
+   ../submodule/c
+   ../submodule/f.TXT
+   ../submodule/g.txt
+   ../submodule/subsub/d
+   ../submodule/subsub/e.txt
+   EOF
+   git -C b ls-files --recurse-submodules -- .. >actual &&
+   test_cmp expect actual &&
+
+   # Relative path to submodule errors out
+   cat >expect <<-\EOF &&
+   ../submodule/.gitmodules
+   ../submodule/c
+   ../submodule/f.TXT
+   ../submodule/g.txt
+   ../submodule/subsub/d
+   ../submodule/subsub/e.txt
+   EOF
+   git -C b ls-files --recurse-submodules -- ../submodule >actual &&
+   test_cmp expect actual
+'
+
 test_expect_success '--recurse-submodules does not support --error-unmatch' '
test_must_fail git ls-files --recurse-submodules --error-unmatch 
2>actual &&
test_i18ngrep "does not support --error-unmatch" actual
-- 
2.11.0.483.g087da7b7c-goog



[PATCH 3/5] grep: fix bug when recuring with relative pathspec

2017-02-24 Thread Brandon Williams
Fix a bug which causes a child process for a submodule to error out when
a relative pathspec with a ".." is provided in the superproject.

While at it, correctly construct the super-prefix to be used in a
submodule when not at the root of the repository.

Signed-off-by: Brandon Williams 
---
 builtin/grep.c | 8 ++--
 t/t7814-grep-recurse-submodules.sh | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 2c727ef49..65f3413d1 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -538,6 +538,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
int status, i;
const char *end_of_base;
const char *name;
+   struct strbuf buf = STRBUF_INIT;
struct work_item *w = opt->output_priv;
 
end_of_base = strchr(gs->name, ':');
@@ -550,9 +551,11 @@ static int grep_submodule_launch(struct grep_opt *opt,
argv_array_push(_array, GIT_DIR_ENVIRONMENT);
 
/* Add super prefix */
+   quote_path_relative(name, opt->prefix, );
argv_array_pushf(, "--super-prefix=%s%s/",
 super_prefix ? super_prefix : "",
-name);
+buf.buf);
+   strbuf_release();
argv_array_push(, "grep");
 
/*
@@ -1199,7 +1202,8 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
 
parse_pathspec(, 0,
   PATHSPEC_PREFER_CWD |
-  (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0),
+  (opt.max_depth != -1 ? PATHSPEC_MAXDEPTH_VALID : 0) |
+  (super_prefix ? PATHSPEC_FROMROOT : 0),
   prefix, argv + i);
pathspec.max_depth = opt.max_depth;
pathspec.recursive = 1;
diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 418ba68fe..e0932b2b7 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -227,7 +227,7 @@ test_expect_success 'grep history with moved submoules' '
test_cmp expect actual
 '
 
-test_expect_failure 'grep using relative path' '
+test_expect_success 'grep using relative path' '
test_when_finished "rm -rf parent sub" &&
git init sub &&
echo "foobar" >sub/file &&
-- 
2.11.0.483.g087da7b7c-goog



[PATCH 5/5] ls-files: fix bug when recuring with relative pathspec

2017-02-24 Thread Brandon Williams
Fix a bug which causes a child process for a submodule to error out when a
relative pathspec with a ".." is provided in the superproject.

While at it, correctly construct the super-prefix to be used in a submodule
when not at the root of the repository.

Signed-off-by: Brandon Williams 
---
 builtin/ls-files.c | 8 ++--
 t/t3007-ls-files-recurse-submodules.sh | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 159229081..89533ab8e 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -194,12 +194,15 @@ static void compile_submodule_options(const struct 
dir_struct *dir, int show_tag
 static void show_gitlink(const struct cache_entry *ce)
 {
struct child_process cp = CHILD_PROCESS_INIT;
+   struct strbuf name = STRBUF_INIT;
int status;
int i;
 
+   quote_path_relative(ce->name, prefix, );
argv_array_pushf(, "--super-prefix=%s%s/",
 super_prefix ? super_prefix : "",
-ce->name);
+name.buf);
+   strbuf_release();
argv_array_push(, "ls-files");
argv_array_push(, "--recurse-submodules");
 
@@ -615,7 +618,8 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
 
parse_pathspec(, 0,
   PATHSPEC_PREFER_CWD |
-  PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP,
+  PATHSPEC_STRIP_SUBMODULE_SLASH_CHEAP |
+  (super_prefix ? PATHSPEC_FROMROOT : 0),
   prefix, argv);
 
/*
diff --git a/t/t3007-ls-files-recurse-submodules.sh 
b/t/t3007-ls-files-recurse-submodules.sh
index d8ab10866..1522ed235 100755
--- a/t/t3007-ls-files-recurse-submodules.sh
+++ b/t/t3007-ls-files-recurse-submodules.sh
@@ -188,7 +188,7 @@ test_expect_success '--recurse-submodules and pathspecs' '
test_cmp expect actual
 '
 
-test_expect_failure '--recurse-submodules and relative paths' '
+test_expect_success '--recurse-submodules and relative paths' '
# From top works
cat >expect <<-\EOF &&
.gitmodules
-- 
2.11.0.483.g087da7b7c-goog



[PATCH 2/5] pathspec: add PATHSPEC_FROMROOT flag

2017-02-24 Thread Brandon Williams
Add the `PATHSPEC_FROMROOT` flag to allow callers to instruct
'parse_pathspec()' that all provided pathspecs are relative to the root
of the repository.  This allows a caller to prevent a path that may be
outside of the repository from erroring out during the pathspec struct
construction.

Signed-off-by: Brandon Williams 
---
 pathspec.c | 2 +-
 pathspec.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/pathspec.c b/pathspec.c
index 7ababb315..ff622ba18 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -353,7 +353,7 @@ static void init_pathspec_item(struct pathspec_item *item, 
unsigned flags,
if (pathspec_prefix >= 0) {
match = xstrdup(copyfrom);
prefixlen = pathspec_prefix;
-   } else if (magic & PATHSPEC_FROMTOP) {
+   } else if ((magic & PATHSPEC_FROMTOP) || (flags & PATHSPEC_FROMROOT)) {
match = xstrdup(copyfrom);
prefixlen = 0;
} else {
diff --git a/pathspec.h b/pathspec.h
index 49fd823dd..cad197436 100644
--- a/pathspec.h
+++ b/pathspec.h
@@ -66,6 +66,8 @@ struct pathspec {
  * allowed, then it will automatically set for every pathspec.
  */
 #define PATHSPEC_LITERAL_PATH (1<<8)
+/* For callers that know all paths are relative to the root of the repository 
*/
+#define PATHSPEC_FROMROOT (1<<9)
 
 extern void parse_pathspec(struct pathspec *pathspec,
   unsigned magic_mask,
-- 
2.11.0.483.g087da7b7c-goog



[PATCH 1/5] grep: illustrate bug when recursing with relative pathspec

2017-02-24 Thread Brandon Williams
When using the --recurse-submodules flag with a relative pathspec which
includes "..", an error is produced inside the child process spawned for
a submodule.  When creating the pathspec struct in the child, the ".."
is interpreted to mean "go up a directory" which causes an error stating
that the path ".." is outside of the repository.

While it is true that ".." is outside the scope of the submodule, it is
confusing to a user who originally invoked the command where ".." was
indeed still inside the scope of the superproject.  Since the child
process luanched for the submodule has some context that it is operating
underneath a superproject, this error could be avoided.

Signed-off-by: Brandon Williams 
---
 t/t7814-grep-recurse-submodules.sh | 42 ++
 1 file changed, 42 insertions(+)

diff --git a/t/t7814-grep-recurse-submodules.sh 
b/t/t7814-grep-recurse-submodules.sh
index 67247a01d..418ba68fe 100755
--- a/t/t7814-grep-recurse-submodules.sh
+++ b/t/t7814-grep-recurse-submodules.sh
@@ -227,6 +227,48 @@ test_expect_success 'grep history with moved submoules' '
test_cmp expect actual
 '
 
+test_expect_failure 'grep using relative path' '
+   test_when_finished "rm -rf parent sub" &&
+   git init sub &&
+   echo "foobar" >sub/file &&
+   git -C sub add file &&
+   git -C sub commit -m "add file" &&
+
+   git init parent &&
+   echo "foobar" >parent/file &&
+   git -C parent add file &&
+   mkdir parent/src &&
+   echo "foobar" >parent/src/file2 &&
+   git -C parent add src/file2 &&
+   git -C parent submodule add ../sub &&
+   git -C parent commit -m "add files and submodule" &&
+
+   # From top works
+   cat >expect <<-\EOF &&
+   file:foobar
+   src/file2:foobar
+   sub/file:foobar
+   EOF
+   git -C parent grep --recurse-submodules -e "foobar" >actual &&
+   test_cmp expect actual &&
+
+   # Relative path to top errors out
+   cat >expect <<-\EOF &&
+   ../file:foobar
+   file2:foobar
+   ../sub/file:foobar
+   EOF
+   git -C parent/src grep --recurse-submodules -e "foobar" -- .. >actual &&
+   test_cmp expect actual &&
+
+   # Relative path to submodule errors out
+   cat >expect <<-\EOF &&
+   ../sub/file:foobar
+   EOF
+   git -C parent/src grep --recurse-submodules -e "foobar" -- ../sub 
>actual &&
+   test_cmp expect actual
+'
+
 test_incompatible_with_recurse_submodules ()
 {
test_expect_success "--recurse-submodules and $1 are incompatible" "
-- 
2.11.0.483.g087da7b7c-goog



[PATCH 0/5] recursing submodules with relative pathspec (grep and ls-files)

2017-02-24 Thread Brandon Williams
It was discovered that when using the --recurse-submodules flag with `git grep`
and `git ls-files` and specifying a relative path when not at the root causes
the child processes spawned to error out with an error like:

fatal: ..: '..' is outside repository

While true that ".." is outside the scope of the submodule repository, it
probably doesn't make much sense to the user who gave that pathspec with
respect to the superproject.  Since the child processes that are spawned to
handle the submodules have some context that they are executing underneath a
superproject (via the 'super_prefix'), they should be able to prevent dying
under this circumstance.

This series fixes this bug in both git grep and git ls-files as well as
correctly formatting the output from submodules to handle the relative paths
with "..".

One of the changes made to fix this was to add an additional flag for the
parse_pathspec() function in order to treat all paths provided as being from
the root of the repository.  I hesitantly selected the name 'PATHSPEC_FROMROOT'
but I'm not fond of it since its too similar to the pathspec magic define
'PATHSPEC_FROMTOP'.  So I'm open for naming suggestions.

Brandon Williams (5):
  grep: illustrate bug when recursing with relative pathspec
  pathspec: add PATHSPEC_FROMROOT flag
  grep: fix bug when recuring with relative pathspec
  ls-files: illustrate bug when recursing with relative pathspec
  ls-files: fix bug when recuring with relative pathspec

 builtin/grep.c |  8 --
 builtin/ls-files.c |  8 --
 pathspec.c |  2 +-
 pathspec.h |  2 ++
 t/t3007-ls-files-recurse-submodules.sh | 50 ++
 t/t7814-grep-recurse-submodules.sh | 42 
 6 files changed, 107 insertions(+), 5 deletions(-)

-- 
2.11.0.483.g087da7b7c-goog



Re: SHA1 collisions found

2017-02-24 Thread Ian Jackson
Junio C Hamano writes ("Re: SHA1 collisions found"):
> Ian Jackson  writes:
> >  * Therefore the transition needs to be done by giving every object
> >two names (old and new hash function).  Objects may refer to each
> >other by either name, but must pick one.  The usual shape of
> 
> I do not think it is necessrily so.

Indeed.  And my latest thoughts involve instead having two parallel
systems of old and new objects.

> *1* In the above toy example, length being 40 vs 64 is used as a
> sign between SHA-1 and the new hash, and careful readers may
> wonder if we should use sha-3,20769079d22... or something like
> that that more explicity identifies what hash is used, so that
> we can pick a hash whose length is 64 when we transition again.

I have an idea for this.  I think we should prefix new hashes with a
single uppercase letter, probably H.

Uppercase because: case-only-distinguished ref names are already
discouraged because they do not work properly on case-insensitive
filesystems; convention is that ref names are lowercase; so an
uppercase letter probably won't appear at the start of a ref name
component even though almost all existing software will treat it as
legal.  So the result is that the new object names are unlikely to
collide with ref names.

(There is of course no need to store the H as a literal in filenames,
so the case-insensitive filesystem problem does not apply to ref
names.)

We should definitely not introduce new punctuation into object names.
That will cause a great deal of grief for existing software which has
to handle git object names and may thy to store them in
representations which assume that they match \w+.

The idea of using the length is a neat trick, but it cannot support
the dcurrent object name abbreviation approach unworkable.

Ian.


Re: SHA1 collisions found

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 09:32:13AM -0800, Junio C Hamano wrote:

> >  * Therefore the transition needs to be done by giving every object
> >two names (old and new hash function).  Objects may refer to each
> >other by either name, but must pick one.  The usual shape of
> 
> I do not think it is necessrily so.  Existing code may not be able
> to read anything new, but you can make the new code understand
> object names in both formats, and for a smooth transition, I think
> the new code needs to.
> 
> For example, a new commit that records a merge of an old and a new
> commit whose resulting tree happens to be the same as the tree of
> the old commit may begin like so:
> 
> tree 21b97d4c4f968d1335f16292f954dfdbb91353f0
> parent 20769079d22a9f8010232bdf6131918c33a1bf6910232bdf6131918c33a1bf69
> parent 22af6fef9b6538c9e87e147a920be9509acf1ddd
> 
> naming the only object whose name was done with new hash with the
> new longer hash, while recording the names of the other existing
> objects with SHA-1.  We would need to extend the object format for
> tag (which would be trivial as the object reference is textual and
> similar to a commit) and tree (much harder), of course.

One thing I worry about in a mixed-hash setting is how often the two
will be mixed. That will lead to interoperability complications, but I
also think it creates security hazards (if I can convince you somehow to
refer to my evil colliding file by its sha1, for example, then I can
subvert the strength of the new hash).

So I'd much rather see strong rules like:

  1. Once a repo has flag-day switched over to the new hash format[1],
 new references are _always_ done with the new hash. Even ones that
 point to pre-flag-day objects!

 So you get a "commit-v2" object instead of a "commit", and it has a
 distinct hash identity from its "commit" counterpart. You can point
 to a classic "commit", but you do so by its new-hash.

 The flag-day switch would probably be a repo config flag based on
 repositoryformatversion (so old versions would just punt if they
 see it). Let's call this flag "newhash" for lack of a better term.

  2. Repos that have new-hash set will consider the new hash
 format as primary, and always use it when writing and referring to
 new objects (e.g., in refs). A (purely local) sha1->new mapping can
 be maintained for doing old-style object lookups, or for quick
 equivalence checks (this mapping might need to be bi-directional
 for some use cases; I haven't thought hard enough about it to say
 either way).

  3. For protocol interop, the rules would be something like[2]:

  a. If upload-pack is serving a newhash repo, it advertises
 so in the capabilities.

 Recent clients know that the rest of the conversation will
 involve the new hash format. If they're cloning, they set the
 newhash flag in their local config.  If they're fetching, they
 probably abort and say "please enable newhash" (because for an
 existing repo, it probably needs to migrate refs, for example).

 An old client would fail to send back the newhash capability,
 and the server would abort the conversation at that point.

 A new upload-pack serving a non-newhash repo behaves the same
 as now (use sha1, happily interoperate with existing and new
 clients).

  b. receive-pack is more or less the mirror image.

 A server for a newhash-flagged repo has a capability for "this
 is a newhash repo" and advertises newhash refs. An existing
 client might still try to push, but the server would reject it
 unless it advertises "newhash" back to the server.

 A newhash-enabled client on a non-newhash repo would abort more
 gracefully ("please upgrade your local repo to newhash").

 For a newhash-enabled server with a non-newhash repo, it would
 probably not advertise anything (not even "I understand
 newhash"). Because the process for converting to newhash is not
 "just push some newhash objects", but an out-of-band flag-day
 to convert it over.

That's just a sketch I came up with. There are probably holes. And it
definitely leaves a lot of _possible_ interoperability on the table in
favor of the flag-day approach. But I think the flag-day approach is a
lot easier to reason about. Both in the code, and in terms of the
security properties.

-Peff

[1] I was intentionally vague on "new hash format" here. Obviously there
are various contenders like SHA-256. But I think there's also an
open question of whether the new format should be a multi-hash
format. That would ease further transitions. At the same time, we
really _don't_ want people picking bespoke hashes for their
repositories. It creates complications in the code, and it destroys
a bunch of optimizations (like knowing when we are both 

Re: SHA1 collisions found

2017-02-24 Thread Jakub Narębski
W dniu 25.02.2017 o 00:06, Jeff King pisze:
> On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
> 
>> I have just read on ArsTechnica[1] that while Git repository could be
>> corrupted (though this would require attackers to spend great amount
>> of resources creating their own collision, while as said elsewhere
>> in this thread allegedly easy to detect), putting two proof-of-concept
>> different PDFs with same size and SHA-1 actually *breaks* Subversion.
>> Repository can become corrupt, and stop accepting new commits.  
>>
>> From what I understand people tried this, and Git doesn't exhibit
>> such problem.  I wonder what assumptions SVN made that were broken...
> 
> To be clear, nobody has generated a sha1 collision in Git yet, and you
> cannot blindly use the shattered PDFs to do so. Git's notion of the
> SHA-1 of an object include the header, so somebody would have to do a
> shattered-level collision search for something that starts with the
> correct "blob 1234\0" header.

What I meant by "Git doesn't exhibit such problem" (but was not clear
enough) is that Git doesn't break by just adding SHAttered.io PDFs
(which somebody had checked), but need customized attack.

> 
> So we don't actually know how Git would behave in the face of a SHA-1
> collision. It would be pretty easy to simulate it with something like:

You are right that it would be good to know if such Git-geared customized
SHA-1 attack would break Git, or would it simply corrupt it (visibly
or not).

> 
> ---
> diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
> index 22b125cf8..1be5b5ba3 100644
> --- a/block-sha1/sha1.c
> +++ b/block-sha1/sha1.c
> @@ -231,6 +231,16 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, 
> unsigned long len)
>   memcpy(ctx->W, data, len);
>  }
>  
> +/* sha1 of blobs containing "foo\n" and "bar\n" */
> +static const unsigned char foo_sha1[] = {
> + 0x25, 0x7c, 0xc5, 0x64, 0x2c, 0xb1, 0xa0, 0x54, 0xf0, 0x8c,
> + 0xc8, 0x3f, 0x2d, 0x94, 0x3e, 0x56, 0xfd, 0x3e, 0xbe, 0x99
> +};
> +static const unsigned char bar_sha1[] = {
> + 0x57, 0x16, 0xca, 0x59, 0x87, 0xcb, 0xf9, 0x7d, 0x6b, 0xb5,
> + 0x49, 0x20, 0xbe, 0xa6, 0xad, 0xde, 0x24, 0x2d, 0x87, 0xe6
> +};
> +
>  void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
>  {
>   static const unsigned char pad[64] = { 0x80 };
> @@ -248,4 +258,8 @@ void blk_SHA1_Final(unsigned char hashout[20], 
> blk_SHA_CTX *ctx)
>   /* Output hash */
>   for (i = 0; i < 5; i++)
>   put_be32(hashout + i * 4, ctx->H[i]);
> +
> + /* pretend "foo" and "bar" collide */
> + if (!memcmp(hashout, bar_sha1, 20))
> + memcpy(hashout, foo_sha1, 20);
>  }
> 



Re: [PATCH 3/3] parse_hide_refs_config: tell parse_config_key we don't want a subsection

2017-02-24 Thread Stefan Beller
On Fri, Feb 24, 2017 at 1:08 PM, Jeff King  wrote:
> +   (!parse_config_key(var, section, NULL, NULL, ) &&
> +!strcmp(key, "hiderefs"))) {

Heh, see how fast my code gets replaced with even better code? ;)
All three patches look good,

Thanks,
Stefan


Re: SHA1 collisions found

2017-02-24 Thread Øyvind A . Holm
On 2017-02-25 00:05:34, Jakub Narębski wrote:
> W dniu 24.02.2017 o 23:53, Santiago Torres pisze:
> > On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
> > > I have just read on ArsTechnica[1] that while Git repository could 
> > > be corrupted (though this would require attackers to spend great 
> > > amount of resources creating their own collision, while as said 
> > > elsewhere in this thread allegedly easy to detect), putting two 
> > > proof-of-concept different PDFs with same size and SHA-1 actually 
> > > *breaks* Subversion. Repository can become corrupt, and stop 
> > > accepting new commits.
> >
> > From what I understood in the thread[1], it was the combination of 
> > svn + git-svn together. I think Arstechnica may be a little bit 
> > sensationalistic here.
>
> > [1] https://bugs.webkit.org/show_bug.cgi?id=168774#c27
>
> Thanks for the link.  It looks like the problem was with svn itself 
> (couldn't checkout, couldn't sync), but repository is recovered now, 
> though not protected against the problem occurring again.
>
> Well, anyone with Subversion installed (so not me) can check it for 
> himself/herself... though better do this with separate svnroot.

I tested this yesterday by adding the two PDF files to a Subversion 
repository, and found that it wasn't able to clone ("checkout" in svn 
speak) the repository after the two files had been committed. I posted 
the results to the svn-dev mailing list, the thread is at 
.

It seems as it only breaks the working copy because the pristine copies 
are identified with a SHA1 sum, but the FSFS repository backend seems to 
cope with it.

Regards,
Øyvind

+-| Øyvind A. Holm  - N 60.37604° E 5.9° |-+
| OpenPGP: 0xFB0CBEE894A506E5 - http://www.sunbase.org/pubkey.asc |
| Fingerprint: A006 05D6 E676 B319 55E2  E77E FB0C BEE8 94A5 06E5 |
+| 41517b2c-fae7-11e6-9521-db5caa6d21d3 |-+


signature.asc
Description: PGP signature


Re: SHA1 collisions found

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:

> I have just read on ArsTechnica[1] that while Git repository could be
> corrupted (though this would require attackers to spend great amount
> of resources creating their own collision, while as said elsewhere
> in this thread allegedly easy to detect), putting two proof-of-concept
> different PDFs with same size and SHA-1 actually *breaks* Subversion.
> Repository can become corrupt, and stop accepting new commits.  
> 
> From what I understand people tried this, and Git doesn't exhibit
> such problem.  I wonder what assumptions SVN made that were broken...

To be clear, nobody has generated a sha1 collision in Git yet, and you
cannot blindly use the shattered PDFs to do so. Git's notion of the
SHA-1 of an object include the header, so somebody would have to do a
shattered-level collision search for something that starts with the
correct "blob 1234\0" header.

So we don't actually know how Git would behave in the face of a SHA-1
collision. It would be pretty easy to simulate it with something like:

---
diff --git a/block-sha1/sha1.c b/block-sha1/sha1.c
index 22b125cf8..1be5b5ba3 100644
--- a/block-sha1/sha1.c
+++ b/block-sha1/sha1.c
@@ -231,6 +231,16 @@ void blk_SHA1_Update(blk_SHA_CTX *ctx, const void *data, 
unsigned long len)
memcpy(ctx->W, data, len);
 }
 
+/* sha1 of blobs containing "foo\n" and "bar\n" */
+static const unsigned char foo_sha1[] = {
+   0x25, 0x7c, 0xc5, 0x64, 0x2c, 0xb1, 0xa0, 0x54, 0xf0, 0x8c,
+   0xc8, 0x3f, 0x2d, 0x94, 0x3e, 0x56, 0xfd, 0x3e, 0xbe, 0x99
+};
+static const unsigned char bar_sha1[] = {
+   0x57, 0x16, 0xca, 0x59, 0x87, 0xcb, 0xf9, 0x7d, 0x6b, 0xb5,
+   0x49, 0x20, 0xbe, 0xa6, 0xad, 0xde, 0x24, 0x2d, 0x87, 0xe6
+};
+
 void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX *ctx)
 {
static const unsigned char pad[64] = { 0x80 };
@@ -248,4 +258,8 @@ void blk_SHA1_Final(unsigned char hashout[20], blk_SHA_CTX 
*ctx)
/* Output hash */
for (i = 0; i < 5; i++)
put_be32(hashout + i * 4, ctx->H[i]);
+
+   /* pretend "foo" and "bar" collide */
+   if (!memcmp(hashout, bar_sha1, 20))
+   memcpy(hashout, foo_sha1, 20);
 }


Re: SHA1 collisions found

2017-02-24 Thread Jakub Narębski
W dniu 24.02.2017 o 23:53, Santiago Torres pisze:
> On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
>>
>> I have just read on ArsTechnica[1] that while Git repository could be
>> corrupted (though this would require attackers to spend great amount
>> of resources creating their own collision, while as said elsewhere
>> in this thread allegedly easy to detect), putting two proof-of-concept
>> different PDFs with same size and SHA-1 actually *breaks* Subversion.
>> Repository can become corrupt, and stop accepting new commits.  
> 
> From what I understood in the thread[1], it was the combination of svn +
> git-svn together. I think Arstechnica may be a little bit
> sensationalistic here.
 
> [1] https://bugs.webkit.org/show_bug.cgi?id=168774#c27

Thanks for the link.  It looks like the problem was with svn itself
(couldn't checkout, couldn't sync), but repository is recovered now,
though not protected against the problem occurring again.

Well, anyone with Subversion installed (so not me) can check it
for himself/herself... though better do this with separate svnroot.


Note that the breakage was an accident, trying to add test case
for SHA-1 collision in WebKit cache.
 
Best regards,
-- 
Jakub Narębski


Re: SHA1 collisions found

2017-02-24 Thread Santiago Torres
On Fri, Feb 24, 2017 at 11:47:46PM +0100, Jakub Narębski wrote:
> I have just read on ArsTechnica[1] that while Git repository could be
> corrupted (though this would require attackers to spend great amount
> of resources creating their own collision, while as said elsewhere
> in this thread allegedly easy to detect), putting two proof-of-concept
> different PDFs with same size and SHA-1 actually *breaks* Subversion.
> Repository can become corrupt, and stop accepting new commits.  

From what I understood in the thread[1], it was the combination of svn +
git-svn together. I think Arstechnica may be a little bit
sensationalistic here.

Cheers!
-Santiago.

[1] https://bugs.webkit.org/show_bug.cgi?id=168774#c27


signature.asc
Description: PGP signature


Re: SHA1 collisions found

2017-02-24 Thread Jakub Narębski
I have just read on ArsTechnica[1] that while Git repository could be
corrupted (though this would require attackers to spend great amount
of resources creating their own collision, while as said elsewhere
in this thread allegedly easy to detect), putting two proof-of-concept
different PDFs with same size and SHA-1 actually *breaks* Subversion.
Repository can become corrupt, and stop accepting new commits.  

>From what I understand people tried this, and Git doesn't exhibit
such problem.  I wonder what assumptions SVN made that were broken...

The https://shattered.io/ page updated their Q section with this
information.

BTW. what's with that page use of "GIT" instead of "Git"??


[1]: 
https://arstechnica.com/security/2017/02/watershed-sha1-collision-just-broke-the-webkit-repository-others-may-follow/
 
 "Watershed SHA1 collision just broke the WebKit repository, others may 
follow"


Re: [PATCH v6 1/1] config: add conditional include

2017-02-24 Thread Philip Oakley

From: "Nguyễn Thái Ngọc Duy" 

Sometimes a set of repositories want to share configuration settings
among themselves that are distinct from other such sets of repositories.
A user may work on two projects, each of which have multiple
repositories, and use one user.email for one project while using another
for the other.

Setting $GIT_DIR/.config works, but if the penalty of forgetting to
update $GIT_DIR/.config is high (especially when you end up cloning
often), it may not be the best way to go. Having the settings in
~/.gitconfig, which would work for just one set of repositories, would
not well in such a situation. Having separate ${HOME}s may add more
problems than it solves.

Extend the include.path mechanism that lets a config file include
another config file, so that the inclusion can be done only when some
conditions hold. Then ~/.gitconfig can say "include config-project-A
only when working on project-A" for each project A the user works on.

In this patch, the only supported grouping is based on $GIT_DIR (in
absolute path), so you would need to group repositories by directory, or
something like that to take advantage of it.

We already have include.path for unconditional includes. This patch goes
with includeIf..path to make it clearer that a condition is
required. The new config has the same backward compatibility approach as
include.path: older git versions that don't understand includeIf will
simply ignore them.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
Documentation/config.txt  | 61 +
config.c  | 97 
+++

t/t1305-config-include.sh | 56 +++
3 files changed, 214 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 015346c417..6c0cd2a273 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -91,6 +91,56 @@ found at the location of the include directive. If the 
value of the

relative to the configuration file in which the include directive was
found.  See below for examples.

+Conditional includes
+
+
+You can include one config file from another conditionally by setting


On first reading I thought this implied you can only have one `includeIf` 
within the config file.
I think it is meant to mean that each `includeIf`could include one other 
file, and that users can have multiple `includeIf` lines.




+a `includeIf..path` variable to the name of the file to be
+included. The variable's value is treated the same way as `include.path`.
+


--
Philip 



Re: [PATCH v2 2/2] Documentation: Link descriptions of -z to core.quotePath

2017-02-24 Thread Jakub Narębski
W dniu 24.02.2017 o 21:37, Andreas Heiduk pisze:
> Linking the description for pathname quoting to the configuration
> variable "core.quotePath" removes inconstistent and incomplete
> sections while also giving two hints how to deal with it: Either with
> "-c core.quotePath=false" or with "-z".

This patch I am not sure about.  On one hand it improves consistency
(and makes information more complete), on the other hand it removes
information at hand and instead refers to other manpage.

Perhaps a better solution would be to craft a short description that
is both sufficiently complete, and refers to "core.quotePath" for
more details, and then transclude it with "include::quotepath.txt[]".

> 
> Signed-off-by: Andreas Heiduk 
> ---
>  Documentation/diff-format.txt |  7 ---
>  Documentation/diff-generate-patch.txt |  7 +++
>  Documentation/diff-options.txt|  7 +++
>  Documentation/git-apply.txt   |  7 +++
>  Documentation/git-commit.txt  |  9 ++---
>  Documentation/git-ls-files.txt| 10 ++
>  Documentation/git-ls-tree.txt | 10 +++---
>  Documentation/git-status.txt  |  7 +++
>  8 files changed, 35 insertions(+), 29 deletions(-)
> 
> diff --git a/Documentation/diff-format.txt b/Documentation/diff-format.txt
> index cf52626..706916c 100644
> --- a/Documentation/diff-format.txt
> +++ b/Documentation/diff-format.txt
> @@ -78,9 +78,10 @@ Example:
>  :100644 100644 5be4a4.. 00.. M file.c
>  
>  
> -When `-z` option is not used, TAB, LF, and backslash characters
> -in pathnames are represented as `\t`, `\n`, and `\\`,
> -respectively.
> +Without the `-z` option, pathnames with "unusual" characters are
> +quoted as explained for the configuration variable `core.quotePath`
> +(see linkgit:git-config[1]).  Using `-z` the filename is output
> +verbatim and the line is terminated by a NUL byte.
>  
>  diff format for merges
>  --
> diff --git a/Documentation/diff-generate-patch.txt 
> b/Documentation/diff-generate-patch.txt
> index d2a7ff5..231105c 100644
> --- a/Documentation/diff-generate-patch.txt
> +++ b/Documentation/diff-generate-patch.txt
> @@ -53,10 +53,9 @@ The index line includes the SHA-1 checksum before and 
> after the change.
>  The  is included if the file mode does not change; otherwise,
>  separate lines indicate the old and the new mode.
>  
> -3.  TAB, LF, double quote and backslash characters in pathnames
> -are represented as `\t`, `\n`, `\"` and `\\`, respectively.
> -If there is need for such substitution then the whole
> -pathname is put in double quotes.
> +3.  Pathnames with "unusual" characters are quoted as explained for
> +the configuration variable `core.quotePath` (see
> +linkgit:git-config[1]).
>  
>  4.  All the `file1` files in the output refer to files before the
>  commit, and all the `file2` files refer to files after the commit.
> diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
> index e6215c3..7c28e73 100644
> --- a/Documentation/diff-options.txt
> +++ b/Documentation/diff-options.txt
> @@ -192,10 +192,9 @@ ifndef::git-log[]
>   given, do not munge pathnames and use NULs as output field terminators.
>  endif::git-log[]
>  +
> -Without this option, each pathname output will have TAB, LF, double quotes,
> -and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
> -respectively, and the pathname will be enclosed in double quotes if
> -any of those replacements occurred.
> +Without this option, pathnames with "unusual" characters are munged as
> +explained for the configuration variable `core.quotePath` (see
> +linkgit:git-config[1]).
>  
>  --name-only::
>   Show only names of changed files.
> diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
> index 8ddb207..a7a001b 100644
> --- a/Documentation/git-apply.txt
> +++ b/Documentation/git-apply.txt
> @@ -108,10 +108,9 @@ the information is read from the current index instead.
>   When `--numstat` has been given, do not munge pathnames,
>   but use a NUL-terminated machine-readable format.
>  +
> -Without this option, each pathname output will have TAB, LF, double quotes,
> -and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
> -respectively, and the pathname will be enclosed in double quotes if
> -any of those replacements occurred.
> +Without this option, pathnames with "unusual" characters are munged as
> +explained for the configuration variable `core.quotePath` (see
> +linkgit:git-config[1]).
>  
>  -p::
>   Remove  leading slashes from traditional diff paths. The
> diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
> index 4f8f20a..25dcdcc 100644
> --- a/Documentation/git-commit.txt
> +++ b/Documentation/git-commit.txt
> @@ -117,9 +117,12 @@ OPTIONS
>  
>  -z::
>  --null::
> - When showing `short` or 

Re: [PATCH] mingw: use OpenSSL's SHA-1 routines

2017-02-24 Thread Junio C Hamano
Johannes Sixt  writes:

> It can be argued that in normal interactive use, it is hard to notice
> that another DLL is loaded. Don't forget, though, that on Windows it
> is not only the pure time to resolve the entry points, but also that
> typically virus scanners inspect every executable file that is loaded,
> which adds another share of time.
>
> I'll use the patch for daily work for a while to see whether it hurts.

Please ping this thread again when you have something to add.  For
now, I'll demote this patch from 'next' to 'pu' when we rewind and
rebuild 'next' post 2.12 release.

Thanks.


Re: [PATCH v2 1/2] Documentation: Improve description for core.quotePath

2017-02-24 Thread Jakub Narębski
W dniu 24.02.2017 o 21:37, Andreas Heiduk pisze:
> Signed-off-by: Andreas Heiduk 

Thanks.  This is good work.

> ---
>  Documentation/config.txt | 24 ++--
>  1 file changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1fee83c..fa06c2a 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -347,16 +347,20 @@ core.checkStat::
>   all fields, including the sub-second part of mtime and ctime.
>  
>  core.quotePath::
> - The commands that output paths (e.g. 'ls-files',
> - 'diff'), when not given the `-z` option, will quote
> - "unusual" characters in the pathname by enclosing the
> - pathname in a double-quote pair and with backslashes the
> - same way strings in C source code are quoted.  If this
> - variable is set to false, the bytes higher than 0x80 are
> - not quoted but output as verbatim.  Note that double
> - quote, backslash and control characters are always
> - quoted without `-z` regardless of the setting of this
> - variable.
> +

This empty line should not be here, I think.

> + Commands that output paths (e.g. 'ls-files', 'diff'), will
> + quote "unusual" characters in the pathname by enclosing the
> + pathname in double-quotes and escaping those characters with
> + backslashes in the same way C escapes control characters (e.g.
> + `\t` for TAB, `\n` for LF, `\\` for backslash) or bytes with
> + values larger than 0x80 (e.g. octal `\302\265` for "micro" in

I wonder if we can put UTF-8 in AsciiDoc, that is write "μ"
instead of spelling it "micro" (or: Greek letter "mu").

Or "" / "", though I wonder how well it is supported
in manpage, info and PDF outputs...

> + UTF-8).  If this variable is set to false, bytes higher than
> + 0x80 are not considered "unusual" any more. Double-quotes,
> + backslash and control characters are always escaped regardless
> + of the setting of this variable.  A simple space character is
> + not considered "unusual".  Many commands can output pathnames
> + completely verbatim using the `-z` option. The default value
> + is true.
>  
>  core.eol::
>   Sets the line ending type to use in the working directory for
> 



Re: [PATCH 2/3] parse_config_key: allow matching single-level config

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 01:20:48PM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > The parse_config_key() function was introduced to make it
> > easier to match "section.subsection.key" variables. It also
> > handles the simpler "section.key", and the caller is
> > responsible for distinguishing the two from its
> > out-parameters.
> >
> > Most callers who _only_ want "section.key" would just use a
> > strcmp(var, "section.key"), since there is no parsing
> > required. However, they may still use parse_config_key() if
> > their "section" variable isn't a constant (an example of
> > this is in parse_hide_refs_config).
> 
> Perhaps "only" at the end of the title?

Yeah, that would be an improvement.

> After grepping for call sites of this function, I think we can
> simplify quite a few instances of:
> 
>   if (parse_config_key(...) || !name)
>   return ...;

I think you figured this out from your other response, but no, those are
the opposite case (it tricked me at first, too).

-Peff


Re: [PATCH 2/3] parse_config_key: allow matching single-level config

2017-02-24 Thread Junio C Hamano
Jeff King  writes:

> The parse_config_key() function was introduced to make it
> easier to match "section.subsection.key" variables. It also
> handles the simpler "section.key", and the caller is
> responsible for distinguishing the two from its
> out-parameters.
>
> Most callers who _only_ want "section.key" would just use a
> strcmp(var, "section.key"), since there is no parsing
> required. However, they may still use parse_config_key() if
> their "section" variable isn't a constant (an example of
> this is in parse_hide_refs_config).

Perhaps "only" at the end of the title?

After grepping for call sites of this function, I think we can
simplify quite a few instances of:

if (parse_config_key(...) || !name)
return ...;



Re: [PATCH] refs: parse_hide_refs_config to use parse_config_key

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 01:18:48PM -0800, Junio C Hamano wrote:

> > While I'm thinking about it, here are patches to do that. The third one
> > I'd probably squash into yours (after ordering it to the end).
> >
> >   [1/3]: parse_config_key: use skip_prefix instead of starts_with
> >   [2/3]: parse_config_key: allow matching single-level config
> >   [3/3]: parse_hide_refs_config: tell parse_config_key we don't want a 
> > subsection
> 
> While you were doing that, I was grepping the call sites for
> parse_config_key() and made sure that all of them are OK when fed
> two level names.  Most of them follow this pattern:
> 
>   if (parse_config_key(k, "diff", , , ) || !name)
>   return -1;
> 
> and ones that do not immediately check !name does either eventually
> do so or have separate codepaths for handlihng two- and three-level
> names.

Yeah, I did that, too. :)

I don't think there are any other sites to convert. And I don't think we
can make the "!name" case easier (because some call-sites do want to
handle both types). And it's not like it gets much easier anyway (unlike
the opposite case, you _do_ need to declare the extra variables.

-Peff


Re: [PATCH 2/3] parse_config_key: allow matching single-level config

2017-02-24 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
>
>> The parse_config_key() function was introduced to make it
>> easier to match "section.subsection.key" variables. It also
>> handles the simpler "section.key", and the caller is
>> responsible for distinguishing the two from its
>> out-parameters.
>>
>> Most callers who _only_ want "section.key" would just use a
>> strcmp(var, "section.key"), since there is no parsing
>> required. However, they may still use parse_config_key() if
>> their "section" variable isn't a constant (an example of
>> this is in parse_hide_refs_config).
>
> Perhaps "only" at the end of the title?

which still stands.  My initial reaction was "eh, I didn't know it
was an error to call the function for two-level names".

> After grepping for call sites of this function, I think we can
> simplify quite a few instances of:
>
>   if (parse_config_key(...) || !name)
>   return ...;

This is false.  Sorry for the noise.


Re: [PATCH] refs: parse_hide_refs_config to use parse_config_key

2017-02-24 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Feb 24, 2017 at 01:18:48PM -0800, Junio C Hamano wrote:
>
>> > While I'm thinking about it, here are patches to do that. The third one
>> > I'd probably squash into yours (after ordering it to the end).
>> >
>> >   [1/3]: parse_config_key: use skip_prefix instead of starts_with
>> >   [2/3]: parse_config_key: allow matching single-level config
>> >   [3/3]: parse_hide_refs_config: tell parse_config_key we don't want a 
>> > subsection
>> 
>> While you were doing that, I was grepping the call sites for
>> parse_config_key() and made sure that all of them are OK when fed
>> two level names.  Most of them follow this pattern:
>> 
>>  if (parse_config_key(k, "diff", , , ) || !name)
>>  return -1;
>> 
>> and ones that do not immediately check !name does either eventually
>> do so or have separate codepaths for handlihng two- and three-level
>> names.
>
> Yeah, I did that, too. :)
>
> I don't think there are any other sites to convert. And I don't think we
> can make the "!name" case easier (because some call-sites do want to
> handle both types). And it's not like it gets much easier anyway (unlike
> the opposite case, you _do_ need to declare the extra variables.

Yeah, because the rejection of !name as an error is not the only
reason these call sites have  and  want to use
that subsection name after that if() statement rejects malformed
input, so we cannot really lose them.

Thanks.  All three looked good.


Re: [PATCH] refs: parse_hide_refs_config to use parse_config_key

2017-02-24 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Feb 24, 2017 at 03:39:40PM -0500, Jeff King wrote:
>
>> This will start parsing "receive.foobar.hiderefs", which we don't want.
>> I think you need:
>> 
>>   !parse_config_key(var, section, , _len, ) &&
>>   !subsection &&
>>   !strcmp(key, "hiderefs")
>> 
>> Perhaps passing NULL for the subsection variable should cause
>> parse_config_key to return failure when there is a non-empty subsection.
>> 
>> -Peff
>> 
>> PS Outside of parse_config_key, this code would be nicer if it used
>>skip_prefix() instead of starts_with(). Since it's going away, I
>>don't think it matters, but I note that parse_config_key could
>>probably benefit from the same.
>
> While I'm thinking about it, here are patches to do that. The third one
> I'd probably squash into yours (after ordering it to the end).
>
>   [1/3]: parse_config_key: use skip_prefix instead of starts_with
>   [2/3]: parse_config_key: allow matching single-level config
>   [3/3]: parse_hide_refs_config: tell parse_config_key we don't want a 
> subsection

While you were doing that, I was grepping the call sites for
parse_config_key() and made sure that all of them are OK when fed
two level names.  Most of them follow this pattern:

if (parse_config_key(k, "diff", , , ) || !name)
return -1;

and ones that do not immediately check !name does either eventually
do so or have separate codepaths for handlihng two- and three-level
names.


[PATCH 3/3] parse_hide_refs_config: tell parse_config_key we don't want a subsection

2017-02-24 Thread Jeff King
This lets us avoid declaring some otherwise useless
variables.

Signed-off-by: Jeff King 
---
 refs.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/refs.c b/refs.c
index 21bc8c910..b9188908b 100644
--- a/refs.c
+++ b/refs.c
@@ -1034,11 +1034,10 @@ static struct string_list *hide_refs;
 
 int parse_hide_refs_config(const char *var, const char *value, const char 
*section)
 {
-   const char *subsection, *key;
-   int subsection_len;
+   const char *key;
if (!strcmp("transfer.hiderefs", var) ||
-   (!parse_config_key(var, section, , _len, )
-   && !subsection && !strcmp(key, "hiderefs"))) {
+   (!parse_config_key(var, section, NULL, NULL, ) &&
+!strcmp(key, "hiderefs"))) {
char *ref;
int len;
 
-- 
2.12.0.616.g5f622f3b1


[PATCH 2/3] parse_config_key: allow matching single-level config

2017-02-24 Thread Jeff King
The parse_config_key() function was introduced to make it
easier to match "section.subsection.key" variables. It also
handles the simpler "section.key", and the caller is
responsible for distinguishing the two from its
out-parameters.

Most callers who _only_ want "section.key" would just use a
strcmp(var, "section.key"), since there is no parsing
required. However, they may still use parse_config_key() if
their "section" variable isn't a constant (an example of
this is in parse_hide_refs_config).

Using the parse_config_key is a bit clunky, though:

  const char *subsection;
  int subsection_len;
  const char *key;

  if (!parse_config_key(var, section, , _len, ) &&
  !subsection) {
  /* matched! */
  }

Instead, let's treat a NULL subsection as an indication that
the caller does not expect one. That lets us write:

  const char *key;

  if (!parse_config_key(var, section, NULL, NULL, )) {
  /* matched! */
  }

Existing callers should be unaffected, as passing a NULL
subsection would currently segfault.

Signed-off-by: Jeff King 
---
 cache.h  | 5 -
 config.c | 8 ++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 61fc86e6d..647a78f3f 100644
--- a/cache.h
+++ b/cache.h
@@ -1819,8 +1819,11 @@ extern int git_config_include(const char *name, const 
char *value, void *data);
  *
  * (i.e., what gets handed to a config_fn_t). The caller provides the section;
  * we return -1 if it does not match, 0 otherwise. The subsection and key
- * out-parameters are filled by the function (and subsection is NULL if it is
+ * out-parameters are filled by the function (and *subsection is NULL if it is
  * missing).
+ *
+ * If the subsection pointer-to-pointer passed in is NULL, returns 0 only if
+ * there is no subsection at all.
  */
 extern int parse_config_key(const char *var,
const char *section,
diff --git a/config.c b/config.c
index 1b08a75a7..13c8b21ea 100644
--- a/config.c
+++ b/config.c
@@ -2552,10 +2552,14 @@ int parse_config_key(const char *var,
 
/* Did we have a subsection at all? */
if (dot == var) {
-   *subsection = NULL;
-   *subsection_len = 0;
+   if (subsection) {
+   *subsection = NULL;
+   *subsection_len = 0;
+   }
}
else {
+   if (!subsection)
+   return -1;
*subsection = var + 1;
*subsection_len = dot - *subsection;
}
-- 
2.12.0.616.g5f622f3b1



[PATCH 1/3] parse_config_key: use skip_prefix instead of starts_with

2017-02-24 Thread Jeff King
This saves us having to repeatedly add in "section_len" (and
also avoids walking over the first part of the string
multiple times for a strlen() and strrchr()).

Signed-off-by: Jeff King 
---
 config.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/config.c b/config.c
index c6b874a7b..1b08a75a7 100644
--- a/config.c
+++ b/config.c
@@ -2536,11 +2536,10 @@ int parse_config_key(const char *var,
 const char **subsection, int *subsection_len,
 const char **key)
 {
-   int section_len = strlen(section);
const char *dot;
 
/* Does it start with "section." ? */
-   if (!starts_with(var, section) || var[section_len] != '.')
+   if (!skip_prefix(var, section, ) || *var != '.')
return -1;
 
/*
@@ -2552,12 +2551,12 @@ int parse_config_key(const char *var,
*key = dot + 1;
 
/* Did we have a subsection at all? */
-   if (dot == var + section_len) {
+   if (dot == var) {
*subsection = NULL;
*subsection_len = 0;
}
else {
-   *subsection = var + section_len + 1;
+   *subsection = var + 1;
*subsection_len = dot - *subsection;
}
 
-- 
2.12.0.616.g5f622f3b1



Re: [PATCH] refs: parse_hide_refs_config to use parse_config_key

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 03:39:40PM -0500, Jeff King wrote:

> This will start parsing "receive.foobar.hiderefs", which we don't want.
> I think you need:
> 
>   !parse_config_key(var, section, , _len, ) &&
>   !subsection &&
>   !strcmp(key, "hiderefs")
> 
> Perhaps passing NULL for the subsection variable should cause
> parse_config_key to return failure when there is a non-empty subsection.
> 
> -Peff
> 
> PS Outside of parse_config_key, this code would be nicer if it used
>skip_prefix() instead of starts_with(). Since it's going away, I
>don't think it matters, but I note that parse_config_key could
>probably benefit from the same.

While I'm thinking about it, here are patches to do that. The third one
I'd probably squash into yours (after ordering it to the end).

  [1/3]: parse_config_key: use skip_prefix instead of starts_with
  [2/3]: parse_config_key: allow matching single-level config
  [3/3]: parse_hide_refs_config: tell parse_config_key we don't want a 
subsection

 cache.h  |  5 -
 config.c | 15 +--
 refs.c   |  7 +++
 3 files changed, 16 insertions(+), 11 deletions(-)



Re: [PATCH 01/10] submodule: decouple url and submodule existence

2017-02-24 Thread Junio C Hamano
Brandon Williams  writes:

> Currently the submodule..url config option is used to determine
> if a given submodule exists and is interesting to the user.  This
> however doesn't work very well because the URL is a config option for
> the scope of a repository, whereas the existence of a submodule is an
> option scoped to the working tree.
>
> In a future with worktree support for submodules, there will be multiple
> working trees, each of which may only need a subset of the submodules
> checked out.  The URL (which is where the submodule repository can be
> obtained) should not differ between different working trees.
>
> It may also be convenient for users to more easily specify groups of
> submodules they are interested in as apposed to running "git submodule
> init " on each submodule they want checked out in their working
> tree.
>
> To this end, the config option submodule.active is introduced which
> holds a pathspec that specifies which submodules should exist in the
> working tree.

Hmph.  submodule.active in .git/config would be shared the same way
submodule..url in .git/config is shared across the worktrees
that stems from the same primary repository, no?

Perhaps there are some other uses of this submodule.active idea, but
I do not see how it is relevant to solving "multiple worktrees"
issue.  Per-worktree config would solve it with the current
submodule..url without submodule.active list, I would think [*1*].

Also as a grouping measure, submodule.active that lists submodule
paths feels hard to use.  When switching between two branches in the
superproject that have the same submodule bound at two different
paths, who is responsible for updating submodule.active in
superproject's config?  If it were a list of submodule names, this
objection does not apply, though.



[Footnote]

*1* At the conceptual level, I agree that .url that also means "we
are interested in this one" feels like somewhat an unclean
design, but that is not what you are "fixing", is it?



Re: [PATCH] refs: parse_hide_refs_config to use parse_config_key

2017-02-24 Thread Junio C Hamano
Stefan Beller  writes:

> parse_config_key was introduced in 1b86bbb0ade (config: add helper
> function for parsing key names, 2013-01-22), the NEEDSWORK that is removed
> in this patch was introduced at daebaa7813 (upload/receive-pack: allow
> hiding ref hierarchies, 2013-01-18), which is only a couple days apart,
> so presumably the code replaced in this patch was only introduced due
> to not wanting to wait on the proper helper function being available.
>
> Make the condition easier to read by using parse_config_key.
>
> Signed-off-by: Stefan Beller 
> ---
>  refs.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index cd36b64ed9..21bc8c9101 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1034,10 +1034,11 @@ static struct string_list *hide_refs;
>  
>  int parse_hide_refs_config(const char *var, const char *value, const char 
> *section)
>  {
> + const char *subsection, *key;
> + int subsection_len;
>   if (!strcmp("transfer.hiderefs", var) ||
> - /* NEEDSWORK: use parse_config_key() once both are merged */
> - (starts_with(var, section) && var[strlen(section)] == '.' &&
> -  !strcmp(var + strlen(section), ".hiderefs"))) {
> + (!parse_config_key(var, section, , _len, )
> + && !subsection && !strcmp(key, "hiderefs"))) {
>   char *ref;
>   int len;

Thanks for noticing.  "once both are merged" is a cryptic comment to
leave when somebody knows which two topics make "both" ;-)


Re: [PATCH] refs: parse_hide_refs_config to use parse_config_key

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 12:43:35PM -0800, Stefan Beller wrote:

> parse_config_key was introduced in 1b86bbb0ade (config: add helper
> function for parsing key names, 2013-01-22), the NEEDSWORK that is removed
> in this patch was introduced at daebaa7813 (upload/receive-pack: allow
> hiding ref hierarchies, 2013-01-18), which is only a couple days apart,
> so presumably the code replaced in this patch was only introduced due
> to not wanting to wait on the proper helper function being available.
> 
> Make the condition easier to read by using parse_config_key.
> [...]
>   if (!strcmp("transfer.hiderefs", var) ||
> - /* NEEDSWORK: use parse_config_key() once both are merged */
> - (starts_with(var, section) && var[strlen(section)] == '.' &&
> -  !strcmp(var + strlen(section), ".hiderefs"))) {
> + (!parse_config_key(var, section, , _len, )
> + && !subsection && !strcmp(key, "hiderefs"))) {

Yeah, this one looks fine.

-Peff


hi

2017-02-24 Thread abudu samfo
Hello Dear Friend,

 My name is Samfo Abudu I have decided to seek a confidential
co-operation  with you in the execution of the deal described
here-under for our both  mutual benefit and I hope you will keep it a
top secret because of the nature  of the transaction, During the
course of our bank year auditing, I discovered  an unclaimed/abandoned
fund, sum total of {US$19.3 Million United State  Dollars} in the bank
account that belongs to a Saudi Arabia businessman Who unfortunately
lost his life and entire family in a Motor Accident.

 Now our bank has been waiting for any of the relatives to come-up for
the claim but nobody has done that. I personally has been unsuccessful
in locating any of the relatives, now, I sincerely seek your consent
to present you as the next of kin / Will Beneficiary to the deceased
so that the proceeds of this account valued at {US$19.3 Million United
State Dollars} can be paid to you, which we will share in these
percentages ratio, 60% to me and 40% to you. All I request is your
utmost sincere co-operation; trust and maximum confidentiality to
achieve this project successfully. I have carefully mapped out the
moralities for execution of this transaction under a legitimate
arrangement to protect you from any breach of the law both in your
country and here in Burkina Faso when the fund is being transferred to
your bank account.

 I will have to provide all the relevant document that will be
requested to indicate that you are the rightful beneficiary of this
legacy and our bank will release the fund to you without any further
delay, upon your consideration and acceptance of this offer, please
send me the following information as stated below so we can proceed
and get this fund transferred to your designated bank account
immediately.

Your Full Name:
Your Contact Address:
Your direct Mobile telephone Number:
Your Date of Birth:
Your occupation:

 I await your swift response

 Samfo Abudu
 Best regards,
so we commence this transaction immediately.


[PATCH] refs: parse_hide_refs_config to use parse_config_key

2017-02-24 Thread Stefan Beller
parse_config_key was introduced in 1b86bbb0ade (config: add helper
function for parsing key names, 2013-01-22), the NEEDSWORK that is removed
in this patch was introduced at daebaa7813 (upload/receive-pack: allow
hiding ref hierarchies, 2013-01-18), which is only a couple days apart,
so presumably the code replaced in this patch was only introduced due
to not wanting to wait on the proper helper function being available.

Make the condition easier to read by using parse_config_key.

Signed-off-by: Stefan Beller 
---
 refs.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index cd36b64ed9..21bc8c9101 100644
--- a/refs.c
+++ b/refs.c
@@ -1034,10 +1034,11 @@ static struct string_list *hide_refs;
 
 int parse_hide_refs_config(const char *var, const char *value, const char 
*section)
 {
+   const char *subsection, *key;
+   int subsection_len;
if (!strcmp("transfer.hiderefs", var) ||
-   /* NEEDSWORK: use parse_config_key() once both are merged */
-   (starts_with(var, section) && var[strlen(section)] == '.' &&
-!strcmp(var + strlen(section), ".hiderefs"))) {
+   (!parse_config_key(var, section, , _len, )
+   && !subsection && !strcmp(key, "hiderefs"))) {
char *ref;
int len;
 
-- 
2.12.0.rc1.16.ge4278d41a0.dirty



[PATCH] refs: parse_hide_refs_config to use parse_config_key

2017-02-24 Thread Stefan Beller
parse_config_key was introduced in 1b86bbb0ade (config: add helper
function for parsing key names, 2013-01-22), the NEEDSWORK that is removed
in this patch was introduced at daebaa7813 (upload/receive-pack: allow
hiding ref hierarchies, 2013-01-18), which is only a couple days apart,
so presumably the code replaced in this patch was only introduced due
to not wanting to wait on the proper helper function being available.

Make the condition easier to read by using parse_config_key.

Signed-off-by: Stefan Beller 
---

  When investigating the state of the art for parsing config options, I saw
  opportunity for a small drive-by patch in an area that I did not look at for 
  a long time.
  
  The authors of the two mentioned commits are Jeff and Junio, so maybe you
  remember another reason for this NEEDSWORK here?
  
  Thanks,
  Stefan
  
 refs.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/refs.c b/refs.c
index cd36b64ed9..c9e5f13630 100644
--- a/refs.c
+++ b/refs.c
@@ -1034,10 +1034,11 @@ static struct string_list *hide_refs;
 
 int parse_hide_refs_config(const char *var, const char *value, const char 
*section)
 {
+   const char *subsection, *key;
+   int subsection_len;
if (!strcmp("transfer.hiderefs", var) ||
-   /* NEEDSWORK: use parse_config_key() once both are merged */
-   (starts_with(var, section) && var[strlen(section)] == '.' &&
-!strcmp(var + strlen(section), ".hiderefs"))) {
+   (!parse_config_key(var, section, , _len, )
+   && !strcmp(key, "hiderefs"))) {
char *ref;
int len;
 
-- 
2.12.0.rc1.16.ge4278d41a0.dirty



Re: [PATCH] refs: parse_hide_refs_config to use parse_config_key

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 12:33:35PM -0800, Stefan Beller wrote:

> parse_config_key was introduced in 1b86bbb0ade (config: add helper
> function for parsing key names, 2013-01-22), the NEEDSWORK that is removed
> in this patch was introduced at daebaa7813 (upload/receive-pack: allow
> hiding ref hierarchies, 2013-01-18), which is only a couple days apart,
> so presumably the code replaced in this patch was only introduced due
> to not wanting to wait on the proper helper function being available.
> 
> Make the condition easier to read by using parse_config_key.
> 
> Signed-off-by: Stefan Beller 
> ---
> 
>   When investigating the state of the art for parsing config options, I saw
>   opportunity for a small drive-by patch in an area that I did not look at 
> for 
>   a long time.
>   
>   The authors of the two mentioned commits are Jeff and Junio, so maybe you
>   remember another reason for this NEEDSWORK here?

No, I think the reasoning you gave in the commit message is exactly
what happened.

> diff --git a/refs.c b/refs.c
> index cd36b64ed9..c9e5f13630 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1034,10 +1034,11 @@ static struct string_list *hide_refs;
>  
>  int parse_hide_refs_config(const char *var, const char *value, const char 
> *section)
>  {
> + const char *subsection, *key;
> + int subsection_len;
>   if (!strcmp("transfer.hiderefs", var) ||
> - /* NEEDSWORK: use parse_config_key() once both are merged */
> - (starts_with(var, section) && var[strlen(section)] == '.' &&
> -  !strcmp(var + strlen(section), ".hiderefs"))) {
> + (!parse_config_key(var, section, , _len, )
> + && !strcmp(key, "hiderefs"))) {

This will start parsing "receive.foobar.hiderefs", which we don't want.
I think you need:

  !parse_config_key(var, section, , _len, ) &&
  !subsection &&
  !strcmp(key, "hiderefs")

Perhaps passing NULL for the subsection variable should cause
parse_config_key to return failure when there is a non-empty subsection.

-Peff

PS Outside of parse_config_key, this code would be nicer if it used
   skip_prefix() instead of starts_with(). Since it's going away, I
   don't think it matters, but I note that parse_config_key could
   probably benefit from the same.


[PATCH v2 1/2] Documentation: Improve description for core.quotePath

2017-02-24 Thread Andreas Heiduk
Signed-off-by: Andreas Heiduk 
---
 Documentation/config.txt | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1fee83c..fa06c2a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -347,16 +347,20 @@ core.checkStat::
all fields, including the sub-second part of mtime and ctime.
 
 core.quotePath::
-   The commands that output paths (e.g. 'ls-files',
-   'diff'), when not given the `-z` option, will quote
-   "unusual" characters in the pathname by enclosing the
-   pathname in a double-quote pair and with backslashes the
-   same way strings in C source code are quoted.  If this
-   variable is set to false, the bytes higher than 0x80 are
-   not quoted but output as verbatim.  Note that double
-   quote, backslash and control characters are always
-   quoted without `-z` regardless of the setting of this
-   variable.
+
+   Commands that output paths (e.g. 'ls-files', 'diff'), will
+   quote "unusual" characters in the pathname by enclosing the
+   pathname in double-quotes and escaping those characters with
+   backslashes in the same way C escapes control characters (e.g.
+   `\t` for TAB, `\n` for LF, `\\` for backslash) or bytes with
+   values larger than 0x80 (e.g. octal `\302\265` for "micro" in
+   UTF-8).  If this variable is set to false, bytes higher than
+   0x80 are not considered "unusual" any more. Double-quotes,
+   backslash and control characters are always escaped regardless
+   of the setting of this variable.  A simple space character is
+   not considered "unusual".  Many commands can output pathnames
+   completely verbatim using the `-z` option. The default value
+   is true.
 
 core.eol::
Sets the line ending type to use in the working directory for


[PATCH v2 2/2] Documentation: Link descriptions of -z to core.quotePath

2017-02-24 Thread Andreas Heiduk
Linking the description for pathname quoting to the configuration
variable "core.quotePath" removes inconstistent and incomplete
sections while also giving two hints how to deal with it: Either with
"-c core.quotePath=false" or with "-z".

Signed-off-by: Andreas Heiduk 
---
 Documentation/diff-format.txt |  7 ---
 Documentation/diff-generate-patch.txt |  7 +++
 Documentation/diff-options.txt|  7 +++
 Documentation/git-apply.txt   |  7 +++
 Documentation/git-commit.txt  |  9 ++---
 Documentation/git-ls-files.txt| 10 ++
 Documentation/git-ls-tree.txt | 10 +++---
 Documentation/git-status.txt  |  7 +++
 8 files changed, 35 insertions(+), 29 deletions(-)

diff --git a/Documentation/diff-format.txt b/Documentation/diff-format.txt
index cf52626..706916c 100644
--- a/Documentation/diff-format.txt
+++ b/Documentation/diff-format.txt
@@ -78,9 +78,10 @@ Example:
 :100644 100644 5be4a4.. 00.. M file.c
 
 
-When `-z` option is not used, TAB, LF, and backslash characters
-in pathnames are represented as `\t`, `\n`, and `\\`,
-respectively.
+Without the `-z` option, pathnames with "unusual" characters are
+quoted as explained for the configuration variable `core.quotePath`
+(see linkgit:git-config[1]).  Using `-z` the filename is output
+verbatim and the line is terminated by a NUL byte.
 
 diff format for merges
 --
diff --git a/Documentation/diff-generate-patch.txt 
b/Documentation/diff-generate-patch.txt
index d2a7ff5..231105c 100644
--- a/Documentation/diff-generate-patch.txt
+++ b/Documentation/diff-generate-patch.txt
@@ -53,10 +53,9 @@ The index line includes the SHA-1 checksum before and after 
the change.
 The  is included if the file mode does not change; otherwise,
 separate lines indicate the old and the new mode.
 
-3.  TAB, LF, double quote and backslash characters in pathnames
-are represented as `\t`, `\n`, `\"` and `\\`, respectively.
-If there is need for such substitution then the whole
-pathname is put in double quotes.
+3.  Pathnames with "unusual" characters are quoted as explained for
+the configuration variable `core.quotePath` (see
+linkgit:git-config[1]).
 
 4.  All the `file1` files in the output refer to files before the
 commit, and all the `file2` files refer to files after the commit.
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index e6215c3..7c28e73 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -192,10 +192,9 @@ ifndef::git-log[]
given, do not munge pathnames and use NULs as output field terminators.
 endif::git-log[]
 +
-Without this option, each pathname output will have TAB, LF, double quotes,
-and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
-respectively, and the pathname will be enclosed in double quotes if
-any of those replacements occurred.
+Without this option, pathnames with "unusual" characters are munged as
+explained for the configuration variable `core.quotePath` (see
+linkgit:git-config[1]).
 
 --name-only::
Show only names of changed files.
diff --git a/Documentation/git-apply.txt b/Documentation/git-apply.txt
index 8ddb207..a7a001b 100644
--- a/Documentation/git-apply.txt
+++ b/Documentation/git-apply.txt
@@ -108,10 +108,9 @@ the information is read from the current index instead.
When `--numstat` has been given, do not munge pathnames,
but use a NUL-terminated machine-readable format.
 +
-Without this option, each pathname output will have TAB, LF, double quotes,
-and backslash characters replaced with `\t`, `\n`, `\"`, and `\\`,
-respectively, and the pathname will be enclosed in double quotes if
-any of those replacements occurred.
+Without this option, pathnames with "unusual" characters are munged as
+explained for the configuration variable `core.quotePath` (see
+linkgit:git-config[1]).
 
 -p::
Remove  leading slashes from traditional diff paths. The
diff --git a/Documentation/git-commit.txt b/Documentation/git-commit.txt
index 4f8f20a..25dcdcc 100644
--- a/Documentation/git-commit.txt
+++ b/Documentation/git-commit.txt
@@ -117,9 +117,12 @@ OPTIONS
 
 -z::
 --null::
-   When showing `short` or `porcelain` status output, terminate
-   entries in the status output with NUL, instead of LF. If no
-   format is given, implies the `--porcelain` output format.
+   When showing `short` or `porcelain` status output, print the
+   filename verbatim and terminate the entries with NUL, instead of LF.
+   If no format is given, implies the `--porcelain` output format.
+   Without the `-z` option, filenames with "unusual" characters are
+   quoted as explained for the configuration variable `core.quotePath`
+   (see linkgit:git-config[1]).
 
 -F ::
 --file=::
diff --git a/Documentation/git-ls-files.txt 

[PATCH v2 0/2] Documentation: Link git-ls-files to core.quotePath variable.

2017-02-24 Thread Andreas Heiduk
These two patches replace and extend the precious patches with the
same subject. Suggestions from Philip Oakley and Junio C Hamano are
included.

I tried to find and adjust all places where pathname quoting and "-z"
were described. I omitted these places:

* Here "-z" is for input only. Quoting is unclear to me.

-- git-mktree.txt
-- git-update-index.txt
-- git-checkout-index.txt

* Here pathname quoting is not mentioned:

-- git-check-ignore.txt
-- git-check-attr.txt

And last but not least: 

- git-grep.txt: The paths are always unquoted, `-z` toggles only the
delimiter. Perhaps some CAVEAT should be added.


Andreas Heiduk (2):
  Documentation: Improve description for core.quotePath
  Documentation: Link descriptions of -z to core.quotePath

 Documentation/config.txt  | 24 ++--
 Documentation/diff-format.txt |  7 ---
 Documentation/diff-generate-patch.txt |  7 +++
 Documentation/diff-options.txt|  7 +++
 Documentation/git-apply.txt   |  7 +++
 Documentation/git-commit.txt  |  9 ++---
 Documentation/git-ls-files.txt| 10 ++
 Documentation/git-ls-tree.txt | 10 +++---
 Documentation/git-status.txt  |  7 +++
 9 files changed, 49 insertions(+), 39 deletions(-)



Re: fatal error when diffing changed symlinks

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 11:51:22AM -0800, Junio C Hamano wrote:

> > A slightly worse is that the upcoming Git will ship with a rewritten
> > "difftool" that makes the above sequence segfault.
> 
> The culprit seems to be these lines in run_dir_diff():
> 
>   if (S_ISLNK(lmode)) {
>   char *content = read_sha1_file(loid.hash, , );
>   add_left_or_right(, src_path, content, 0);
>   free(content);
>   }
> 
>   if (S_ISLNK(rmode)) {
>   char *content = read_sha1_file(roid.hash, , );
>   add_left_or_right(, dst_path, content, 1);
>   free(content);
>   }
> 
> When viewing a working tree file, oid.hash could be 0{40} and
> read_sha1_file() is not the right function to use to obtain the
> contents.
> 
> Both of these two need to pay attention to 0{40}, I think, as the
> user may be running "difftool -R --dir-diff" in which case the
> working tree would appear in the left hand side instead.

As a side note, I think even outside of 0{40}, this should be checking
the return value of read_sha1_file(). A corrupted repo should die(), not
segfault.

-Peff


Re: SHA1 collisions found

2017-02-24 Thread Philip Oakley

From: "Stefan Beller" 
On Fri, Feb 24, 2017 at 10:14 AM, Junio C Hamano  
wrote:



you are inviting people to start using

md5,54ddf8d47340e048166c45f439ce65fd

as object names.


which might even be okay for specific subsets of operations.
(e.g. all local work including staging things, making local "fixup" 
commits)


The addressing scheme should not be too hardcoded, we should rather
treat it similar to the cipher schemes in pgp. The additional complexity 
that

we have is the longevity of existence of things, though.



One potential nicety of using the md5 is that it is a known `toy problem` 
solution that could be used to explore how things might be made to work, 
without any expectation that the temporary code is in any way an 
experimental part of regular code. Maybe. It's good to have a toy problem to 
work on.


There are other issue to be considered as well, such as validating a 
transition of identical blobs and trees (at some point there will for some 
users be a forced update of hash of unchanged code), which probably requires 
two way traversal.


Philip 



Re: SHA1 collisions found

2017-02-24 Thread Junio C Hamano
ankostis  writes:

> Let's assume that git is retroffited to always support the "default"
> SHA-3, but support additionally more hash-funcs.
> If in the future SHA-3 also gets defeated, it would be highly unlikely
> that the same math would also break e.g. Blake.
> So certain high-profile repos might choose for extra security 2 or more 
> hashes.

I think you are conflating two unrelated things.

 * How are these "2 or more hashes" actually used?  Are you going to
   add three "parent " line to a commit with just one parent, each
   line storing the different hashes?  How will such a commit object
   be named---does it have three names and do you plan to have three
   copies of .git/refs/heads/master somehow, each of which have
   SHA-1, SHA-3 and Blake, and let any one hash to identify the
   object?

   I suspect you are not going to do so; instead, you would use a
   very long string that is a concatenation of these three hashes as
   if it is an output from a single hash function that produces a
   long result.

   So I think the most natural way to do the "2 or more for extra
   security" is to allow us to use a very long hash.  It does not
   help to allow an object to be referred to with any of these 2 or
   more hashes at the same time.

 * If employing 2 or more hashes by combining into one may enhance
   the security, that is wonderful.  But we want to discourage
   people from inventing their own combinations left and right and
   end up fragmenting the world.  If a project that begins with
   SHA-1 only naming is forked to two (or more) and each fork uses
   different hashes, merging them back will become harder than
   necessary unless you support all these hashes forks used.

Having said all that, the way to figure out the hash used in the way
we spell the object name may not be the best place to discourage
people from using random hashes of their choice.  But I think we
want to avoid doing something that would actively encourage
fragmentation.



Re: [PATCH] submodule init: warn about falling back to a local path

2017-02-24 Thread Stefan Beller
On Fri, Feb 24, 2017 at 12:19 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> When a submodule is initialized, the config variable 'submodule..url'
>> is set depending on the value of the same variable in the .gitmodules
>> file. When the URL indicates to be relative, then the url is computed
>> relative to its default remote. The default remote cannot be determined
>> accurately in all cases, such that it falls back to 'origin'.
>>
>> The 'origin' remote may not exist, though. In that case we give up looking
>> for a suitable remote and we'll just assume it to be a local relative path.
>
> IOW we keep the .url to be relative, the same as the original one we
> took from .gitmodules?

Well that is one way to say that. Another way is:
If we cannot construct a URL based on the given relative URL, we
demote the relative URL to be a relative PATH and resolve that instead.

>  That sounds like a sensible thing to do and
> I agree it makes sense to warn when it happens.

ok.

>
>> @@ -118,18 +122,22 @@ too (and can also report changes to a submodule's work 
>> tree).
>>
>>  init [--] [...]::
>>   Initialize the submodules recorded in the index (which were
>> - added and committed elsewhere) by copying submodule
>> - names and urls from .gitmodules to .git/config.
>> - Optional  arguments limit which submodules will be initialized.
>> - It will also copy the value of `submodule.$name.update` into
>> - .git/config.
>> - The key used in .git/config is `submodule.$name.url`.
>> - This command does not alter existing information in .git/config.
>> - You can then customize the submodule clone URLs in .git/config
>> - for your local setup and proceed to `git submodule update`;
>> - you can also just use `git submodule update --init` without
>> - the explicit 'init' step if you do not intend to customize
>> - any submodule locations.
>> + added and committed elsewhere) by copying `submodule.$name.url`
>> + from .gitmodules to .git/config, resolving relative urls to be
>> + relative to the default remote.
>
> I read this as "copying..., resolving relative to the default remote
> (if exists)."  A reader would wonder what happens if the default
> remote does not exist---don't we want to describe what happens in
> that case, like, "recording . (the current repository) as the
> upstream" or something?

eh, a better approach is s/copying//
We will resolve the relative URL no matter what. It's just that
we may end up with an absolute path instead of an absolute URL.

>
>> ++
>> +Optional  arguments limit which submodules will be initialized.
>> +If no path is specified all submodules are initialized.
>
> Perhaps s/ all submodules/,&/?

ok

Will reroll, once I have found a better wording.


Re: [PATCH] submodule init: warn about falling back to a local path

2017-02-24 Thread Junio C Hamano
Stefan Beller  writes:

> When a submodule is initialized, the config variable 'submodule..url'
> is set depending on the value of the same variable in the .gitmodules
> file. When the URL indicates to be relative, then the url is computed
> relative to its default remote. The default remote cannot be determined
> accurately in all cases, such that it falls back to 'origin'.
>
> The 'origin' remote may not exist, though. In that case we give up looking
> for a suitable remote and we'll just assume it to be a local relative path.

IOW we keep the .url to be relative, the same as the original one we
took from .gitmodules?  That sounds like a sensible thing to do and
I agree it makes sense to warn when it happens.

>   is the URL of the new submodule's origin repository.
>  This may be either an absolute URL, or (if it begins with ./
> -or ../), the location relative to the superproject's origin
> +or ../), the location relative to the superproject's default remote
>  repository (Please note that to specify a repository 'foo.git'
>  which is located right next to a superproject 'bar.git', you'll
>  have to use '../foo.git' instead of './foo.git' - as one might expect
>  when following the rules for relative URLs - because the evaluation
>  of relative URLs in Git is identical to that of relative directories).
> -If the superproject doesn't have an origin configured
> ++
> +The default remote is the remote of the remote tracking branch
> +of the current branch. If no such remote tracking branch exists or
> +the HEAD is detached, "origin" is assumed to be the default remote.
> +If the superproject doesn't have a default remote configured

OK.

> @@ -118,18 +122,22 @@ too (and can also report changes to a submodule's work 
> tree).
>  
>  init [--] [...]::
>   Initialize the submodules recorded in the index (which were
> - added and committed elsewhere) by copying submodule
> - names and urls from .gitmodules to .git/config.
> - Optional  arguments limit which submodules will be initialized.
> - It will also copy the value of `submodule.$name.update` into
> - .git/config.
> - The key used in .git/config is `submodule.$name.url`.
> - This command does not alter existing information in .git/config.
> - You can then customize the submodule clone URLs in .git/config
> - for your local setup and proceed to `git submodule update`;
> - you can also just use `git submodule update --init` without
> - the explicit 'init' step if you do not intend to customize
> - any submodule locations.
> + added and committed elsewhere) by copying `submodule.$name.url`
> + from .gitmodules to .git/config, resolving relative urls to be
> + relative to the default remote.

I read this as "copying..., resolving relative to the default remote
(if exists)."  A reader would wonder what happens if the default
remote does not exist---don't we want to describe what happens in
that case, like, "recording . (the current repository) as the
upstream" or something?

> ++
> +Optional  arguments limit which submodules will be initialized.
> +If no path is specified all submodules are initialized.

Perhaps s/ all submodules/,&/?

>  deinit [-f|--force] (--all|[--] ...)::
>   Unregister the given submodules, i.e. remove the whole
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 899dc334e3..15a5430c00 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -356,12 +356,10 @@ static void init_submodule(const char *path, const char 
> *prefix, int quiet)
>   strbuf_addf(, "remote.%s.url", remote);
>   free(remote);
>  
> - if (git_config_get_string(remotesb.buf, ))
> - /*
> -  * The repository is its own
> -  * authoritative upstream
> -  */
> + if (git_config_get_string(remotesb.buf, )) {
> + warning(_("could not lookup configuration '%s'. 
> Assuming this repository is its own authoritative upstream."), remotesb.buf);

Sounds sensible (it might be a bit too long but it should be OK).

>   remoteurl = xgetcwd();
> + }
>   relurl = relative_url(remoteurl, url, NULL);
>   strbuf_release();
>   free(remoteurl);

Thanks.


Re: SHA1 collisions found

2017-02-24 Thread ankostis
On 24 February 2017 at 20:20, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> On Fri, Feb 24, 2017 at 10:14 AM, Junio C Hamano  wrote:
>>
>>> you are inviting people to start using
>>>
>>> md5,54ddf8d47340e048166c45f439ce65fd
>>>
>>> as object names.
>>
>> which might even be okay for specific subsets of operations.
>> (e.g. all local work including staging things, making local "fixup" commits)
>>
>> The addressing scheme should not be too hardcoded, we should rather
>> treat it similar to the cipher schemes in pgp. The additional complexity that
>> we have is the longevity of existence of things, though.
>
> The not-so-well-hidden agenda was exactly that we _SHOULD_ not
> mimick PGP.  They do not have a requirement to encourage everybody
> to use the same thing because each message is encrypted/signed
> independently, i.e. they do not have to chain things like we do.

But there is a scenario where supporting more hashes, in parallel, is
beneficial:

Let's assume that git is retroffited to always support the "default"
SHA-3, but support additionally more hash-funcs.
If in the future SHA-3 also gets defeated, it would be highly unlikely
that the same math would also break e.g. Blake.
So certain high-profile repos might choose for extra security 2 or more hashes.

Apologies if I'm misusing the list,
  Kostis


Re: SHA1 collisions found

2017-02-24 Thread Junio C Hamano
Junio C Hamano  writes:

> The not-so-well-hidden agenda was exactly that we _SHOULD_ not
> mimick PGP.  They do not have a requirement to encourage everybody
> to use the same thing because each message is encrypted/signed
> independently, i.e. they do not have to chain things like we do.

To put it less succinctly, PGP does not have incentive to encourage
everybody to converge to the same.  They can afford to say "You can
use whatever you among your circles agree to use and the rest of the
world won't care".  If two groups that have used different ones later
meet, both of them can switch to a common one from that point forward,
but their past exchanges won't affect the future.

You cannot say the same thing for Git.  Once you decide to merge two
histories from two camps, which may have originated from the same
codebase but then decided to use two different ones while they were
forked, you'd be forced to support all three forever.  We have a lot
stronger incentive to discourage fragmentation.





Re: fatal error when diffing changed symlinks

2017-02-24 Thread Junio C Hamano
Junio C Hamano  writes:

>> cd /tmp
>> mkdir a
>> cd a
>> git init
>> touch b
>> ln -s b c
>> git add .
>> git commit -m 'first'
>> touch d
>> rm c
>> ln -s d c
>> git difftool --dir-diff
>
> A slightly worse is that the upcoming Git will ship with a rewritten
> "difftool" that makes the above sequence segfault.

The culprit seems to be these lines in run_dir_diff():

if (S_ISLNK(lmode)) {
char *content = read_sha1_file(loid.hash, , );
add_left_or_right(, src_path, content, 0);
free(content);
}

if (S_ISLNK(rmode)) {
char *content = read_sha1_file(roid.hash, , );
add_left_or_right(, dst_path, content, 1);
free(content);
}

When viewing a working tree file, oid.hash could be 0{40} and
read_sha1_file() is not the right function to use to obtain the
contents.

Both of these two need to pay attention to 0{40}, I think, as the
user may be running "difftool -R --dir-diff" in which case the
working tree would appear in the left hand side instead.

I didn't follow the codepath for regular files closely, but the code
that follows the above excerpt does quite different things to lstate
and rstate, which makes me suspect that the code is not prepared to
see "-R"(everse) diff (and I further suspect that these issues were
inherited from the original scripted Porcelain).



Re: [PATCH v6 1/1] config: add conditional include

2017-02-24 Thread Ramsay Jones


On 24/02/17 13:14, Nguyễn Thái Ngọc Duy wrote:
[snip] 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/config.txt  | 61 +
>  config.c  | 97 
> +++
>  t/t1305-config-include.sh | 56 +++
>  3 files changed, 214 insertions(+)
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 015346c417..6c0cd2a273 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -91,6 +91,56 @@ found at the location of the include directive. If the 
> value of the
>  relative to the configuration file in which the include directive was
>  found.  See below for examples.
>  
> +Conditional includes
> +
> +
> +You can include one config file from another conditionally by setting
> +a `includeIf..path` variable to the name of the file to be
> +included. The variable's value is treated the same way as `include.path`.
> +
> +The condition starts with a keyword followed by a colon and some data
> +whose format and meaning depends on the keyword. Supported keywords
> +are:
> +
> +`gitdir`::
> +
> + The data that follows the keyword `gitdir:` is used as a glob
> + pattern. If the location of the .git directory match the
> + pattern, the include condition is met.
> ++
> +The .git location which may be auto-discovered, or come from
> +`$GIT_DIR` environment variable. If the repository auto discovered via
> +a .git file (e.g. from submodules, or a linked worktree), the .git
> +location would be the final location, not where the .git file is.
> ++
> +The pattern can contain standard globbing wildcards and two additional
> +ones, `**/` and `/**`, that can match multiple path components. Please
> +refer to linkgit:gitignore[5] for details. For convenience:
> +
> + * If the pattern starts with `~/`, `~` will be substituted with the
> +   content of the environment variable `HOME`.
> +
> + * If the pattern starts with `./`, it is replaced with the directory
> +   containing the current config file.
> +
> + * If the pattern does not start with either `~/`, `./` or `/`, `**/`
> +   will be automatically prepended. For example, the pattern `foo/bar`
> +   becomes `**/foo/bar` and would match `/any/path/to/foo/bar`.
> +
> + * If the pattern ends with `/`, `**` will be automatically added. For
> +   example, the pattern `foo/` becomes `foo/**`. In other words, it
> +   matches "foo" and everything inside, recursively.
> +
> +`gitdir/i`::
> + This is the same as `gitdir` except that matching is done
> + case-insensitively (e.g. on case-insensitive file sytems)
> +
> +A few more notes on matching with `gitdir` and `gitdir/i`:
> +
> + * Symlinks in `$GIT_DIR` are not resolved before matching.
> +
> + * Note that "../" is not special and will match literally, which is
> +   unlikely what you want.
>  
>  Example
>  ~~~
> @@ -119,6 +169,17 @@ Example
>   path = foo ; expand "foo" relative to the current file
>   path = ~/foo ; expand "foo" in your `$HOME` directory
>  
> + ; include if $GIT_DIR is /path/to/foo/.git
> + [include-if "gitdir:/path/to/foo/.git"]

s/include-if/includeIf/

> + path = /path/to/foo.inc
> +
> + ; include for all repositories inside /path/to/group
> + [include-if "gitdir:/path/to/group/"]

ditto

> + path = /path/to/foo.inc
> +
> + ; include for all repositories inside $HOME/to/group
> + [include-if "gitdir:~/to/group/"]

ditto

ATB,
Ramsay Jones


[ANNOUNCE] Git v2.12.0

2017-02-24 Thread Junio C Hamano
The latest feature release Git v2.12.0 is now available at the
usual places.  It is comprised of 517 non-merge commits since
v2.11.0, contributed by 80 people, 24 of which are new faces.

The tarballs are found at:

https://www.kernel.org/pub/software/scm/git/

The following public repositories all have a copy of the 'v2.12.0'
tag and the 'master' branch that the tag points at:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = git://git.sourceforge.jp/gitroot/git-core/git.git
  url = git://git-core.git.sourceforge.net/gitroot/git-core/git-core
  url = https://github.com/gitster/git

New contributors whose contributions weren't in v2.11.0 are as follows.
Welcome to the Git development community!

  Alan Davies, Andreas Krey, Cornelius Weig, Damien Regad, David
  Pursehouse, Denton Liu, George Vanburgh, Igor Kushnir, Jack
  Bates, Joachim Jablon, Jordi Mas, Kristoffer Haugsbakk, Kyle
  Meyer, Luis Ressel, Lukas Puehringer, Markus Hitter, Peter Law,
  Rasmus Villemoes, Rogier Goossens, Stefan Dotterweich, Steven
  Penny, Vinicius Kursancew, Vladimir Panteleev, and Wolfram Sang.

Returning contributors who helped this release are as follows.
Thanks for your continued support.

  마누엘, Alexander Shopov, Alex Henrie, Anthony Ramine,
  Beat Bolli, Brandon Williams, brian m. carlson, Changwoo Ryu,
  Chris Packham, Christian Couder, David Aguilar, David Turner,
  Dennis Kaarsemaker, Dimitriy Ryazantcev, Elia Pinto, Eric Wong,
  Grégoire Paris, Heiko Voigt, Jacob Keller, Jean-Noel Avila,
  Jeff Hostetler, Jeff King, Jiang Xin, Johannes Schindelin,
  Johannes Sixt, Jonathan Tan, Junio C Hamano, Kyle J. McKay,
  Lars Schneider, Linus Torvalds, Luke Diamand, Matt McCutchen,
  Max Kirillov, Mike Hommey, Nguyễn Thái Ngọc Duy, Patrick
  Steinhardt, Paul Mackerras, Peter Krefting, Philip Oakley,
  Pranit Bauva, Ralf Thielow, Ramsay Jones, Ray Chen, René
  Scharfe, Richard Hansen, Santiago Torres, Satoshi Yasushima,
  Stefan Beller, Stephan Beyer, SZEDER Gábor, Thomas Gummerer,
  Torsten Bögershausen, Trần Ngọc Quân, Vasco Almeida,
  Vegard Nossum, and Vitaly "_Vi" Shukela.



Git 2.12 Release Notes
==

Backward compatibility notes.

 * Use of an empty string that is used for 'everything matches' is
   still warned and Git asks users to use a more explicit '.' for that
   instead.  The hope is that existing users will not mind this
   change, and eventually the warning can be turned into a hard error,
   upgrading the deprecation into removal of this (mis)feature.  That
   is not scheduled to happen in the upcoming release (yet).

 * The historical argument order "git merge  HEAD ..."
   has been deprecated for quite some time, and will be removed in a
   future release.

 * An ancient script "git relink" has been removed.


Updates since v2.11
---

UI, Workflows & Features

 * Various updates to "git p4".

 * "git p4" didn't interact with the internal of .git directory
   correctly in the modern "git-worktree"-enabled world.

 * "git branch --list" and friends learned "--ignore-case" option to
   optionally sort branches and tags case insensitively.

 * In addition to %(subject), %(body), "log --pretty=format:..."
   learned a new placeholder %(trailers).

 * "git rebase" learned "--quit" option, which allows a user to
   remove the metadata left by an earlier "git rebase" that was
   manually aborted without using "git rebase --abort".

 * "git clone --reference $there --recurse-submodules $super" has been
   taught to guess repositories usable as references for submodules of
   $super that are embedded in $there while making a clone of the
   superproject borrow objects from $there; extend the mechanism to
   also allow submodules of these submodules to borrow repositories
   embedded in these clones of the submodules embedded in the clone of
   the superproject.

 * Porcelain scripts written in Perl are getting internationalized.

 * "git merge --continue" has been added as a synonym to "git commit"
   to conclude a merge that has stopped due to conflicts.

 * Finer-grained control of what protocols are allowed for transports
   during clone/fetch/push have been enabled via a new configuration
   mechanism.

 * "git shortlog" learned "--committer" option to group commits by
   committer, instead of author.

 * GitLFS integration with "git p4" has been updated.

 * The isatty() emulation for Windows has been updated to eradicate
   the previous hack that depended on internals of (older) MSVC
   runtime.

 * Some platforms no longer understand "latin-1" that is still seen in
   the wild in e-mail headers; replace them with "iso-8859-1" that is
   more widely known when conversion fails from/to it.

 * "git grep" has been taught to optionally recurse into submodules.

 * "git rm" used to refuse to remove a submodule when it has its own
   git repository embedded in its 

What's cooking in git.git (Feb 2017, #08; Fri, 24)

2017-02-24 Thread Junio C Hamano
What's cooking in git.git (Feb 2017, #08; Fri, 24)
--

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

v2.12 has been tagged.  Special thanks go to Dscho, who (among other
things) laid the groundwork for speeding up "rebase -i" (which I am
hoping to be finalized in the upcoming cycle) and rewrote "difftool"
in C and to Peff who was all over the place cleaning the code up,
fixing surfaced and unsurfaced bugs.  Thanks for everybody else,
too, who worked hard on the release.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* bc/blame-doc-fix (2017-02-22) 1 commit
  (merged to 'next' on 2017-02-22 at 81c0ff2283)
 + Documentation: use brackets for optional arguments

 Doc update.


* bc/worktree-doc-fix-detached (2017-02-22) 1 commit
  (merged to 'next' on 2017-02-22 at 8257025363)
 + Documentation: correctly spell git worktree --detach

 Doc update.


* dr/doc-check-ref-format-normalize (2017-02-21) 1 commit
  (merged to 'next' on 2017-02-21 at 5e88b7a93d)
 + git-check-ref-format: clarify documentation for --normalize

 Doc update.


* gp/document-dotfiles-in-templates-are-not-copied (2017-02-17) 1 commit
  (merged to 'next' on 2017-02-21 at bbfa2bb7d4)
 + init: document dotfiles exclusion on template copy

 Doc update.


* ps/doc-gc-aggressive-depth-update (2017-02-24) 1 commit
  (merged to 'next' on 2017-02-24 at f023322bbb)
 + docs/git-gc: fix default value for `--aggressiveDepth`

 Doc update.


* rt/align-add-i-help-text (2017-02-22) 1 commit
  (merged to 'next' on 2017-02-22 at a8573afb9a)
 + git add -i: replace \t with blanks in the help message

 Doc update.

--
[New Topics]

* dp/filter-branch-prune-empty (2017-02-23) 4 commits
 - p7000: add test for filter-branch with --prune-empty
 - filter-branch: fix --prune-empty on parentless commits
 - t7003: ensure --prune-empty removes entire branch when applicable
 - t7003: ensure --prune-empty can prune root commit

 "git filter-branch --prune-empty" drops a single-parent commit that
 becomes a no-op, but did not drop a root commit whose tree is empty.

 Needs review.


* jc/config-case-cmdline-take-2 (2017-02-23) 2 commits
 - config: use git_config_parse_key() in git_config_parse_parameter()
 - config: move a few helper functions up

 The code to parse "git -c VAR=VAL cmd" and set configuration
 variable for the duration of cmd had two small bugs, which have
 been fixed.

 Will merge to and then cook in 'next'.
 This supersedes jc/config-case-cmdline topic.

--
[Stalled]

* nd/worktree-move (2017-01-27) 7 commits
 . fixup! worktree move: new command
 . worktree remove: new command
 . worktree move: refuse to move worktrees with submodules
 . worktree move: accept destination as directory
 . worktree move: new command
 . worktree.c: add update_worktree_location()
 . worktree.c: add validate_worktree()

 "git worktree" learned move and remove subcommands.

 Tentatively ejected as it seems to break 'pu' when merged.


* cc/split-index-config (2016-12-26) 21 commits
 - Documentation/git-update-index: explain splitIndex.*
 - Documentation/config: add splitIndex.sharedIndexExpire
 - read-cache: use freshen_shared_index() in read_index_from()
 - read-cache: refactor read_index_from()
 - t1700: test shared index file expiration
 - read-cache: unlink old sharedindex files
 - config: add git_config_get_expiry() from gc.c
 - read-cache: touch shared index files when used
 - sha1_file: make check_and_freshen_file() non static
 - Documentation/config: add splitIndex.maxPercentChange
 - t1700: add tests for splitIndex.maxPercentChange
 - read-cache: regenerate shared index if necessary
 - config: add git_config_get_max_percent_split_change()
 - Documentation/git-update-index: talk about core.splitIndex config var
 - Documentation/config: add information for core.splitIndex
 - t1700: add tests for core.splitIndex
 - update-index: warn in case of split-index incoherency
 - read-cache: add and then use tweak_split_index()
 - split-index: add {add,remove}_split_index() functions
 - config: add git_config_get_split_index()
 - config: mark an error message up for translation

 The experimental "split index" feature has gained a few
 configuration variables to make it easier to use.

 Expecting a reroll.
 cf. <2016122610.17150-1-chrisc...@tuxfamily.org>
 cf. 
 cf. 

A note from the maintainer

2017-02-24 Thread Junio C Hamano
Welcome to the Git development community.

This message is written by the maintainer and talks about how Git
project is managed, and how you can work with it.

* Mailing list and the community

The development is primarily done on the Git mailing list. Help
requests, feature proposals, bug reports and patches should be sent to
the list address .  You don't have to be
subscribed to send messages.  The convention on the list is to keep
everybody involved on Cc:, so it is unnecessary to say "Please Cc: me,
I am not subscribed".

Before sending patches, please read Documentation/SubmittingPatches
and Documentation/CodingGuidelines to familiarize yourself with the
project convention.

If you sent a patch and you did not hear any response from anybody for
several days, it could be that your patch was totally uninteresting,
but it also is possible that it was simply lost in the noise.  Please
do not hesitate to send a reminder message in such a case.  Messages
getting lost in the noise may be a sign that those who can evaluate
your patch don't have enough mental/time bandwidth to process them
right at the moment, and it often helps to wait until the list traffic
becomes calmer before sending such a reminder.

The list archive is available at a few public sites:

http://public-inbox.org/git/
http://marc.info/?l=git
http://www.spinics.net/lists/git/

For those who prefer to read it over NNTP:

nntp://news.public-inbox.org/inbox.comp.version-control.git
nntp://news.gmane.org/gmane.comp.version-control.git

are available.

When you point at a message in a mailing list archive, using its
message ID is often the most robust (if not very friendly) way to do
so, like this:


http://public-inbox.org/git/pine.lnx.4.58.0504150753440.7...@ppc970.osdl.org

Often these web interfaces accept the message ID with enclosing <>
stripped (like the above example to point at one of the most important
message in the Git list).

Some members of the development community can sometimes be found on
the #git and #git-devel IRC channels on Freenode.  Their logs are
available at:

http://colabti.org/irclogger/irclogger_log/git
http://colabti.org/irclogger/irclogger_log/git-devel

There is a volunteer-run newsletter to serve our community ("Git Rev
News" http://git.github.io/rev_news/rev_news.html).

Git is a member project of software freedom conservancy, a non-profit
organization (https://sfconservancy.org/).  To reach a committee of
liaisons to the conservancy, contact them at .


* Reporting bugs

When you think git does not behave as you expect, please do not stop
your bug report with just "git does not work".  "I used git in this
way, but it did not work" is not much better, neither is "I used git
in this way, and X happend, which is broken".  It often is that git is
correct to cause X happen in such a case, and it is your expectation
that is broken. People would not know what other result Y you expected
to see instead of X, if you left it unsaid.

Please remember to always state

 - what you wanted to achieve;

 - what you did (the version of git and the command sequence to reproduce
   the behavior);

 - what you saw happen (X above);

 - what you expected to see (Y above); and

 - how the last two are different.

See http://www.chiark.greenend.org.uk/~sgtatham/bugs.html for further
hints.

If you think you found a security-sensitive issue and want to disclose
it to us without announcing it to wider public, please contact us at
our security mailing list .  This is
a closed list that is limited to people who need to know early about
vulnerabilities, including:

  - people triaging and fixing reported vulnerabilities
  - people operating major git hosting sites with many users
  - people packaging and distributing git to large numbers of people

where these issues are discussed without risk of the information
leaking out before we're ready to make public announcements.


* Repositories and documentation.

My public git.git repositories are at:

  git://git.kernel.org/pub/scm/git/git.git/
  https://kernel.googlesource.com/pub/scm/git/git
  git://repo.or.cz/alt-git.git/
  https://github.com/git/git/
  git://git.sourceforge.jp/gitroot/git-core/git.git/
  git://git-core.git.sourceforge.net/gitroot/git-core/git-core/

A few web interfaces are found at:

  http://git.kernel.org/cgit/git/git.git
  https://kernel.googlesource.com/pub/scm/git/git
  http://repo.or.cz/w/alt-git.git

Preformatted documentation from the tip of the "master" branch can be
found in:

  git://git.kernel.org/pub/scm/git/git-{htmldocs,manpages}.git/
  git://repo.or.cz/git-{htmldocs,manpages}.git/
  https://github.com/gitster/git-{htmldocs,manpages}.git/

The manual pages formatted in HTML for the tip of 'master' can be
viewed online at:

  https://git.github.io/htmldocs/git.html


* How various branches are used.

There are four 

Re: [PATCH 3/3] Makefile: add USE_SHA1DC knob

2017-02-24 Thread HW42
Jeff King:
> diff --git a/Makefile b/Makefile
> index 8e4081e06..7c4906250 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1386,6 +1390,11 @@ ifdef APPLE_COMMON_CRYPTO
>   SHA1_MAX_BLOCK_SIZE = 1024L*1024L*1024L
>  endif
>  
> +ifdef USE_SHA1DC
> + SHA1_HEADER = "sha1dc/sha1.h"
> + LIB_OBJS += sha1dc/sha1.o
> + LIB_OBJS += sha1dc/ubc_check.o
> +else
>  ifdef BLK_SHA1
>   SHA1_HEADER = "block-sha1/sha1.h"
>   LIB_OBJS += block-sha1/sha1.o
> @@ -1403,6 +1412,7 @@ else
>  endif
>  endif
>  endif
> +endif

This sets SHA1_MAX_BLOCK_SIZE and the compiler flags for Apple
CommonCrypto even if the user selects USE_SHA1DC. The same happens for
BLK_SHA1. Is this intended?

> +void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
> +{
> + const char *data = vdata;
> + /* We expect an unsigned long, but sha1dc only takes an int */
> + while (len > INT_MAX) {
> + SHA1DCUpdate(ctx, data, INT_MAX);
> + data += INT_MAX;
> + len -= INT_MAX;
> + }
> + SHA1DCUpdate(ctx, data, len);
> +}

I think you can simply change the len parameter from unsigned into
size_t (or unsigned long) in SHA1DCUpdate().
https://github.com/cr-marcstevens/sha1collisiondetection/pull/6



signature.asc
Description: OpenPGP digital signature


Re: [PATCH 15/15] builtin/checkout: add --recurse-submodules switch

2017-02-24 Thread Stefan Beller
On Thu, Feb 23, 2017 at 5:25 PM, Ramsay Jones
 wrote:
>> +int option_parse_recurse_submodules(const struct option *opt,
>> + const char *arg, int unset)
>
> Again, this function should be marked static.
>
> [I also noted _two_ other local functions with the same name
> in builtin/fetch.c and builtin/push.c]

fixed in a reroll.

Yes there is a pattern here. But as both fetch and push accept different
options (not just boolean, but strings) these have to be different.
I thought about unifying them, but I do not think we can do so easily.

Thanks,
Stefan


Re: SHA1 collisions found

2017-02-24 Thread Junio C Hamano
Stefan Beller  writes:

> On Fri, Feb 24, 2017 at 10:14 AM, Junio C Hamano  wrote:
>
>> you are inviting people to start using
>>
>> md5,54ddf8d47340e048166c45f439ce65fd
>>
>> as object names.
>
> which might even be okay for specific subsets of operations.
> (e.g. all local work including staging things, making local "fixup" commits)
>
> The addressing scheme should not be too hardcoded, we should rather
> treat it similar to the cipher schemes in pgp. The additional complexity that
> we have is the longevity of existence of things, though.

The not-so-well-hidden agenda was exactly that we _SHOULD_ not
mimick PGP.  They do not have a requirement to encourage everybody
to use the same thing because each message is encrypted/signed
independently, i.e. they do not have to chain things like we do.



Re: [PATCH 10/15] update submodules: add submodule_move_head

2017-02-24 Thread Stefan Beller
On Thu, Feb 23, 2017 at 5:21 PM, Ramsay Jones
 wrote:
>
>
> On 23/02/17 22:57, Stefan Beller wrote:
>> In later patches we introduce the options and flag for commands
>> that modify the working directory, e.g. git-checkout.
>>
>> This piece of code will be used universally for
>> all these working tree modifications as it
>> * supports dry run to answer the question:
>>   "Is it safe to change the submodule to this new state?"
>>   e.g. is it overwriting untracked files or are there local
>>   changes that would be overwritten?
>> * supports a force flag that can be used for resetting
>>   the tree.
>>
>> Signed-off-by: Stefan Beller 
>> ---
>>  submodule.c | 135 
>> 
>>  submodule.h |   7 
>>  2 files changed, 142 insertions(+)
>>
>> diff --git a/submodule.c b/submodule.c
>> index 0b2596e88a..a2cf8c9376 100644
>> --- a/submodule.c
>> +++ b/submodule.c
>> @@ -1239,6 +1239,141 @@ int bad_to_remove_submodule(const char *path, 
>> unsigned flags)
>>   return ret;
>>  }
>>
>> +static int submodule_has_dirty_index(const struct submodule *sub)
>> +{
>> + struct child_process cp = CHILD_PROCESS_INIT;
>> +
>> + prepare_submodule_repo_env_no_git_dir(_array);
>> +
>> + cp.git_cmd = 1;
>> + argv_array_pushl(, "diff-index", "--quiet", \
>> + "--cached", "HEAD", NULL);
>> + cp.no_stdin = 1;
>> + cp.no_stdout = 1;
>> + cp.dir = sub->path;
>> + if (start_command())
>> + die("could not recurse into submodule '%s'", sub->path);
>> +
>> + return finish_command();
>> +}
>> +
>> +void submodule_reset_index(const char *path)
>
> I was just about to send a patch against the previous series
> (in pu branch last night), but since you have sent another
> version ...
>
> In the last series this was called 'submodule_clean_index()'
> and, since it is a file-local symbol, should be marked with
> static. I haven't applied these patches to check, but the
> interdiff in the cover letter leads me to believe that this
> will also apply to the renamed function.
>
> [The patch subject was also slightly different.]
>

good catch. Yes submodule_reset_index
ought to be static.

fixed in a reroll.


Re: git email From: parsing (was Re: [GIT PULL] Staging/IIO driver patches for 4.11-rc1)

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 12:03:45PM +0100, Geert Uytterhoeven wrote:

> > The problem isn't on the applying end, but rather on the generating end.
> > The From header in the attached mbox is:
> >
> >   From: =?us-ascii?B?PT9VVEYtOD9xP1NpbW9uPTIwU2FuZHN0cj1DMz1CNm0/PQ==?= 
> > 
> 
> Slightly related, once in a while I get funny emails through
> git-commits-h...@vger.kernel.org, where the subject is completely screwed up:
> 
> Subject: \x64\x72\x6D\x2F\x74\x69\x6E\x79\x64\x72\x6D\x3A
> \x6D\x69\x70\x69\x2D\x64\x62\x69\x3A \x53\x69\x6C\x65\x6E\x63\x65\x3A
> ‘\x63\x6D\x64’ \x6D\x61\x79 \x62\x65

Sorry, I don't have a clue on that one.

If you have UTF-8 or other non-ASCII characters in your subject,
format-patch will correctly do the rfc2047 encoding (and it should
always use QP). And that would kick in here because of the UTF-8 quotes.

But that weird "\x" encoding is not in any mail standard I know of (and
certainly Git would never do it).

The odd thing is that the quotes themselves _aren't_ encoded. Just
everything else.

One other feature is that subject line is long enough (especially
QP-encoded) that it spans two lines:

  $ git format-patch --stdout -1 b401f34314d | grep -A1 ^Subject
  Subject: [PATCH] =?UTF-8?q?drm/tinydrm:=20mipi-dbi:=20Silence:=20=E2=80=98?=
   =?UTF-8?q?cmd=E2=80=99=20may=20be=20used=20uninitialized?=

It's possible that something along the way is mis-handling subjects with
line-continuation (though why it would escape those characters, I don't
know).

> and some of the mail headers end up in the body as well:
> 
> 
> =?UTF-8?Q?\x75\x73\x65\x64_\x75\x6E\x69\x6E\x69\x74\x69\x61\x6C\x69\x7A\x65\x64?=

That might be related to the whitespace continuation (the first line
after the break is the second line of the subject).

-Peff


Re: SHA1 collisions found

2017-02-24 Thread Stefan Beller
On Fri, Feb 24, 2017 at 10:14 AM, Junio C Hamano  wrote:

> you are inviting people to start using
>
> md5,54ddf8d47340e048166c45f439ce65fd
>
> as object names.

which might even be okay for specific subsets of operations.
(e.g. all local work including staging things, making local "fixup" commits)

The addressing scheme should not be too hardcoded, we should rather
treat it similar to the cipher schemes in pgp. The additional complexity that
we have is the longevity of existence of things, though.


Re: [PATCH 3/3] Makefile: add USE_SHA1DC knob

2017-02-24 Thread Jeff King
On Fri, Feb 24, 2017 at 06:36:00PM +, HW42 wrote:

> > +ifdef USE_SHA1DC
> > +   SHA1_HEADER = "sha1dc/sha1.h"
> > +   LIB_OBJS += sha1dc/sha1.o
> > +   LIB_OBJS += sha1dc/ubc_check.o
> > +else
> >  ifdef BLK_SHA1
> > SHA1_HEADER = "block-sha1/sha1.h"
> > LIB_OBJS += block-sha1/sha1.o
> > @@ -1403,6 +1412,7 @@ else
> >  endif
> >  endif
> >  endif
> > +endif
> 
> This sets SHA1_MAX_BLOCK_SIZE and the compiler flags for Apple
> CommonCrypto even if the user selects USE_SHA1DC. The same happens for
> BLK_SHA1. Is this intended?

No, it's not. I suspect that setting BLK_SHA1 has the same problem in
the current code, then.

> > +void git_SHA1DCUpdate(SHA1_CTX *ctx, const void *vdata, unsigned long len)
> > +{
> > +   const char *data = vdata;
> > +   /* We expect an unsigned long, but sha1dc only takes an int */
> > +   while (len > INT_MAX) {
> > +   SHA1DCUpdate(ctx, data, INT_MAX);
> > +   data += INT_MAX;
> > +   len -= INT_MAX;
> > +   }
> > +   SHA1DCUpdate(ctx, data, len);
> > +}
> 
> I think you can simply change the len parameter from unsigned into
> size_t (or unsigned long) in SHA1DCUpdate().
> https://github.com/cr-marcstevens/sha1collisiondetection/pull/6

Yeah, I agree that is a cleaner solution. My focus was on changing the
(presumably tested) sha1dc code as little as possible.

-Peff


Re: [PATCH v6 1/1] config: add conditional include

2017-02-24 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> +Conditional includes
> +
> +
> +You can include one config file from another conditionally by setting
> +a `includeIf..path` variable to the name of the file to be
> +included. The variable's value is treated the same way as `include.path`.
> +
> +The condition starts with a keyword followed by a colon and some data
> +whose format and meaning depends on the keyword. Supported keywords
> +are:
> +
> +`gitdir`::
> +
> + The data that follows the keyword `gitdir:` is used as a glob
> + pattern. If the location of the .git directory match the
> + pattern, the include condition is met.

s/match//, I think.

> ++
> +The .git location which may be auto-discovered, or come from

s/ which//?

> +`$GIT_DIR` environment variable. If the repository auto discovered via

s/If the /In a/?

> +a .git file (e.g. from submodules, or a linked worktree), the .git
> +location would be the final location, not where the .git file is.

OK.

> @@ -170,9 +171,99 @@ static int handle_path_include(const char *path, struct 
> config_include_data *inc
>   return ret;
>  }
>  
> +static int prepare_include_condition_pattern(struct strbuf *pat)
> +{
> + ...
> + /* TODO: escape wildcards */
> + strbuf_add_absolute_path(, cf->path);

Is this still TODO?  As this one returns the prefix length (which
probably wants to be commented before the function) and this codepath
computes the prefix to cover the path to here, doesn't caller already
do the right thing?

> +static int include_condition_is_true(const char *cond, size_t cond_len)
> +{
> + /* no condition (i.e., "include.path") is always true */

Does this want to say "includeIf.path" instead?  "include.path" is
done by handle_path_include() without involving a call to this
function.

> + if (!cond)
> + return 1;
> +
> + if (skip_prefix_mem(cond, cond_len, "gitdir:", , _len))
> + return include_by_gitdir(cond, cond_len, 0);
> + else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", , _len))
> + return include_by_gitdir(cond, cond_len, 1);
> +
> + error(_("unrecognized include condition: %.*s"), (int)cond_len, cond);
> + /* unknown conditionals are always false */
> + return 0;
> +}

Thanks.


[PATCH] submodule init: warn about falling back to a local path

2017-02-24 Thread Stefan Beller
When a submodule is initialized, the config variable 'submodule..url'
is set depending on the value of the same variable in the .gitmodules
file. When the URL indicates to be relative, then the url is computed
relative to its default remote. The default remote cannot be determined
accurately in all cases, such that it falls back to 'origin'.

The 'origin' remote may not exist, though. In that case we give up looking
for a suitable remote and we'll just assume it to be a local relative path.

This can be confusing to users as there is a lot of guessing involved,
which is not obvious to the user.

So in the corner case of assuming a local autoritative truth, warn the
user to lessen the confusion.

This behavior was introduced in 4d6893200 (submodule add: allow relative
repository path even when no url is set, 2011-06-06), which shared the
code with submodule-init and then ported to C in 3604242f080a (submodule:
port init from shell to C, 2016-04-15).

In case of submodule-add, this behavior makes sense in some use cases[1],
however for submodule-init there does not seem to be an immediate obvious
use case to fall back to a local submodule. However there might be, so
warn instead of die here.

While adding the warning, also clarify the behavior of relative URLs in
the documentation.

[1] e.g. 
http://stackoverflow.com/questions/8721984/git-ignore-files-for-public-repository-but-not-for-private
"store a secret locally in a submodule, with no intention to publish it"

Reported-by: Shawn Pearce 
Signed-off-by: Stefan Beller 
---

* reworded sentence to not omit half of it.
* tested rendering to be correct
* warn before xgetcwd

Thanks,
Stefan

 Documentation/git-submodule.txt | 36 ++--
 builtin/submodule--helper.c |  8 +++-
 2 files changed, 25 insertions(+), 19 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 8acc72ebb8..b810d37aa2 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -73,13 +73,17 @@ configuration entries unless `--name` is used to specify a 
logical name.
 +
  is the URL of the new submodule's origin repository.
 This may be either an absolute URL, or (if it begins with ./
-or ../), the location relative to the superproject's origin
+or ../), the location relative to the superproject's default remote
 repository (Please note that to specify a repository 'foo.git'
 which is located right next to a superproject 'bar.git', you'll
 have to use '../foo.git' instead of './foo.git' - as one might expect
 when following the rules for relative URLs - because the evaluation
 of relative URLs in Git is identical to that of relative directories).
-If the superproject doesn't have an origin configured
++
+The default remote is the remote of the remote tracking branch
+of the current branch. If no such remote tracking branch exists or
+the HEAD is detached, "origin" is assumed to be the default remote.
+If the superproject doesn't have a default remote configured
 the superproject is its own authoritative upstream and the current
 working directory is used instead.
 +
@@ -118,18 +122,22 @@ too (and can also report changes to a submodule's work 
tree).
 
 init [--] [...]::
Initialize the submodules recorded in the index (which were
-   added and committed elsewhere) by copying submodule
-   names and urls from .gitmodules to .git/config.
-   Optional  arguments limit which submodules will be initialized.
-   It will also copy the value of `submodule.$name.update` into
-   .git/config.
-   The key used in .git/config is `submodule.$name.url`.
-   This command does not alter existing information in .git/config.
-   You can then customize the submodule clone URLs in .git/config
-   for your local setup and proceed to `git submodule update`;
-   you can also just use `git submodule update --init` without
-   the explicit 'init' step if you do not intend to customize
-   any submodule locations.
+   added and committed elsewhere) by copying `submodule.$name.url`
+   from .gitmodules to .git/config, resolving relative urls to be
+   relative to the default remote.
++
+Optional  arguments limit which submodules will be initialized.
+If no path is specified all submodules are initialized.
++
+When present, it will also copy the value of `submodule.$name.update`.
+This command does not alter existing information in .git/config.
+You can then customize the submodule clone URLs in .git/config
+for your local setup and proceed to `git submodule update`;
+you can also just use `git submodule update --init` without
+the explicit 'init' step if you do not intend to customize
+any submodule locations.
++
+See the add subcommand for the defintion of default remote.
 
 deinit [-f|--force] (--all|[--] ...)::
Unregister the given submodules, i.e. remove the whole
diff --git 

Re: SHA1 collisions found

2017-02-24 Thread Junio C Hamano
David Lang  writes:

> On Fri, 24 Feb 2017, Junio C Hamano wrote:
>
>> *1* In the above toy example, length being 40 vs 64 is used as a
>>sign between SHA-1 and the new hash, and careful readers may
>>wonder if we should use sha-3,20769079d22... or something like
>>that that more explicity identifies what hash is used, so that
>>we can pick a hash whose length is 64 when we transition again.
>>
>>I personally do not think such a prefix is necessary during the
>>first transition; we will likely to adopt a new hash again, and
>>at that point that third one can have a prefix to differenciate
>>it from the second one.
>
> as the saying goes "in computer science the interesting numbers are 0,
> 1, and many", does it really simplify things much to support 2 hashes
> vs supporting more so that this issue doesn't have to be revisited?
> (other than selecting new hashes over time)

It seems that I wasn't clear enough, perhaps?  The scheme I outlined
does not have to revisit this issue at all.  It already declares what
you need to do when you add the third one.  

If it is not 40 or 64 bytes long, you just write it out.  If it is
one of these length, then you add some identifying prefix or
postfix.  IOW, if the second one is sha-3 and the third one is blake
(both used at 256-bit), then we would have three kinds of names,
written like so:

20769079d22a9f8010232bdf6131918c33a1bf69
20769079d22a9f8010232bdf6131918c33a1bf6910232bdf6131918c33a1bf69
3,20769079d22a9f8010232bdf6131918c33a1bf6910232bdf6131918c33a1bf69

and the readers can well tell that the first one, being 40-chars
long, is SHA-1, the second one, being 64-chars long, is SHA-3, and
the last one, with the prefix '3' (only because that is the third
one officially supported by Git) and being 64-chars long, is blake,
for example.

I do not particularly care if it is prefix or postfix or something
else.  A not-so-well-hidden agenda is to avoid inviting people into
thinking that they can use their choice of random hash functions and
and claim that their hacked version is still a Git, as long as they
follow the object naming convention.  IOW, if you said something
like:

 * 40-hex is SHA-1 for historical reasons;
 * Others use hash-name, colon, and then N-hex.

you are inviting people to start using

md5,54ddf8d47340e048166c45f439ce65fd

as object names.


Re: [GIT PULL] l10n updates for 2.12.0

2017-02-24 Thread Junio C Hamano
Jiang Xin  writes:

> Hi Junio,
>
> Please pull l10n updates for Git 2.12.0:
>
> The following changes since commit 076c05393a047247ea723896289b48d6549ed7d0:
>
>   Hopefully the final batch of mini-topics before the final
> (2017-02-16 14:46:35 -0800)
>
> are available in the git repository at:
>
>   git://github.com/git-l10n/git-po tags/l10n-2.12.0-rnd2
>
> for you to fetch changes up to 1a79b2f1795a6ec4c70674ce930843aa59bff859:
>
>   l10n: zh_CN: for git v2.12.0 l10n round 2 (2017-02-25 00:19:14 +0800)

Will pull.  Thanks, all.


Re: [PATCH v5 1/1] config: add conditional include

2017-02-24 Thread Junio C Hamano
Duy Nguyen  writes:

>>> + if (skip_prefix_mem(cond, cond_len, "gitdir:", , _len))
>>> + return include_by_gitdir(cond, cond_len, 0);
>>> + else if (skip_prefix_mem(cond, cond_len, "gitdir/i:", , 
>>> _len))
>>> + return include_by_gitdir(cond, cond_len, 1);
>>
>> This may be OK for now, but it should be trivial to start from a
>> table with two entries, i.e.
>>
>> struct include_cond {
>> const char *keyword;
>> int (*fn)(const char *, size_t);
>> };
>>
>> and will show a better way to do things to those who follow your
>> footsteps.
>
> Yeah I don't see a third include coming soon and did not go with that.
> Let's way for it and refactor then.

I would have said exactly that in my message if you already had
include_by_gitdir() and include_by_gitdir_i() as separate functions.

But I didn't, because the above code gives an excuse to the person
who adds the third one to be lazy and add another "else if" for
expediency, because making it table-driven would require an extra
work to add two wrapper functions that do not have anything to do
with the third one being added.



Re: fatal error when diffing changed symlinks

2017-02-24 Thread Junio C Hamano
Christophe Macabiau  writes:

> with the commands below, you will get :
>
>> fatal: bad object 
>> show : command returned error: 128
>>
>
> I am using version 2.5.5 fedora 23
>
> cd /tmp
> mkdir a
> cd a
> git init
> touch b
> ln -s b c
> git add .
> git commit -m 'first'
> touch d
> rm c
> ln -s d c
> git difftool --dir-diff

A slightly worse is that the upcoming Git will ship with a rewritten
"difftool" that makes the above sequence segfault.  As either way is
bad, I do not particularly think the rewritten one needs to be
reverted (it would give "fatal: bad object" then, and would not fix
your problem), but it needs to be looked at.

Thanks for a report.



Re: SHA1 collisions found

2017-02-24 Thread David Lang

On Fri, 24 Feb 2017, Junio C Hamano wrote:


*1* In the above toy example, length being 40 vs 64 is used as a
   sign between SHA-1 and the new hash, and careful readers may
   wonder if we should use sha-3,20769079d22... or something like
   that that more explicity identifies what hash is used, so that
   we can pick a hash whose length is 64 when we transition again.

   I personally do not think such a prefix is necessary during the
   first transition; we will likely to adopt a new hash again, and
   at that point that third one can have a prefix to differenciate
   it from the second one.


as the saying goes "in computer science the interesting numbers are 0, 1, and 
many", does it really simplify things much to support 2 hashes vs supporting 
more so that this issue doesn't have to be revisited? (other than selecting new 
hashes over time)


David Lang


Re: [PATCH] docs/git-gc: fix default value for `--aggressiveDepth`

2017-02-24 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Feb 24, 2017 at 09:46:45AM +0100, Patrick Steinhardt wrote:
>
>> In commit 07e7dbf0d (gc: default aggressive depth to 50, 2016-08-11),
>> the default aggressive depth of git-gc has been changed to 50. While
>> git-config(1) has been updated to represent the new default value,
>> git-gc(1) still mentions the old value. This patch fixes it.
>
> Thanks, this is obviously an improvement.
>
> (I also grepped for "250" in Documentation; the results are thankfully
> short, and the only other mentions I saw were referring to the window
> size).

Thanks, both.  Will queue.


Re: SHA1 collisions found

2017-02-24 Thread Jason Cooper
Hi Ian,

On Fri, Feb 24, 2017 at 03:13:37PM +, Ian Jackson wrote:
> Joey Hess writes ("SHA1 collisions found"):
> > https://shattered.io/static/shattered.pdf
> > https://freedom-to-tinker.com/2017/02/23/rip-sha-1/
> > 
> > IIRC someone has been working on parameterizing git's SHA1 assumptions
> > so a repository could eventually use a more secure hash. How far has
> > that gotten? There are still many "40" constants in git.git HEAD.
> 
> I have been thinking about how to do a transition from SHA1 to another
> hash function.
> 
> I have concluded that:
> 
>  * We can should avoid expecting everyone to rewrite all their
>history.

Agreed.

>  * Unfortunately, because the data formats (particularly, the commit
>header) are not in practice extensible (because of the way existing
>code parses them), it is not useful to try generate new data (new
>commits etc.) containing both new hashes and old hashes: old
>clients will mishandle the new data.

My thought here is:

 a) re-hash blobs with sha256, hardlink to sha1 objects
 b) create new tree objects which are mirrors of each sha1 tree object,
but purely sha256
 c) mirror commits, but they are also purely sha256
 d) future PGP signed tags would sign both hashes (or include both?)

Which would end up something like:

  .git/
\... #usual files
\objects
  \ef
\3c39f7522dc55a24f64da9febcfac71e984366
\objects-sha2_256
  \72
\604fd2de5f25c89d692b01081af93bcf00d2af34549d8d1bdeb68bc048932
\info
  \...
\info-sha2_256
  \refs #uses sha256 commit identifiers

Basically, keep the sha256 stuff out of the way for legacy clients, and
new clients will still be able to use it.

There shouldn't be a need to re-sign old signed tags if the underlying
objects are counter-hashed.  There might need to be some transition
info, though.

Say a new client does 'git tag -v tags/v3.16' in the kernel tree.  I would
expect it to check the sha1 hashes, verify the PGP signed tag, and then
also check the sha256 counter-hashes of the relevant objects.

thx,

Jason.


  1   2   >