date:20170306

Junio C Hamano  writes:

>>  - If our submodule is bound at path sub/dir in the superproject,
>>the relative-path thing above would get "dir" and this ls-tree
>>ends up asking what is at "dir", but the question you really want
>>to ask is what is at "sub/dir", isn't it?
>
> IOW, the basic ontline of the idea may be OK, but I think you would
> want to do something along this line:
>
> - chdir to .. from the root of your submodule working tree;
> - in that .. directory, ask what prefix it is (you'd get
>   "sub/dir", or "not a git" if you are not a submodule);
> - in that .. directory, ask ls-files what sub/dir is;
> - if it is 16, you're happy.

Nah, that wouldn't be necessary and would not work.  --prefix would
be "sub" in that case, and you'd need to concatenate the "dir" that
is the basename of the path to the submodule to get "sub/dir".  

Besides, output from "ls-files" by default is relative to cwd, so if
you did ls-files or ls-tree HEAD in "sub/", you'll find where you
came from as "dir", not "sub/dir", so the original code happens to
work even from a subdirectory of a superproject, but the reason why
it works is a bit subtle.  Perhaps it deserves in-code comment to
explain it.

Re: [RFC PATCH] rev-parse: add --show-superproject-working-tree

Junio C Hamano  writes:

>> +dirname = relative_path(xgetcwd(), one_up, );
>
> So, the idea is we start at the root level of the current project's
> working tree, and we go up one level, then we know the last component
> of the path our submodule is bound at in the superproject.
>
>> +prepare_submodule_repo_env(_array);
>> +argv_array_pop(_array);
>> +argv_array_pushl(, "--literal-pathspecs", "-C", "..",
>> +"ls-tree", "HEAD", "--", dirname, NULL);
>
> This would ask our superproject what is at the "dirname" in its
> HEAD.  Two possible issues:
>
>  - Shouldn't that be looking at its index instead?  It would be more
>correct for unborn branch case, and new or moved submodule case.
>
>  - If our submodule is bound at path sub/dir in the superproject,
>the relative-path thing above would get "dir" and this ls-tree
>ends up asking what is at "dir", but the question you really want
>to ask is what is at "sub/dir", isn't it?

IOW, the basic ontline of the idea may be OK, but I think you would
want to do something along this line:

- chdir to .. from the root of your submodule working tree;
- in that .. directory, ask what prefix it is (you'd get
  "sub/dir", or "not a git" if you are not a submodule);
- in that .. directory, ask ls-files what sub/dir is;
- if it is 16, you're happy.

Re: [RFC PATCH] rev-parse: add --show-superproject-working-tree

Stefan Beller  writes:

> diff --git a/submodule.c b/submodule.c
> index 3b98766a6b..a63aef2c6b 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1514,3 +1514,90 @@ void absorb_git_dir_into_superproject(const char 
> *prefix,
>   strbuf_release();
>   }
>  }

Please have a comment here what this function expects (e.g. it is
called after we ran repository discovery and $cwd is at the root of
the working tree of the current project, we _know_ we have a working
tree, etc.).

> +
> +static int superproject_exists(void)
> +{
> + struct child_process cp = CHILD_PROCESS_INIT;
> + struct strbuf sb = STRBUF_INIT;
> + struct strbuf buf = STRBUF_INIT;
> + const char *one_up = real_path_if_valid("../");
> + const char *dirname;
> + int code, has_superproject = 0;
> +
> + if (!one_up)
> + /* At the root of the file system. */
> + return 0;

Hmph.  I would have expected that "/../" would be the same as "/".

> + dirname = relative_path(xgetcwd(), one_up, );

So, the idea is we start at the root level of the current project's
working tree, and we go up one level, then we know the last component
of the path our submodule is bound at in the superproject.

> + prepare_submodule_repo_env(_array);
> + argv_array_pop(_array);
> + argv_array_pushl(, "--literal-pathspecs", "-C", "..",
> + "ls-tree", "HEAD", "--", dirname, NULL);

This would ask our superproject what is at the "dirname" in its
HEAD.  Two possible issues:

 - Shouldn't that be looking at its index instead?  It would be more
   correct for unborn branch case, and new or moved submodule case.

 - If our submodule is bound at path sub/dir in the superproject,
   the relative-path thing above would get "dir" and this ls-tree
   ends up asking what is at "dir", but the question you really want
   to ask is what is at "sub/dir", isn't it?

> +const char *get_superproject_working_tree()

const char *get_superproject_working_tree(void)

[RFC PATCH] rev-parse: add --show-superproject-working-tree

Signed-off-by: Stefan Beller 
---

Marking this as RFC as documentation and tests are missing.

 builtin/rev-parse.c |  7 +
 submodule.c | 87 +
 submodule.h |  8 +
 3 files changed, 102 insertions(+)

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index e08677e559..2549643267 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -12,6 +12,7 @@
 #include "diff.h"
 #include "revision.h"
 #include "split-index.h"
+#include "submodule.h"
 
 #define DO_REVS1
 #define DO_NOREV   2
@@ -779,6 +780,12 @@ int cmd_rev_parse(int argc, const char **argv, const char 
*prefix)
puts(work_tree);
continue;
}
+   if (!strcmp(arg, "--show-superproject-working-tree")) {
+   const char *superproject = 
get_superproject_working_tree();
+   if (superproject)
+   puts(superproject);
+   continue;
+   }
if (!strcmp(arg, "--show-prefix")) {
if (prefix)
puts(prefix);
diff --git a/submodule.c b/submodule.c
index 3b98766a6b..a63aef2c6b 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1514,3 +1514,90 @@ void absorb_git_dir_into_superproject(const char *prefix,
strbuf_release();
}
 }
+
+static int superproject_exists(void)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+   struct strbuf sb = STRBUF_INIT;
+   struct strbuf buf = STRBUF_INIT;
+   const char *one_up = real_path_if_valid("../");
+   const char *dirname;
+   int code, has_superproject = 0;
+
+   if (!one_up)
+   /* At the root of the file system. */
+   return 0;
+
+   dirname = relative_path(xgetcwd(), one_up, );
+   prepare_submodule_repo_env(_array);
+   argv_array_pop(_array);
+   argv_array_pushl(, "--literal-pathspecs", "-C", "..",
+   "ls-tree", "HEAD", "--", dirname, NULL);
+
+   cp.no_stdin = 1;
+   cp.no_stderr = 1;
+   cp.out = -1;
+   cp.git_cmd = 1;
+
+   if (start_command())
+   die(_("could not start ls-tree in .."));
+
+   strbuf_read(, cp.out, 7);
+   close(cp.out);
+   if (starts_with(buf.buf, "16"))
+   /* there is a superproject having this as a submodule */
+   has_superproject = 1;
+
+   code = finish_command();
+
+   if (code == 128)
+   /* not a git repository */
+   goto out;
+   if (code == 0 && !has_superproject)
+   /* there is an unrelated git repository */
+   goto out;
+
+   if (code)
+   die(_("ls-tree returned unexpected return code"));
+
+   return 1;
+
+out:
+   strbuf_release();
+
+   return 0;
+}
+
+const char *get_superproject_working_tree()
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+   struct strbuf sb = STRBUF_INIT;
+
+   if (!superproject_exists())
+   return NULL;
+
+   prepare_submodule_repo_env(_array);
+   argv_array_pop(_array);
+
+   argv_array_pushl(, "-C", "..",
+   "rev-parse", "--show-toplevel", NULL);
+
+   cp.no_stdin = 1;
+   cp.no_stderr = 1;
+   cp.out = -1;
+   cp.git_cmd = 1;
+
+   if (start_command())
+   die(_("could not start rev-parse in .."));
+
+   strbuf_reset();
+   strbuf_read(, cp.out, PATH_MAX);
+
+   /* remove trailing new line */
+   strbuf_rtrim();
+
+   if (finish_command())
+   die(_("rev-parse died unexpectedly"));
+
+   return strbuf_detach(, NULL);
+}
diff --git a/submodule.h b/submodule.h
index 05ab674f06..f207bb8d5f 100644
--- a/submodule.h
+++ b/submodule.h
@@ -93,4 +93,12 @@ extern void prepare_submodule_repo_env(struct argv_array 
*out);
 extern void absorb_git_dir_into_superproject(const char *prefix,
 const char *path,
 unsigned flags);
+
+/*
+ * Return the absolute path of the working tree of the superproject, which this
+ * project is a submodule of. If this repository is not a submodule of
+ * another repository, return NULL.
+ */
+extern const char *get_superproject_working_tree();
+
 #endif
-- 
2.12.0.189.g3bc53220cb.dirty

[RFCv7 PATCH 00/18] Checkout aware of Submodules!

previous work:
https://public-inbox.org/git/20170302004759.27852-1-sbel...@google.com
https://public-inbox.org/git/20161203003022.29797-1-sbel...@google.com/

v7:
 * addressed Erics comment by fixing the bashism in t/lib-submodule-update.sh
diff to v6:
  diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
  index 5caae06bc5..949ebd546c 100755
  --- a/t/lib-submodule-update.sh
  +++ b/t/lib-submodule-update.sh
  @@ -255,7 +255,7 @@ test_superproject_content () {
   # Test that the given submodule at path "$1" contains the content according
   # to the submodule commit recorded in the superproject's commit "$2"
   test_submodule_content () {
  -   if test "$1" == "-C"
  +   if test x"$1" = "x-C"
  then
  cd "$2"
  shift; shift;
 * interdiff to v5 below
  

v6:
 * added support for read-tree (see last patch) to see how generic the
   code of the previous patches is. I am pretty pleased with that patch.
 * marked two functions static. Thanks Ramsay!
 * fixed the recursive test; it still fails but it is the code that fails,
   not the test. For this I had to change the setup code slightly.
 * 2 new patches adding tiny refactoring to the submodule test lib.  
 * interdiff (to origin/sb/checkout-recurse-submodules, which is v5) below.

v5:
 * as v4 was the first version queued by Junio, we do have an interdiff below!
 * renamed functions
 * changed the API, now the caller has to take care of the submodule strategy
   themselves. (Note this can be different for different situations, e.g.
   when a submodule is deleted, we can do that for any strategy except none and
   !command. But for moving to a new state of the submodule we currently
   only implement "checkout" [unspecified defaults to checkout]. warning about
   the others, doing nothing there.)

v4:
 * addressed all comments of Brian, Junio and Brandon.
 Thanks!
 * one major point of change is the introduction of another patch
   "lib-submodule-update.sh: do not use ./. as submodule remote",
   as that took some time to track down the existing bug.
 
v3:
 * moved tests from t2013 to the generic submodule library.
 * factored out the refactoring patches to be up front
 * As I redid the complete implementation, I have the impression this time
   it is cleaner than previous versions.
 
 I think we still have to fix the corner cases of directory/file/submodule 
 conflicts before merging, but this serves as a status update on my current
 way of thinking how to implement the worktree commands being aware of
 submodules.
 
Thanks,
Stefan

v2:
* based on top of the series sent out an hour ago
  "[PATCHv4 0/5] submodule embedgitdirs"
* Try to embed a submodule if we need to remove it.
* Strictly do not change behavior if not giving the new flag.
* I think I missed some review comments from v1, but I'd like to get
  the current state out over the weekend, as a lot has changed so far.
  On Monday I'll go through the previous discussion with a comb to see
  if I missed something.
  
v1:
When working with submodules, nearly anytime after checking out
a different state of the projects, that has submodules changed
you'd run "git submodule update" with a current version of Git.

There are two problems with this approach:

* The "submodule update" command is dangerous as it
  doesn't check for work that may be lost in the submodule
  (e.g. a dangling commit).
* you may forget to run the command as checkout is supposed
  to do all the work for you.

Integrate updating the submodules into git checkout, with the same
safety promises that git-checkout has, i.e. not throw away data unless
asked to. This is done by first checking if the submodule is at the same
sha1 as it is recorded in the superproject. If there are changes we stop
proceeding the checkout just like it is when checking out a file that
has local changes.

The integration happens in the code that is also used in other commands
such that it will be easier in the future to make other commands aware
of submodule.

This also solves d/f conflicts in case you replace a file/directory
with a submodule or vice versa.

The patches are still a bit rough, but the overall series seems
promising enough to me that I want to put it out here.

Any review, specifically on the design level welcome!

Thanks,
Stefan


Stefan Beller (14):
  lib-submodule-update.sh: reorder create_lib_submodule_repo
  lib-submodule-update.sh: define tests for recursing into submodules
  make is_submodule_populated gently
  connect_work_tree_and_git_dir: safely create leading directories
  update submodules: add submodule config parsing
  update submodules: add a config option to determine if submodules are
updated
  update submodules: introduce is_interesting_submodule
  update submodules: move up prepare_submodule_repo_env
  update submodules: add submodule_go_from_to
  unpack-trees: pass old oid to verify_clean_submodule
  unpack-trees: check if we can perform the

[PATCH 09/18] update submodules: add a config option to determine if submodules are updated

In later patches we introduce the options and flag for commands
that modify the working directory, e.g. git-checkout.

Have a central place to store such settings whether we want to update
a submodule.

Signed-off-by: Stefan Beller 
---
 submodule.c | 6 ++
 submodule.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/submodule.c b/submodule.c
index 04d185738f..591f4a694e 100644
--- a/submodule.c
+++ b/submodule.c
@@ -17,6 +17,7 @@
 #include "worktree.h"
 
 static int config_fetch_recurse_submodules = RECURSE_SUBMODULES_ON_DEMAND;
+static int config_update_recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int parallel_jobs = 1;
 static struct string_list changed_submodule_paths = STRING_LIST_INIT_NODUP;
 static int initialized_fetch_ref_tips;
@@ -542,6 +543,11 @@ void set_config_fetch_recurse_submodules(int value)
config_fetch_recurse_submodules = value;
 }
 
+void set_config_update_recurse_submodules(int value)
+{
+   config_update_recurse_submodules = value;
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
  int flags, void *cb_data)
 {
diff --git a/submodule.h b/submodule.h
index 0b915bd3ac..b4e60c08d2 100644
--- a/submodule.h
+++ b/submodule.h
@@ -64,6 +64,7 @@ extern void show_submodule_inline_diff(FILE *f, const char 
*path,
const char *del, const char *add, const char *reset,
const struct diff_options *opt);
 extern void set_config_fetch_recurse_submodules(int value);
+extern void set_config_update_recurse_submodules(int value);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
   const char *prefix, int command_line_option,
-- 
2.12.0.rc1.52.ge239d7e709.dirty

Re: [PATCH] blame: draft of line format

2017-03-06 Thread Edmundo Carmona Antoranz

On Tue, Jan 31, 2017 at 1:41 PM, Jeff King  wrote:
> On Mon, Jan 30, 2017 at 08:28:30PM -0600, Edmundo Carmona Antoranz wrote:
>
>> +static void pretty_info(char* revid, struct blame_entry *ent, struct strbuf 
>> *rev_buffer)
>> +{
>> + struct pretty_print_context ctx = {0};
>> + struct rev_info rev;
>> +
>> + struct strbuf format = STRBUF_INIT;
>> + strbuf_addstr(, format_line);
>> + ctx.fmt = CMIT_FMT_USERFORMAT;
>> + get_commit_format(format.buf, );
>> + pretty_print_commit(, ent->suspect->commit, rev_buffer);
>> + strbuf_release();
>> +}
>
> I think this may be less awkward if you use format_commit_message() as
> the entry point. Then you do not need a rev_info struct at all, it
> touches fewer global variables, etc.
>
> I don't know if that would cause the other difficulties you mentioned,
> though.
>
> -Peff

Thanks for the tip, Peff. It made the code to get rev info much
shorter. I'll work on some other improvements and then I'll send
another patch.

Best regards!

Re: [Request for Documentation] Differentiate signed (commits/tags/pushes)

On Mon, Mar 6, 2017 at 4:08 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>>> "tag -s" also has the benefit of being retroactive.  You can create
>>> commit, think about it for a week and then later tag it.  And ask
>>> others to also tag the same one.  You cannot do so with "commit -s".
>>
>> ok, so there is *no* advantage of signing a commit over tags?
>
> Did I say anything that remotely resembles that?  Puzzled.

Well that was brain having a short circuit.

>
> If the reason you want to have GPG signature on a commit is not
> because you want to mark some meaningful place in the history, but
> you are signing each and every ones out of some random reason,

and I am looking for these "some random reason"s.
If it is e.g. a ISO9001 requirement, I'll happily accept that as such.

By signing things, you certify your intent, i.e. by signing a commit,
you certify that you intent to create the commit as-is in some repository
on some branch (unlike the push certificate that specifies the repo and
branch).

> there
> is no reason why you would want "tag -s" them, so you can see it as
> an advantage of "commit -s" over "tag -s", because to such a
> project, all commits that are not tagged look the same and there is
> no "landmark" value to use "tag -s" for each and every one of them.

Okay. They are two different things, but to me they seem to archive
the same thing, with a tag having more niceties provided.
e.g. when you make a new release, you could just bump the version
in the versions file and sign the commit. As the commit is part of the
master branch it would not get lost.

The formerly mentioned "not polluting the refs/tags namespace"
is applicable to mergetags, that are a side tangent to signing
the commit vs creating a tag?

Now as Jakub mentions that signed commits came before the
mergetags were introduced, the existence of signed commits
sort of makes sense, as they were there first, but now are
superseded by more powerful tools.

> It is entirely reasonable to sign a merge commit that merges a
> signed tag.  They serve two different and unrelated purposes.

The signed tag that gets merged certifies the intent of the lieutenant
to ask for this specific content to be pulled and integrated, whereas
the signing of the commit certifies that the integrator intends to create
the merge commit as-is and e.g. resolve the merge conflicts as recorded.

Thanks,
Stefan

Re: [Request for Documentation] Differentiate signed (commits/tags/pushes)

Junio C Hamano  writes:

> Stefan Beller  writes:
>
>> What is the difference between signed commits and tags?
>> (Not from a technical perspective, but for the end user)
> ...
>> A signed push can certify that a given payload (consisting
>> of multiple commits on possibly multiple branches) was transmitted
>> to a remote, which can be recorded by the remote as e.g. a proof
>> of work.
> ...
> A signed push is _NOT_ about certifying the objects in the history
> DAG.  It is about certifying the _intent_ of pointing _REFS_ into
> points in the object graph.
> ...
> Historically, "tag -s" came a lot earlier.  When a project for
> whatever reason wants signature for each and every commit so that
> they somehow can feel good, without "commit -s", it would have made
> us unnecessary work to scale tag namespace only because there will
> be tons of pointless tags.  "commit -s" was a remedy for that.

While we are enumerating them, it is worth mentioning the mergetag
header of a commit object.

This is added to a (merge) commit object when you merged a signed
tag that points at a commit, and the intent is to eliminate the need
to _keep_ the ref around that is created only for the purpose of
"please pull from me, I tagged and signed the tip of the history I
want you to pull" request.  From that point of view, you could say
it is also reducing the load on refs/tags/ namespace, but more
importantly by not requiring the ref around, it allows you to verify
that the merge commit merged the correct tag with _only_ the commit
object by reproducing the payload of the signed tag that was merged
in the commit object in full.

[PATCH 06/18] make is_submodule_populated gently

We need the gentle version in a later patch. As we have just one caller,
migrate the caller.

Signed-off-by: Stefan Beller 
---
 builtin/grep.c | 2 +-
 submodule.c| 7 ++-
 submodule.h| 8 +++-
 3 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 2c727ef499..b17835aed6 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -616,7 +616,7 @@ static int grep_submodule(struct grep_opt *opt, const 
unsigned char *sha1,
 {
if (!is_submodule_initialized(path))
return 0;
-   if (!is_submodule_populated(path)) {
+   if (!is_submodule_populated_gently(path, NULL)) {
/*
 * If searching history, check for the presense of the
 * submodule's gitdir before skipping the submodule.
diff --git a/submodule.c b/submodule.c
index 3b98766a6b..0e55372f37 100644
--- a/submodule.c
+++ b/submodule.c
@@ -234,15 +234,12 @@ int is_submodule_initialized(const char *path)
return ret;
 }
 
-/*
- * Determine if a submodule has been populated at a given 'path'
- */
-int is_submodule_populated(const char *path)
+int is_submodule_populated_gently(const char *path, int *return_error_code)
 {
int ret = 0;
char *gitdir = xstrfmt("%s/.git", path);
 
-   if (resolve_gitdir(gitdir))
+   if (resolve_gitdir_gently(gitdir, return_error_code))
ret = 1;
 
free(gitdir);
diff --git a/submodule.h b/submodule.h
index 05ab674f06..0b915bd3ac 100644
--- a/submodule.h
+++ b/submodule.h
@@ -41,7 +41,13 @@ extern int submodule_config(const char *var, const char 
*value, void *cb);
 extern void gitmodules_config(void);
 extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
-extern int is_submodule_populated(const char *path);
+/*
+ * Determine if a submodule has been populated at a given 'path' by checking if
+ * the /.git resolves to a valid git repository.
+ * If return_error_code is NULL, die on error.
+ * Otherwise the return error code is the same as of resolve_gitdir_gently.
+ */
+extern int is_submodule_populated_gently(const char *path, int 
*return_error_code);
 extern int parse_submodule_update_strategy(const char *value,
struct submodule_update_strategy *dst);
 extern const char *submodule_strategy_to_string(const struct 
submodule_update_strategy *s);
-- 
2.12.0.rc1.52.ge239d7e709.dirty

Re: [PATCH v5 1/1] config: add conditional include

Being late to the review party here.

> +static int include_condition_is_true(const char *cond, size_t cond_len)
> +{
...
> +
> +   error(_("unrecognized include condition: %.*s"), (int)cond_len, cond);
> +   /* unknown conditionals are always false */
> +   return 0;
> +}

Thanks for putting an error message here. I was looking at what
is currently queued as origin/nd/conditional-config-include,
which doesn't have this error()  (yet / not any more?)

I'd strongly suggest to keep the error message here as that way
a user can diagnose e.g. a typo in the condition easily.

If we plan to extend this list of conditions in the future, and a user
switches between versions of git, then they may see this message
on a regular basis (whenever they use the 'old' version).

In that case it may be only a warning() instead of an error(),
but I have no strong opinion on that.

---
Reason why I am reviewing this series now:

I thought about extending the config system for submodule
usage (see debate at [1]).

My gut reaction was to have a condition for "if a superproject
exists" and then include a special config (e.g. "config.super"
in the superprojects $GIT_DIR).

However while reviewing these patches I realized
I am not interested in conditional includes, but when setting
up the submodules we know for a fact that we have a superproject,
so no conditional needed. Instead we need a special markup
of paths, i.e. we want to have an easy way to say
"$GIT_DIR of superproject". Ideas how to do that?

Thanks,
Stefan

[1] 
https://public-inbox.org/git/84fcb0bd-85dc-0142-dd58-47a04eaa7...@durchholz.org/

Re: [Request for Documentation] Differentiate signed (commits/tags/pushes)

Jakub Narębski  writes:

> Also from what I remember signed commits came before mergetags, that
> is the result of merging a signed tag (storing the signature of
> one of parents of the merge commit to not pollute tag namespace).
>
> And this workflow, from what I know, is quite useful.

The "commit -s" on a merge commit lets you as the integrator to
attest that you made that merge.  The "mergetag" records the
signature by the contributor that says the tip that was merged was
what the contributor wanted to get merged.  

It is entirely reasonable to sign a merge commit that merges a
signed tag.  They serve two different and unrelated purposes.

Re: RFC: Another proposed hash function transition plan

2017-03-06 Thread Mike Hommey

On Mon, Mar 06, 2017 at 03:40:30PM -0800, Jonathan Nieder wrote:
> David Lang wrote:
> 
> >> Translation table
> >> ~
> >> A fast bidirectional mapping between sha1-names and sha256-names of
> >> all local objects in the repository is kept on disk. The exact format
> >> of that mapping is to be determined.
> >>
> >> All operations that make new objects (e.g., "git commit") add the new
> >> objects to the translation table.
> >
> > This seems like a rather nontrival thing to design. It will need to
> > hold millions of mappings, and be quickly searchable from either
> > direction (sha1->new and new->sha1) while still be fairly fast to
> > insert new records into.
> 
> I am currently thinking of using LevelDB, since it has the advantages of
> being simple, already existing, and having already been ported to Java
> (allowing JGit can read and write the same format).
> 
> If that doesn't work, we'd try some other key-value store like Samba's
> tdb or Kyoto Cabinet.

FWIW, I'm using notes-like data to store mercurial->git mappings in
git-cinnabar, (ab)using the commit type in tree items. It's fast enough.

Mike

RFC v3: Another proposed hash function transition plan

2017-03-06 Thread Jonathan Nieder

Linus Torvalds wrote:
> On Fri, Mar 3, 2017 at 5:12 PM, Jonathan Nieder  wrote:

>> This document is still in flux but I thought it best to send it out
>> early to start getting feedback.
>
> This actually looks very reasonable if you can implement it cleanly
> enough.

Thanks for the kind words on what had quite a few flaws still.  Here's
a new draft.  I think the next version will be a patch against
Documentation/technical/.

As before, comments welcome, both here and inline at

  https://goo.gl/gh2Mzc

Changes since v2:

Use SHA3-256 instead of SHA2 (thanks, Linus and brian m.
carlson).[1][2]

Make sha3-based signatures a separate field, avoiding the need for
"hash" and "nohash" fields (thanks to peff[3]).

Add a sorting phase to fetch (thanks to Junio for noticing the need
for this).

Omit blobs from the topological sort during fetch (thanks to peff).

Discuss alternates, git notes, and git servers in the caveats section
(thanks to Junio Hamano, brian m. carlson[4], and Shawn Pearce).

Clarify language throughout (thanks to various commenters, especially
Junio).

Sincerely,
Jonathan

Git hash function transition

Status: Draft
Last Updated: 2017-03-06

Objective
-
Migrate Git from SHA-1 to a stronger hash function.

Background
--
At its core, the Git version control system is a content addressable
filesystem. It uses the SHA-1 hash function to name content. For
example, files, directories, and revisions are referred to by hash
values unlike in other traditional version control systems where files
or versions are referred to via sequential numbers. The use of a hash
function to address its content delivers a few advantages:

* Integrity checking is easy. Bit flips, for example, are easily
  detected, as the hash of corrupted content does not match its name.
* Lookup of objects is fast.

Using a cryptographically secure hash function brings additional
advantages:

* Object names can be signed and third parties can trust the hash to
  address the signed object and all objects it references.
* Communication using Git protocol and out of band communication
  methods have a short reliable string that can be used to reliably
  address stored content.

Over time some flaws in SHA-1 have been discovered by security
researchers. https://shattered.io demonstrated a practical SHA-1 hash
collision. As a result, SHA-1 cannot be considered cryptographically
secure any more. This impacts the communication of hash values because
we cannot trust that a given hash value represents the known good
version of content that the speaker intended.

SHA-1 still possesses the other properties such as fast object lookup
and safe error checking, but other hash functions are equally suitable
that are believed to be cryptographically secure.

Goals
-
1. The transition to SHA3-256 can be done one local repository at a time.
   a. Requiring no action by any other party.
   b. A SHA3-256 repository can communicate with SHA-1 Git servers
  (push/fetch).
   c. Users can use SHA-1 and SHA3-256 identifiers for objects
  interchangeably.
   d. New signed objects make use of a stronger hash function than
  SHA-1 for their security guarantees.
2. Allow a complete transition away from SHA-1.
   a. Local metadata for SHA-1 compatibility can be removed from a
  repository if compatibility with SHA-1 is no longer needed.
3. Maintainability throughout the process.
   a. The object format is kept simple and consistent.
   b. Creation of a generalized repository conversion tool.

Non-Goals
-
1. Add SHA3-256 support to Git protocol. This is valuable and the
   logical next step but it is out of scope for this initial design.
2. Transparently improving the security of existing SHA-1 signed
   objects.
3. Intermixing objects using multiple hash functions in a single
   repository.
4. Taking the opportunity to fix other bugs in git's formats and
   protocols.
5. Shallow clones and fetches into a SHA3-256 repository. (This will
   change when we add SHA3-256 support to Git protocol.)
6. Skip fetching some submodules of a project into a SHA3-256
   repository. (This also depends on SHA3-256 support in Git
   protocol.)

Overview

We introduce a new repository format extension `sha3`. Repositories
with this extension enabled use SHA3-256 instead of SHA-1 to name
their objects. This affects both object names and object content ---
both the names of objects and all references to other objects within
an object are switched to the new hash function.

sha3 repositories cannot be read by older versions of Git.

Alongside the packfile, a sha3 repository stores a bidirectional
mapping between sha3 and sha1 object names. The mapping is generated
locally and can be verified using "git fsck". Object lookups use this
mapping to allow naming objects using either their sha1 and sha3 names
interchangeably.

"git cat-file" and "git hash-object" gain options to display an

Re: [Request for Documentation] Differentiate signed (commits/tags/pushes)

Stefan Beller  writes:

>> "tag -s" also has the benefit of being retroactive.  You can create
>> commit, think about it for a week and then later tag it.  And ask
>> others to also tag the same one.  You cannot do so with "commit -s".
>
> ok, so there is *no* advantage of signing a commit over tags?

Did I say anything that remotely resembles that?  Puzzled.

If the reason you want to have GPG signature on a commit is not
because you want to mark some meaningful place in the history, but
you are signing each and every ones out of some random reason, there
is no reason why you would want "tag -s" them, so you can see it as
an advantage of "commit -s" over "tag -s", because to such a
project, all commits that are not tagged look the same and there is
no "landmark" value to use "tag -s" for each and every one of them.

Re: [Request for Documentation] Differentiate signed (commits/tags/pushes)

2017-03-06 Thread Jakub Narębski

W dniu 06.03.2017 o 23:13, Junio C Hamano pisze:
> Stefan Beller  writes:
> 
>> What is the difference between signed commits and tags?
>> (Not from a technical perspective, but for the end user)
[...]
>> Off list I was told gpg-signed commits are a "checkbox feature",
>> i.e. no real world workflow would actually use it. (That's a bold
>> statement, someone has to use it as there was enough interest
>> to implement it, no?)
> 
> I'd agree with that "checkbox" description, except that you need to
> remember that a project can enforce _any_ workflow to its developer,
> even if it does not make much sense, and at that point, the workflow
> would become a real-world workflow.  The word "real world workflow"
> does not make any assurance if that workflow is sensible.
> 
> Historically, "tag -s" came a lot earlier.  When a project for
> whatever reason wants signature for each and every commit so that
> they somehow can feel good, without "commit -s", it would have made
> us unnecessary work to scale tag namespace only because there will
> be tons of pointless tags.  "commit -s" was a remedy for that.

Also from what I remember signed commits came before mergetags, that
is the result of merging a signed tag (storing the signature of
one of parents of the merge commit to not pollute tag namespace).

And this workflow, from what I know, is quite useful.

-- 
Jakub Narębski

Re: RFC: Another proposed hash function transition plan

2017-03-06 Thread Jonathan Nieder

David Lang wrote:

>> Translation table
>> ~
>> A fast bidirectional mapping between sha1-names and sha256-names of
>> all local objects in the repository is kept on disk. The exact format
>> of that mapping is to be determined.
>>
>> All operations that make new objects (e.g., "git commit") add the new
>> objects to the translation table.
>
> This seems like a rather nontrival thing to design. It will need to
> hold millions of mappings, and be quickly searchable from either
> direction (sha1->new and new->sha1) while still be fairly fast to
> insert new records into.

I am currently thinking of using LevelDB, since it has the advantages of
being simple, already existing, and having already been ported to Java
(allowing JGit can read and write the same format).

If that doesn't work, we'd try some other key-value store like Samba's
tdb or Kyoto Cabinet.

Jonathan

[RFC PATCH] grep: fix bug when recursing with relative pathspec

When using the --recurse-submodules flag with a relative pathspec which
includes "..", an error is produced inside the child process spawned for
a submodule.  When creating the pathspec struct in the child, the ".."
is interpreted to mean "go up a directory" which causes an error stating
that the path ".." is outside of the repository.

While it is true that ".." is outside the scope of the submodule, it is
confusing to a user who originally invoked the command where ".." was
indeed still inside the scope of the superproject.  Since the child
process launched for the submodule has some context that it is operating
underneath a superproject, this error could be avoided.

This patch fixes the bug by passing the 'prefix' to the child process.
Now each child process that works on a submodule has two points of
reference to the superproject: (1) the 'super_prefix' which is the path
from the root of the superproject down to root of the submodule and (2)
the 'prefix' which is the path from the root of the superproject down to
the directory where the user invoked the git command.

With these two pieces of information a child process can correctly
interpret the pathspecs provided by the user as well as being able to
properly format the its output relative to the directory the user
invoked a git command from.

Signed-off-by: Brandon Williams 
---

After taking a closer look at this bug I determined that I did a poor job at
thinking of all the corner cases when implementing a recursive grep.  As it
turns out I think we really need to pass on the prefix information to the child
process so that it has enough context to do path matching and to format its
output.  Unfortunately I don't know the best way to pass on this information
without breaking other commands that rely on the GIT_PREFIX being overridden
during discovery.  Maybe we'll need to add in another env var to account for
that?  For the purposes of showing how to fix this bug while still getting the
test suite to pass, I have git respect the GIT_PREFIX env var only if the
super_prefix is set (which it should only be used in submodule operations at
this point in time).  Its a pretty hacky fix, so I'm up for other suggestions
on how to properly pass on this information to the child process.


 builtin/grep.c | 39 
 git.c  |  2 --
 setup.c|  5 +++
 t/t7814-grep-recurse-submodules.sh | 73 ++
 4 files changed, 102 insertions(+), 17 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index 2c727ef49..acecad0e0 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -310,10 +310,7 @@ static int grep_sha1(struct grep_opt *opt, const unsigned 
char *sha1,
 {
struct strbuf pathbuf = STRBUF_INIT;
 
-   if (opt->relative && opt->prefix_length) {
-   quote_path_relative(filename + tree_name_len, opt->prefix, 
);
-   strbuf_insert(, 0, filename, tree_name_len);
-   } else if (super_prefix) {
+   if (super_prefix) {
strbuf_add(, filename, tree_name_len);
strbuf_addstr(, super_prefix);
strbuf_addstr(, filename + tree_name_len);
@@ -321,6 +318,13 @@ static int grep_sha1(struct grep_opt *opt, const unsigned 
char *sha1,
strbuf_addstr(, filename);
}
 
+   if (opt->relative && opt->prefix_length) {
+   char *name = strbuf_detach(, NULL);
+   quote_path_relative(name + tree_name_len, opt->prefix, 
);
+   strbuf_insert(, 0, name, tree_name_len);
+   free(name);
+   }
+
 #ifndef NO_PTHREADS
if (num_threads) {
add_work(opt, GREP_SOURCE_SHA1, pathbuf.buf, path, sha1);
@@ -345,12 +349,14 @@ static int grep_file(struct grep_opt *opt, const char 
*filename)
 {
struct strbuf buf = STRBUF_INIT;
 
+   if (super_prefix)
+   strbuf_addstr(, super_prefix);
+   strbuf_addstr(, filename);
+
if (opt->relative && opt->prefix_length) {
-   quote_path_relative(filename, opt->prefix, );
-   } else {
-   if (super_prefix)
-   strbuf_addstr(, super_prefix);
-   strbuf_addstr(, filename);
+   char *name = strbuf_detach(, NULL);
+   quote_path_relative(name, opt->prefix, );
+   free(name);
}
 
 #ifndef NO_PTHREADS
@@ -399,13 +405,12 @@ static void run_pager(struct grep_opt *opt, const char 
*prefix)
 }
 
 static void compile_submodule_options(const struct grep_opt *opt,
- const struct pathspec *pathspec,
+ const char **argv,
  int cached, int untracked,
  int opt_exclude, int use_index,
  int pattern_type_arg)
 {
struct grep_pat *pattern;
-

Re: [Request for Documentation] Differentiate signed (commits/tags/pushes)

On Mon, Mar 6, 2017 at 2:13 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> What is the difference between signed commits and tags?
>> (Not from a technical perspective, but for the end user)
>
> When you "commit -s", you are signing the bytes in the commit
> object, which means that you are attesting the fact that the tree
> you wanted to record is one of the 47 other colliding tree objects
> that happen to share that 40-hex hash value, and also the fact that
> the commits you wanted to record as its parents have certain SHA-1
> hash values.  As you are relying on the resistance to preimage
> attack against SHA-1 at least locally around that signed commit,
> there wouldn't be meaningful difference between a 50-commit series
> each of which is individually signed with "commit -s", such a
> 50-commit series, only the top of which is signed with "commit -s",
> and the same 50-commit series, on the top of which is signed with
> "tag -s".
>
> "tag -s" also has the benefit of being retroactive.  You can create
> commit, think about it for a week and then later tag it.  And ask
> others to also tag the same one.  You cannot do so with "commit -s".

ok, so there is *no* advantage of signing a commit over tags?
I'll see if I can write a patch that enhances Documentation/git-commit.txt
pointing to git-tag instead.

>> A signed push can certify that a given payload (consisting
>> of multiple commits on possibly multiple branches) was transmitted
>> to a remote, which can be recorded by the remote as e.g. a proof
>> of work.
>
> A signed push is _NOT_ about certifying the objects in the history

Yes that is my understanding, though I was unclear in writing it.

> I'd agree with that "checkbox" description, [...]
>  "commit -s" was a remedy for that.

Out of curiosity: Does (did) such a project exist? Can I read up on that
and their best practices?

Thanks,
Stefan

Re: Server-side hooks on non-bare repository

2017-03-06 Thread Mike Lewis

Hi Junio,

Thanks for taking the time to reply. I apologize if there was a 
misunderstanding in my previous email; I'm relatively new to dealing with some 
of the more advanced features of git, and did not describe my situation as 
clearly as possible.

I am using the pre-receive hook to determine whether to allow the push or not. 
Essentially, I'm rejecting the entire push if a branch not checked out is 
pushed, and then ensuring that a backup has been made of some critical data to 
prevent any issues with deployment (and if that backup fails for whatever 
reason, I also reject the push). The post-receive hook is essentially used for 
notification of a couple services that the push had been completed.

My point #1 was supposed to be something along the lines of "I found this 
behavior unintuitive and poorly documented, but it's fine since I can work 
around it", but it didn't come across that way. I understand that breaking 
backwards compatibility by changing this behavior would be a huge deal, and I 
was not intending to suggest it. 

Point #2, however is what I was attempting to discuss in more detail. I'm 
curious as to why git doesn't recognize that it's currently inside the working 
tree in that situation, and if that's intended or not. There's no update 
currently happening in either hook (since they are called immediately before 
and after any changes), which means that I can't see a reason to have behavior 
that differs (having to specify a GIT_DIR, which may or may not be a complete 
workaround) between when the hooks are running and when they're not. That 
doesn't mean there isn't a reason; it's more likely I'm just not familiar 
enough with how the hooks are called internally to see it, and am looking for a 
more thorough explanation as to the motivations behind this behavior. If it's 
intended and not going to change, I think the behavior should be documented 
more thoroughly so that nobody has to repeat the process I did to find out why 
my hooks were failing. I don't mind contributing to this, but I have no idea 
where to start. 

Anyways, thanks for responding, and I apologize again if my first email was 
poorly written and rushed. 

Mike Lewis

> On Mar 6, 2017, at 16:42, Junio C Hamano  wrote:
> 
> Mike Lewis  writes:
> 
>> I’m having some issues with using server-side hooks when pushing
>> to a non-bare repository. In my git config, I have
>> `receive.denyCurrentBranch` set to `updateInstead`, which behaves
>> as expected, and updates the current working tree when the current
>> branch is pushed to. However, attempting to process those changes
>> with pre-receive and post-receive hooks results in some unexpected
>> behavior regarding the current working directory of the scripts
>> and using git commands.
> 
> The pre-receive hook is to inspect the objects and ref updates and
> say "yes, allow it" or "no, refuse it"---you are not supposed to do
> anything else, so even though what processing you are interested in
> doing in your "attempting to process" is unclear, this hook is not
> what you want to use anyway.
> 
> The post-receive hook is a more interesting case.  It is called only
> after everything finishes, so it is like running a custom script
> after "git push" is processed.
> 
>> I’ve tested these issues using both git
>> 2.11 and 2.12 on various systems (macOS and CentOS), and get the
>> same behavior each time.
> 
> That is a very good news, as I do not think at least in the past few
> years we planned to change the established behaviour of the hook.
> 
>> 1. When using a non-bare repository, I would expect the the
>> working directory of the hook to be the root directory of the
>> working tree,...
>> 2. While running the hooks, git treats the repository as being
>> bare, regardless of whether it actually is.
> 
> Yes, and it is unlikely that the behaviour wrt to where $cwd is
> during the hook's execution will ever change; otherwise existing
> scripts that know what the rule is (i.e. the rule you figured out in
> 2.) will be broken.  denycurrent=updateinstead *is* an odd-man out,
> and its processing is purely internal---its addition does not mean
> hook authors are suddenly required to do things differently
> depending on bare/non-bare-ness of the repository.
> 
>

Re: [PATCH 03/18] lib-submodule-update: teach test_submodule_content the -C flag

On Wed, Mar 1, 2017 at 6:11 PM, Eric Wong  wrote:
> Stefan Beller  wrote:
>>  test_submodule_content () {
>> + if test "$1" == "-C"
>
> Use a single '=' for portability in sh.  It's also a good idea
> to prefix variables with 'x' or some such, since "$1" could be
> "-n" or some other supported switch for test(1).
> So, something like:
>
> if test x"$1" = "x-C"
>
> ...or use a case statement.
>
> On Debian systems, I use the "checkbashisms" from the
> "devscripts" package to find and avoid bash dependencies.

Thanks for review as well as the tip for using
checkbashisms,

I'll resend this series with this fix as well as another fix
in git-submodule.sh

Thanks,
Stefan

Re: What's cooking in git.git (Mar 2017, #02; Fri, 3)

Lars Schneider  writes:

>> On 04 Mar 2017, at 00:26, Junio C Hamano  wrote:
>> 
>> 
>> * ls/filter-process-delayed (2017-01-08) 1 commit
>> . convert: add "status=delayed" to filter process protocol
>> 
>> Ejected, as does not build when merged to 'pu'.
>
> I send v2 [1] where I tried to address the points in your feedback [2].

Ah, I took a look at it back then and then forgot about it.  I'll
try to see if I can replace the stale one I have and merge it to
'pu'.

> v2 not the final roll. My goal for v2 is to get the interface
> to convert.h right.

You sound like you are trying to make the interface to the callers
finalized before doing further work, but after re-reading the
exchange between you and Peff in that thread [*1*], I am not sure
that is feasible to begin with.  

For example, your async_convert_to_working_tree() returns Success or
Delayed [*2*] and the callers of write_entry() cannot tell which the
paths on the filesystem needs a call to checkout_delayed_entries()
to finish them before they can safely tell the outside world that
these paths are safe to use.  

It seems to me that any caller that calls checkout_entry() needs to
essentially do:

- compute which entries in the index need to be checked out
  to the filesystem;

- for each of them:
- call checkout_entry()

- call checkout_delayed_entries(), because among the
  checkout_entry() calls we did in the above loop, some of
  them may have "delayed", but we do not know which one(s).

Output from "git grep -e checkout_delayed_entries -e checkout_entry"
seems to tell me that at least builtin/apply.c and
builtin/checkout-index.c forget the last step.

I'd understand the design better if the delayed-item list were a
part of the "struct checkout" state structure, and write_entry(),
when delaying the write, returned enough information (not just "this
has been delayed") that can be used to later instruct the
long-running filter process things like "you gave me this 'delayed'
token earlier; I want the result for it now!", "are you finished
processing my earlier request, to which you gave me this 'delayed'
token?", etc.  One of these instructions could be "here is the
path. write the result out for the earlier request of mine you gave
me this 'delayed' token for.  I do not need the result in-core.  And
take as much time as you need--I do not mind blocking here at this
point."  In a future, a new instruction may be added to ask "please
give me the result in-core, as if you returned the result to my
earlier original async_convert_to_working_tree() call without
delaying the request."

Within such a framework, your checkout_delayed_entries() would be a
special case for finalizing a "struct checkout" that has been in
use.  By enforcing that any "struct checkout" begins its life by a
"state = CHECKOUT_INIT" initialization and finishes its life by a
"finish_checkout()" call, we will reduce risks to forget
making necessary call to checkout_delayed_entries(), I would think.

[References and Footnotes]

*1* http://public-inbox.org/git/20170226184816.30010-1-larsxschnei...@gmail.com/

*2* By the way, the code in write_entry() should have a final else
clause that diagnoses an error return from
async_convert_to_working_tree() and act on it---an unexpected return
will fall through to the code that opens output fd and

Re: [PATCH 03/18] lib-submodule-update: teach test_submodule_content the -C flag

Stefan Beller  writes:

> Signed-off-by: Stefan Beller 
> ---
>  t/lib-submodule-update.sh | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
> index c0d6325133..00128f28b5 100755
> --- a/t/lib-submodule-update.sh
> +++ b/t/lib-submodule-update.sh
> @@ -193,6 +193,11 @@ test_superproject_content () {
>  # Test that the given submodule at path "$1" contains the content according
>  # to the submodule commit recorded in the superproject's commit "$2"
>  test_submodule_content () {
> + if test x"$1" = "x-C"
> + then
> + cd "$2"
> + shift; shift;

That's old fashoned like me ;-)  

It seems "shift []" is already used elsewhere in our test scripts
without getting complaints.

'git-difftool-helper.sh' is the only one that uses number above 1 in
the scripted Porcelains, so it is possible that an esoteric platform
without a shell that understands it may be built and used _without_
passing our test suite, so I'd be wary of using it in the scripted
Porcelains, but this is part of the test suite, so I would think it
is OK.

> + fi
>   if test $# != 2
>   then
>   echo "test_submodule_content needs two arguments"

Re: [Request for Documentation] Differentiate signed (commits/tags/pushes)

Stefan Beller  writes:

> What is the difference between signed commits and tags?
> (Not from a technical perspective, but for the end user)

When you "commit -s", you are signing the bytes in the commit
object, which means that you are attesting the fact that the tree
you wanted to record is one of the 47 other colliding tree objects
that happen to share that 40-hex hash value, and also the fact that
the commits you wanted to record as its parents have certain SHA-1
hash values.  As you are relying on the resistance to preimage
attack against SHA-1 at least locally around that signed commit,
there wouldn't be meaningful difference between a 50-commit series
each of which is individually signed with "commit -s", such a
50-commit series, only the top of which is signed with "commit -s",
and the same 50-commit series, on the top of which is signed with
"tag -s".

"tag -s" also has the benefit of being retroactive.  You can create
commit, think about it for a week and then later tag it.  And ask
others to also tag the same one.  You cannot do so with "commit -s".

> A signed push can certify that a given payload (consisting
> of multiple commits on possibly multiple branches) was transmitted
> to a remote, which can be recorded by the remote as e.g. a proof
> of work.

A signed push is _NOT_ about certifying the objects in the history
DAG.  It is about certifying the _intent_ of pointing _REFS_ into
points in the object graph.  "This is a commit I made to add feature
frotz" is something you might say with "commit -s" and "these
commits behind this point are for upcoming 2.13 release" is
something you might say with "tag -s v2.13-rc0".  But "I made it"
and "I made it for this purpose" are different things.  I may not
want the "feature frotz" commit included in the maintenance track,
so it would be a mistake for push a history that contains it to
update refs/heads/maint ref.  A push certificate can protect hosting
sites like GitHub, when I complain to them saying "you guys are
pointing at a wrong commit with refs/heads/maint", by allowing them
to respond with "well, you made the push to perform that update and
here is what you GPG signed".

> Off list I was told gpg-signed commits are a "checkbox feature",
> i.e. no real world workflow would actually use it. (That's a bold
> statement, someone has to use it as there was enough interest
> to implement it, no?)

I'd agree with that "checkbox" description, except that you need to
remember that a project can enforce _any_ workflow to its developer,
even if it does not make much sense, and at that point, the workflow
would become a real-world workflow.  The word "real world workflow"
does not make any assurance if that workflow is sensible.

Historically, "tag -s" came a lot earlier.  When a project for
whatever reason wants signature for each and every commit so that
they somehow can feel good, without "commit -s", it would have made
us unnecessary work to scale tag namespace only because there will
be tons of pointless tags.  "commit -s" was a remedy for that.

[PATCH 08/18] update submodules: add submodule config parsing

Similar to b33a15b08 (push: add recurseSubmodules config option,
2015-11-17) and 027771fcb1 (submodule: allow erroneous values for the
fetchRecurseSubmodules option, 2015-08-17), we add submodule-config code
that is later used to parse whether we are interested in updating
submodules.

We need the `die_on_error` parameter to be able to call this parsing
function for the config file as well, which if incorrect lets Git die.

As we're just touching the header file, also mark all functions extern.

Signed-off-by: Stefan Beller 
---
 submodule-config.c | 20 
 submodule-config.h | 17 +
 2 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/submodule-config.c b/submodule-config.c
index 93453909cf..3e8e380d98 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -234,6 +234,26 @@ int parse_fetch_recurse_submodules_arg(const char *opt, 
const char *arg)
return parse_fetch_recurse(opt, arg, 1);
 }
 
+static int parse_update_recurse(const char *opt, const char *arg,
+   int die_on_error)
+{
+   switch (git_config_maybe_bool(opt, arg)) {
+   case 1:
+   return RECURSE_SUBMODULES_ON;
+   case 0:
+   return RECURSE_SUBMODULES_OFF;
+   default:
+   if (die_on_error)
+   die("bad %s argument: %s", opt, arg);
+   return RECURSE_SUBMODULES_ERROR;
+   }
+}
+
+int parse_update_recurse_submodules_arg(const char *opt, const char *arg)
+{
+   return parse_update_recurse(opt, arg, 1);
+}
+
 static int parse_push_recurse(const char *opt, const char *arg,
   int die_on_error)
 {
diff --git a/submodule-config.h b/submodule-config.h
index 70f19363fd..d434ecdb45 100644
--- a/submodule-config.h
+++ b/submodule-config.h
@@ -22,16 +22,17 @@ struct submodule {
int recommend_shallow;
 };
 
-int parse_fetch_recurse_submodules_arg(const char *opt, const char *arg);
-int parse_push_recurse_submodules_arg(const char *opt, const char *arg);
-int parse_submodule_config_option(const char *var, const char *value);
-const struct submodule *submodule_from_name(const unsigned char 
*commit_or_tree,
-   const char *name);
-const struct submodule *submodule_from_path(const unsigned char 
*commit_or_tree,
-   const char *path);
+extern int parse_fetch_recurse_submodules_arg(const char *opt, const char 
*arg);
+extern int parse_update_recurse_submodules_arg(const char *opt, const char 
*arg);
+extern int parse_push_recurse_submodules_arg(const char *opt, const char *arg);
+extern int parse_submodule_config_option(const char *var, const char *value);
+extern const struct submodule *submodule_from_name(
+   const unsigned char *commit_or_tree, const char *name);
+extern const struct submodule *submodule_from_path(
+   const unsigned char *commit_or_tree, const char *path);
 extern int gitmodule_sha1_from_commit(const unsigned char *commit_sha1,
  unsigned char *gitmodules_sha1,
  struct strbuf *rev);
-void submodule_free(void);
+extern void submodule_free(void);
 
 #endif /* SUBMODULE_CONFIG_H */
-- 
2.12.0.rc1.52.ge239d7e709.dirty

Re: RFC: Another proposed hash function transition plan

Linus Torvalds  writes:

> So *if* the new object format uses a git header line like
>
> "blob  \0"
>
> then it would inherently contain that mapping from 256-bit hash to the
> SHA1, but it would actually also protect against attacks on the new
> hash.

This is easy for blobs as you only need to hash twice.  I am not
sure if you can do the same for trees, though.  For that  to
be useful, the hash needs to be over the tree contents whose
references are expressed in , which in turn would mean...

... ah, you would read these  off of the object header in the
new world and you do not need to expand the whole thing.  OK, I see
how it could work.

> In fact, in particular for objects with internal format that
> differs between the two hashing models (ie trees and commits which to
> some degree are higher-value targets), it would make attacks really
> quite complicated, I suspect.
>
> And you wouldn't need those "hash" or "nohash" things at all. The old
> SHA1 would simply always be there, and cheap to look up (ie you
> wouldn't have to unpack the whole object).

Re: Server-side hooks on non-bare repository

Mike Lewis  writes:

> I’m having some issues with using server-side hooks when pushing
> to a non-bare repository. In my git config, I have
> `receive.denyCurrentBranch` set to `updateInstead`, which behaves
> as expected, and updates the current working tree when the current
> branch is pushed to. However, attempting to process those changes
> with pre-receive and post-receive hooks results in some unexpected
> behavior regarding the current working directory of the scripts
> and using git commands.

The pre-receive hook is to inspect the objects and ref updates and
say "yes, allow it" or "no, refuse it"---you are not supposed to do
anything else, so even though what processing you are interested in
doing in your "attempting to process" is unclear, this hook is not
what you want to use anyway.

The post-receive hook is a more interesting case.  It is called only
after everything finishes, so it is like running a custom script
after "git push" is processed.

> I’ve tested these issues using both git
> 2.11 and 2.12 on various systems (macOS and CentOS), and get the
> same behavior each time.

That is a very good news, as I do not think at least in the past few
years we planned to change the established behaviour of the hook.

> 1. When using a non-bare repository, I would expect the the
> working directory of the hook to be the root directory of the
> working tree,...
> 2. While running the hooks, git treats the repository as being
> bare, regardless of whether it actually is.

Yes, and it is unlikely that the behaviour wrt to where $cwd is
during the hook's execution will ever change; otherwise existing
scripts that know what the rule is (i.e. the rule you figured out in
2.) will be broken.  denycurrent=updateinstead *is* an odd-man out,
and its processing is purely internal---its addition does not mean
hook authors are suddenly required to do things differently
depending on bare/non-bare-ness of the repository.

[PATCH 18/18] builtin/read-tree: add --recurse-submodules switch

A new known failure mode is introduced[1], which is actually not
a failure but a feature in read-tree. Unlike checkout for which
the recursive submodule tests were originally written, read-tree does
warn about ignored untracked files that would be overwritten.
For the sake of keeping the test library for submodules genric, just
mark the test as a failure.

[1] KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED

Signed-off-by: Stefan Beller 
---
 Documentation/git-read-tree.txt |  6 ++
 builtin/read-tree.c | 29 +
 t/lib-submodule-update.sh   |  7 ++-
 t/t1013-read-tree-submodule.sh  |  7 +++
 4 files changed, 48 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-read-tree.txt b/Documentation/git-read-tree.txt
index fa1d557e5b..ed9d63ef4a 100644
--- a/Documentation/git-read-tree.txt
+++ b/Documentation/git-read-tree.txt
@@ -115,6 +115,12 @@ OPTIONS
directories the index file and index output file are
located in.
 
+--[no-]recurse-submodules::
+   Using --recurse-submodules will update the content of all initialized
+   submodules according to the commit recorded in the superproject by
+   calling read-tree recursively, also setting the submodules HEAD to be
+   detached at that commit.
+
 --no-sparse-checkout::
Disable sparse checkout support even if `core.sparseCheckout`
is true.
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 8ba64bc785..2dc5cc06da 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -15,10 +15,13 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "resolve-undo.h"
+#include "submodule.h"
+#include "submodule-config.h"
 
 static int nr_trees;
 static int read_empty;
 static struct tree *trees[MAX_UNPACK_TREES];
+int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 
 static int list_tree(unsigned char *sha1)
 {
@@ -96,6 +99,23 @@ static int debug_merge(const struct cache_entry * const 
*stages,
return 0;
 }
 
+static int option_parse_recurse_submodules(const struct option *opt,
+  const char *arg, int unset)
+{
+   if (unset) {
+   recurse_submodules = RECURSE_SUBMODULES_OFF;
+   return 0;
+   }
+   if (arg)
+   recurse_submodules =
+   parse_update_recurse_submodules_arg(opt->long_name,
+   arg);
+   else
+   recurse_submodules = RECURSE_SUBMODULES_ON;
+
+   return 0;
+}
+
 static struct lock_file lock_file;
 
 int cmd_read_tree(int argc, const char **argv, const char *unused_prefix)
@@ -137,6 +157,9 @@ int cmd_read_tree(int argc, const char **argv, const char 
*unused_prefix)
 N_("skip applying sparse checkout filter")),
OPT_BOOL(0, "debug-unpack", _unpack,
 N_("debug unpack-trees")),
+   { OPTION_CALLBACK, 0, "recurse-submodules", _submodules,
+   "checkout", "control recursive updating of 
submodules",
+   PARSE_OPT_OPTARG, option_parse_recurse_submodules },
OPT_END()
};
 
@@ -152,6 +175,12 @@ int cmd_read_tree(int argc, const char **argv, const char 
*unused_prefix)
 
hold_locked_index(_file, LOCK_DIE_ON_ERROR);
 
+   if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT) {
+   gitmodules_config();
+   git_config(submodule_config, NULL);
+   set_config_update_recurse_submodules(RECURSE_SUBMODULES_ON);
+   }
+
prefix_set = opts.prefix ? 1 : 0;
if (1 < opts.merge + opts.reset + prefix_set)
die("Which one? -m, --reset, or --prefix?");
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 6a78139f90..949ebd546c 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -787,6 +787,11 @@ test_submodule_switch_recursing () {
then
RESULT=failure
fi
+   RESULT1=success
+   if test "$KNOWN_FAILURE_SUBMODULE_OVERWRITE_IGNORED_UNTRACKED" = 1
+   then
+   RESULT1=failure
+   fi
# Appearing submodule #
# Switching to a commit letting a submodule appear checks it out ...
test_expect_success "$command: added submodule is checked out" '
@@ -827,7 +832,7 @@ test_submodule_switch_recursing () {
)
'
# ... but an ignored file is fine.
-   test_expect_success "$command: added submodule removes an untracked 
ignored file" '
+   test_expect_$RESULT1 "$command: added submodule removes an untracked 
ignored file" '
test_when_finished "rm submodule_update/.git/info/exclude" &&
prolog &&
reset_work_tree_to_interested no_submodule &&
diff --git a/t/t1013-read-tree-submodule.sh

[PATCH 05/18] lib-submodule-update.sh: define tests for recursing into submodules

Currently lib-submodule-update.sh provides 2 functions
test_submodule_switch and test_submodule_forced_switch that are used by a
variety of tests to ensure that submodules behave as expected. The current
expected behavior is that submodules are not touched at all (see
42639d2317a for the exact setup).

In the future we want to teach all these commands to recurse
into submodules. To do that, we'll add two testing functions to
submodule-update-lib.sh: test_submodule_switch_recursing and
test_submodule_forced_switch_recursing.

These two functions behave in analogy to the already existing functions
just with a different expectation on submodule behavior. The submodule
in the working tree is expected to be updated to the recorded submodule
version. The behavior is analogous to e.g. the behavior of files in a
nested directory in the working tree, where a change to the working tree
handles any arising directory/file conflicts just fine.

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 506 +-
 1 file changed, 504 insertions(+), 2 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index f52c49c838..5b885cfbf8 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -4,6 +4,7 @@
 # - New submodule (no_submodule => add_sub1)
 # - Removed submodule (add_sub1 => remove_sub1)
 # - Updated submodule (add_sub1 => modify_sub1)
+# - Updated submodule recursively (add_nested_sub => modify_sub1_recursively)
 # - Submodule updated to invalid commit (add_sub1 => invalid_sub1)
 # - Submodule updated from invalid commit (invalid_sub1 => valid_sub1)
 # - Submodule replaced by tracked files in directory (add_sub1 =>
@@ -19,8 +20,8 @@
 #/^
 #   / remove_sub1
 #  /
-#   add_sub1  /---O
-# |  /^
+#   add_sub1  /---O-OO  modify_sub1_recursively
+# |  /^ add_nested_sub
 # | / modify_sub1
 # v/
 #  O--O---O-O
@@ -48,6 +49,17 @@ create_lib_submodule_repo () {
git commit -m "Base inside first submodule" &&
git branch "no_submodule"
) &&
+   git init submodule_update_sub2 &&
+   (
+   cd submodule_update_sub2
+   echo "expect" >>.gitignore &&
+   echo "actual" >>.gitignore &&
+   echo "x" >file1 &&
+   echo "y" >file2 &&
+   git add .gitignore file1 file2 &&
+   git commit -m "nested submodule base" &&
+   git branch "no_submodule"
+   ) &&
git init submodule_update_repo &&
(
cd submodule_update_repo &&
@@ -84,6 +96,26 @@ create_lib_submodule_repo () {
git add sub1 &&
git commit -m "Modify sub1" &&
 
+   git checkout -b add_nested_sub modify_sub1 &&
+   git -C sub1 checkout -b "add_nested_sub" &&
+   git -C sub1 submodule add --branch no_submodule 
../submodule_update_sub2 sub2 &&
+   git -C sub1 commit -a -m "add a nested submodule" &&
+   git add sub1 &&
+   git commit -a -m "update submodule, that updates a nested 
submodule" &&
+   git checkout -b modify_sub1_recursively &&
+   git -C sub1 checkout -b modify_sub1_recursively &&
+   git -C sub1/sub2 checkout -b modify_sub1_recursively &&
+   echo change >sub1/sub2/file3 &&
+   git -C sub1/sub2 add file3 &&
+   git -C sub1/sub2 commit -m "make a change in nested sub" &&
+   git -C sub1 add sub2 &&
+   git -C sub1 commit -m "update nested sub" &&
+   git add sub1 &&
+   git commit -m "update sub1, that updates nested sub" &&
+   git -C sub1 push origin modify_sub1_recursively &&
+   git -C sub1/sub2 push origin modify_sub1_recursively &&
+   git -C sub1 submodule deinit -f --all &&
+
git checkout -b replace_sub1_with_directory add_sub1 &&
git submodule update &&
git -C sub1 checkout modifications &&
@@ -150,6 +182,15 @@ test_git_directory_is_unchanged () {
)
 }
 
+test_git_directory_exists() {
+   test -e ".git/modules/$1" &&
+   if test -f sub1/.git
+   then
+   # does core.worktree point at the right place?
+   test "$(git -C .git/modules/$1 config core.worktree)" = 
"../../../$1"
+   fi
+}
+
 # Helper function to be executed at the start of every test below, it sets up
 # the submodule repo if it doesn't exist and configures the most problematic
 # settings for diff.ignoreSubmodules.
@@ -180,6 +221,27 @@ reset_work_tree_to () {
)
 }
 
+reset_work_tree_to_interested () {
+   reset_work_tree_to $1 &&
+   # make the submodule git

[PATCH 14/18] unpack-trees: check if we can perform the operation for submodules

Signed-off-by: Stefan Beller 
---
 unpack-trees.c | 131 +
 unpack-trees.h |   1 +
 2 files changed, 123 insertions(+), 9 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 616a0ae4b2..8333da2cc9 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -10,6 +10,8 @@
 #include "attr.h"
 #include "split-index.h"
 #include "dir.h"
+#include "submodule.h"
+#include "submodule-config.h"
 
 /*
  * Error messages expected by scripts out of plumbing commands such as
@@ -45,6 +47,9 @@ static const char 
*unpack_plumbing_errors[NB_UNPACK_TREES_ERROR_TYPES] = {
 
/* ERROR_WOULD_LOSE_ORPHANED_REMOVED */
"Working tree file '%s' would be removed by sparse checkout update.",
+
+   /* ERROR_WOULD_LOSE_SUBMODULE */
+   "Submodule '%s' cannot checkout new HEAD.",
 };
 
 #define ERRORMSG(o,type) \
@@ -161,6 +166,8 @@ void setup_unpack_trees_porcelain(struct 
unpack_trees_options *opts,
_("The following working tree files would be overwritten by 
sparse checkout update:\n%s");
msgs[ERROR_WOULD_LOSE_ORPHANED_REMOVED] =
_("The following working tree files would be removed by sparse 
checkout update:\n%s");
+   msgs[ERROR_WOULD_LOSE_SUBMODULE] =
+   _("Submodule '%s' cannot checkout new HEAD");
 
opts->show_all_errors = 1;
/* rejected paths may not have a static buffer */
@@ -240,12 +247,75 @@ static void display_error_msgs(struct 
unpack_trees_options *o)
fprintf(stderr, _("Aborting\n"));
 }
 
+static int check_submodule_move_head(const struct cache_entry *ce,
+const char *old_id,
+const char *new_id,
+struct unpack_trees_options *o)
+{
+   const struct submodule *sub = submodule_from_ce(ce);
+   if (!sub)
+   return 0;
+
+   switch (sub->update_strategy.type) {
+   case SM_UPDATE_UNSPECIFIED:
+   case SM_UPDATE_CHECKOUT:
+   if (submodule_move_head(ce->name, old_id, new_id, 
SUBMODULE_MOVE_HEAD_DRY_RUN))
+   return o->gently ? -1 :
+   add_rejected_path(o, 
ERROR_WOULD_LOSE_SUBMODULE, ce->name);
+   return 0;
+   case SM_UPDATE_NONE:
+   return 0;
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   case SM_UPDATE_COMMAND:
+   default:
+   warning(_("submodule update strategy not supported for 
submodule '%s'"), ce->name);
+   return -1;
+   }
+}
+
+static void reload_gitmodules_file(struct index_state *index,
+  struct checkout *state)
+{
+   int i;
+   for (i = 0; i < index->cache_nr; i++) {
+   struct cache_entry *ce = index->cache[i];
+   if (ce->ce_flags & CE_UPDATE) {
+   int r = strcmp(ce->name, ".gitmodules");
+   if (r < 0)
+   continue;
+   else if (r == 0) {
+   submodule_free();
+   checkout_entry(ce, state, NULL);
+   gitmodules_config();
+   git_config(submodule_config, NULL);
+   } else
+   break;
+   }
+   }
+}
+
 /*
  * Unlink the last component and schedule the leading directories for
  * removal, such that empty directories get removed.
  */
 static void unlink_entry(const struct cache_entry *ce)
 {
+   const struct submodule *sub = submodule_from_ce(ce);
+   if (sub) {
+   switch (sub->update_strategy.type) {
+   case SM_UPDATE_UNSPECIFIED:
+   case SM_UPDATE_CHECKOUT:
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   submodule_move_head(ce->name, "HEAD", NULL,
+   SUBMODULE_MOVE_HEAD_FORCE);
+   break;
+   case SM_UPDATE_NONE:
+   case SM_UPDATE_COMMAND:
+   return; /* Do not touch the submodule. */
+   }
+   }
if (!check_leading_path(ce->name, ce_namelen(ce)))
return;
if (remove_or_warn(ce->ce_mode, ce->name))
@@ -301,6 +371,9 @@ static int check_updates(struct unpack_trees_options *o)
remove_marked_cache_entries(index);
remove_scheduled_dirs();
 
+   if (should_update_submodules() && o->update && !o->dry_run)
+   reload_gitmodules_file(index, );
+
for (i = 0; i < index->cache_nr; i++) {
struct cache_entry *ce = index->cache[i];
 
@@ -1358,17 +1431,26 @@ static int verify_uptodate_1(const struct cache_entry 
*ce,
if (!lstat(ce->name, )) {
int flags =

[PATCH 04/18] lib-submodule-update.sh: replace sha1 by hash

Cleaning up code by generalising it.
Currently the mailing list discusses yet again how
to migrate away from sha1.

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 00128f28b5..f52c49c838 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -171,9 +171,9 @@ reset_work_tree_to () {
git checkout -f "$1" &&
git status -u -s >actual &&
test_must_be_empty actual &&
-   sha1=$(git rev-parse --revs-only HEAD:sub1) &&
-   if test -n "$sha1" &&
-  test $(cd "../submodule_update_sub1" && git rev-parse 
--verify "$sha1^{commit}")
+   hash=$(git rev-parse --revs-only HEAD:sub1) &&
+   if test -n "$hash" &&
+  test $(cd "../submodule_update_sub1" && git rev-parse 
--verify "$hash^{commit}")
then
git submodule update --init --recursive "sub1"
fi
-- 
2.12.0.rc1.52.ge239d7e709.dirty

[PATCH 11/18] update submodules: move up prepare_submodule_repo_env

In a later patch we need to prepare the submodule environment with
another git directory, so split up the function.

Also move it up in the file such that we do not need to declare the
function later before using it.

Signed-off-by: Stefan Beller 
---
 submodule.c | 29 +
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/submodule.c b/submodule.c
index 8b2c0212be..0b2596e88a 100644
--- a/submodule.c
+++ b/submodule.c
@@ -356,6 +356,23 @@ static void print_submodule_summary(struct rev_info *rev, 
FILE *f,
strbuf_release();
 }
 
+static void prepare_submodule_repo_env_no_git_dir(struct argv_array *out)
+{
+   const char * const *var;
+
+   for (var = local_repo_env; *var; var++) {
+   if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
+   argv_array_push(out, *var);
+   }
+}
+
+void prepare_submodule_repo_env(struct argv_array *out)
+{
+   prepare_submodule_repo_env_no_git_dir(out);
+   argv_array_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
+DEFAULT_GIT_DIR_ENVIRONMENT);
+}
+
 /* Helper function to display the submodule header line prior to the full
  * summary output. If it can locate the submodule objects directory it will
  * attempt to lookup both the left and right commits and put them into the
@@ -1390,18 +1407,6 @@ int parallel_submodules(void)
return parallel_jobs;
 }
 
-void prepare_submodule_repo_env(struct argv_array *out)
-{
-   const char * const *var;
-
-   for (var = local_repo_env; *var; var++) {
-   if (strcmp(*var, CONFIG_DATA_ENVIRONMENT))
-   argv_array_push(out, *var);
-   }
-   argv_array_pushf(out, "%s=%s", GIT_DIR_ENVIRONMENT,
-DEFAULT_GIT_DIR_ENVIRONMENT);
-}
-
 /*
  * Embeds a single submodules git directory into the superprojects git dir,
  * non recursively.
-- 
2.12.0.rc1.52.ge239d7e709.dirty

[PATCH 16/18] entry.c: update submodules when interesting

Signed-off-by: Stefan Beller 
---
 entry.c | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/entry.c b/entry.c
index c6eea240b6..d2b512da90 100644
--- a/entry.c
+++ b/entry.c
@@ -2,6 +2,7 @@
 #include "blob.h"
 #include "dir.h"
 #include "streaming.h"
+#include "submodule.h"
 
 static void create_directories(const char *path, int path_len,
   const struct checkout *state)
@@ -146,6 +147,7 @@ static int write_entry(struct cache_entry *ce,
unsigned long size;
size_t wrote, newsize = 0;
struct stat st;
+   const struct submodule *sub;
 
if (ce_mode_s_ifmt == S_IFREG) {
struct stream_filter *filter = get_stream_filter(ce->name,
@@ -203,6 +205,10 @@ static int write_entry(struct cache_entry *ce,
return error("cannot create temporary submodule %s", 
path);
if (mkdir(path, 0777) < 0)
return error("cannot create submodule directory %s", 
path);
+   sub = submodule_from_ce(ce);
+   if (sub)
+   return submodule_move_head(ce->name,
+   NULL, oid_to_hex(>oid), 
SUBMODULE_MOVE_HEAD_FORCE);
break;
default:
return error("unknown file mode for %s in index", path);
@@ -259,7 +265,31 @@ int checkout_entry(struct cache_entry *ce,
strbuf_add(, ce->name, ce_namelen(ce));
 
if (!check_path(path.buf, path.len, , state->base_dir_len)) {
+   const struct submodule *sub;
unsigned changed = ce_match_stat(ce, , 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
+   /*
+* Needs to be checked before !changed returns early,
+* as the possibly empty directory was not changed
+*/
+   sub = submodule_from_ce(ce);
+   if (sub) {
+   int err;
+   if (!is_submodule_populated_gently(ce->name, )) {
+   struct stat sb;
+   if (lstat(ce->name, ))
+   die(_("could not stat file '%s'"), 
ce->name);
+   if (!(st.st_mode & S_IFDIR))
+   unlink_or_warn(ce->name);
+
+   return submodule_move_head(ce->name,
+   NULL, oid_to_hex(>oid),
+   SUBMODULE_MOVE_HEAD_FORCE);
+   } else
+   return submodule_move_head(ce->name,
+   "HEAD", oid_to_hex(>oid),
+   SUBMODULE_MOVE_HEAD_FORCE);
+   }
+
if (!changed)
return 0;
if (!state->force) {
-- 
2.12.0.rc1.52.ge239d7e709.dirty

[PATCH 12/18] update submodules: add submodule_move_head

In later patches we introduce the options and flag for commands
that modify the working directory, e.g. git-checkout.

This piece of code will be used universally for
all these working tree modifications as it
* supports dry run to answer the question:
  "Is it safe to change the submodule to this new state?"
  e.g. is it overwriting untracked files or are there local
  changes that would be overwritten?
* supports a force flag that can be used for resetting
  the tree.

Signed-off-by: Stefan Beller 
---
 submodule.c | 135 
 submodule.h |   7 
 2 files changed, 142 insertions(+)

diff --git a/submodule.c b/submodule.c
index 0b2596e88a..bc5fecf8c5 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1239,6 +1239,141 @@ int bad_to_remove_submodule(const char *path, unsigned 
flags)
return ret;
 }
 
+static int submodule_has_dirty_index(const struct submodule *sub)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+
+   prepare_submodule_repo_env_no_git_dir(_array);
+
+   cp.git_cmd = 1;
+   argv_array_pushl(, "diff-index", "--quiet", \
+   "--cached", "HEAD", NULL);
+   cp.no_stdin = 1;
+   cp.no_stdout = 1;
+   cp.dir = sub->path;
+   if (start_command())
+   die("could not recurse into submodule '%s'", sub->path);
+
+   return finish_command();
+}
+
+static void submodule_reset_index(const char *path)
+{
+   struct child_process cp = CHILD_PROCESS_INIT;
+   prepare_submodule_repo_env_no_git_dir(_array);
+
+   cp.git_cmd = 1;
+   cp.no_stdin = 1;
+   cp.dir = path;
+
+   argv_array_pushf(, "--super-prefix=%s/", path);
+   argv_array_pushl(, "read-tree", "-u", "--reset", NULL);
+
+   argv_array_push(, EMPTY_TREE_SHA1_HEX);
+
+   if (run_command())
+   die("could not reset submodule index");
+}
+
+/**
+ * Moves a submodule at a given path from a given head to another new head.
+ * For edge cases (a submodule coming into existence or removing a submodule)
+ * pass NULL for old or new respectively.
+ */
+int submodule_move_head(const char *path,
+const char *old,
+const char *new,
+unsigned flags)
+{
+   int ret = 0;
+   struct child_process cp = CHILD_PROCESS_INIT;
+   const struct submodule *sub;
+
+   sub = submodule_from_path(null_sha1, path);
+
+   if (!sub)
+   die("BUG: could not get submodule information for '%s'", path);
+
+   if (old && !(flags & SUBMODULE_MOVE_HEAD_FORCE)) {
+   /* Check if the submodule has a dirty index. */
+   if (submodule_has_dirty_index(sub))
+   return error(_("submodule '%s' has dirty index"), path);
+   }
+
+   if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
+   if (old) {
+   if (!submodule_uses_gitfile(path))
+   absorb_git_dir_into_superproject("", path,
+   ABSORB_GITDIR_RECURSE_SUBMODULES);
+   } else {
+   struct strbuf sb = STRBUF_INIT;
+   strbuf_addf(, "%s/modules/%s",
+   get_git_common_dir(), sub->name);
+   connect_work_tree_and_git_dir(path, sb.buf);
+   strbuf_release();
+
+   /* make sure the index is clean as well */
+   submodule_reset_index(path);
+   }
+   }
+
+   prepare_submodule_repo_env_no_git_dir(_array);
+
+   cp.git_cmd = 1;
+   cp.no_stdin = 1;
+   cp.dir = path;
+
+   argv_array_pushf(, "--super-prefix=%s/", path);
+   argv_array_pushl(, "read-tree", NULL);
+
+   if (flags & SUBMODULE_MOVE_HEAD_DRY_RUN)
+   argv_array_push(, "-n");
+   else
+   argv_array_push(, "-u");
+
+   if (flags & SUBMODULE_MOVE_HEAD_FORCE)
+   argv_array_push(, "--reset");
+   else
+   argv_array_push(, "-m");
+
+   argv_array_push(, old ? old : EMPTY_TREE_SHA1_HEX);
+   argv_array_push(, new ? new : EMPTY_TREE_SHA1_HEX);
+
+   if (run_command()) {
+   ret = -1;
+   goto out;
+   }
+
+   if (!(flags & SUBMODULE_MOVE_HEAD_DRY_RUN)) {
+   if (new) {
+   struct child_process cp1 = CHILD_PROCESS_INIT;
+   /* also set the HEAD accordingly */
+   cp1.git_cmd = 1;
+   cp1.no_stdin = 1;
+   cp1.dir = path;
+
+   argv_array_pushl(, "update-ref", "HEAD",
+new ? new : EMPTY_TREE_SHA1_HEX, NULL);
+
+   if (run_command()) {
+   ret = -1;
+   goto out;
+

[PATCH 15/18] read-cache, remove_marked_cache_entries: wipe selected submodules.

Signed-off-by: Stefan Beller 
---
 read-cache.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 9054369dd0..9a2abacf7a 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -18,6 +18,8 @@
 #include "varint.h"
 #include "split-index.h"
 #include "utf8.h"
+#include "submodule.h"
+#include "submodule-config.h"
 
 /* Mask for the name length in ce_flags in the on-disk index */
 
@@ -520,6 +522,22 @@ int remove_index_entry_at(struct index_state *istate, int 
pos)
return 1;
 }
 
+static void remove_submodule_according_to_strategy(const struct submodule *sub)
+{
+   switch (sub->update_strategy.type) {
+   case SM_UPDATE_UNSPECIFIED:
+   case SM_UPDATE_CHECKOUT:
+   case SM_UPDATE_REBASE:
+   case SM_UPDATE_MERGE:
+   submodule_move_head(sub->path, "HEAD", NULL, \
+   SUBMODULE_MOVE_HEAD_FORCE);
+   break;
+   case SM_UPDATE_NONE:
+   case SM_UPDATE_COMMAND:
+   ; /* Do not touch the submodule. */
+   }
+}
+
 /*
  * Remove all cache entries marked for removal, that is where
  * CE_REMOVE is set in ce_flags.  This is much more effective than
@@ -532,8 +550,13 @@ void remove_marked_cache_entries(struct index_state 
*istate)
 
for (i = j = 0; i < istate->cache_nr; i++) {
if (ce_array[i]->ce_flags & CE_REMOVE) {
-   remove_name_hash(istate, ce_array[i]);
-   save_or_free_index_entry(istate, ce_array[i]);
+   const struct submodule *sub = 
submodule_from_ce(ce_array[i]);
+   if (sub) {
+   remove_submodule_according_to_strategy(sub);
+   } else {
+   remove_name_hash(istate, ce_array[i]);
+   save_or_free_index_entry(istate, ce_array[i]);
+   }
}
else
ce_array[j++] = ce_array[i];
-- 
2.12.0.rc1.52.ge239d7e709.dirty

[PATCH 02/18] lib-submodule-update.sh: do not use ./. as submodule remote

Adding the repository itself as a submodule does not make sense in the
real world. In our test suite we used to do that out of convenience in
some tests as the current repository has easiest access for setting up
'just a submodule'.

However this doesn't quite test the real world, so let's do not follow
this pattern any further and actually create an independent repository
that we can use as a submodule.

When using './.' as the remote the superproject and submodule share the
same objects, such that testing if a given sha1 is a valid commit works
in either repository.  As running commands in an unpopulated submodule
fall back to the superproject, this happens in `reset_work_tree_to`
to determine if we need to populate the submodule. Fix this bug by
checking in the actual remote now.

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 5df528ea81..c0d6325133 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -37,6 +37,17 @@
 #
 
 create_lib_submodule_repo () {
+   git init submodule_update_sub1 &&
+   (
+   cd submodule_update_sub1 &&
+   echo "expect" >>.gitignore &&
+   echo "actual" >>.gitignore &&
+   echo "x" >file1 &&
+   echo "y" >file2 &&
+   git add .gitignore file1 file2 &&
+   git commit -m "Base inside first submodule" &&
+   git branch "no_submodule"
+   ) &&
git init submodule_update_repo &&
(
cd submodule_update_repo &&
@@ -49,7 +60,7 @@ create_lib_submodule_repo () {
git branch "no_submodule" &&
 
git checkout -b "add_sub1" &&
-   git submodule add ./. sub1 &&
+   git submodule add ../submodule_update_sub1 sub1 &&
git config -f .gitmodules submodule.sub1.ignore all &&
git config submodule.sub1.ignore all &&
git add .gitmodules &&
@@ -162,7 +173,7 @@ reset_work_tree_to () {
test_must_be_empty actual &&
sha1=$(git rev-parse --revs-only HEAD:sub1) &&
if test -n "$sha1" &&
-  test $(cd "sub1" && git rev-parse --verify "$sha1^{commit}")
+  test $(cd "../submodule_update_sub1" && git rev-parse 
--verify "$sha1^{commit}")
then
git submodule update --init --recursive "sub1"
fi
-- 
2.12.0.rc1.52.ge239d7e709.dirty

[PATCH 13/18] unpack-trees: pass old oid to verify_clean_submodule

The check (which uses the old oid) is yet to be implemented, but this part
is just a refactor, so it can go separately first.

Signed-off-by: Stefan Beller 
---
 unpack-trees.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/unpack-trees.c b/unpack-trees.c
index 3a8ee19fe8..616a0ae4b2 100644
--- a/unpack-trees.c
+++ b/unpack-trees.c
@@ -1407,7 +1407,8 @@ static void invalidate_ce_path(const struct cache_entry 
*ce,
  * Currently, git does not checkout subprojects during a superproject
  * checkout, so it is not going to overwrite anything.
  */
-static int verify_clean_submodule(const struct cache_entry *ce,
+static int verify_clean_submodule(const char *old_sha1,
+ const struct cache_entry *ce,
  enum unpack_trees_error_types error_type,
  struct unpack_trees_options *o)
 {
@@ -1427,16 +1428,18 @@ static int verify_clean_subdirectory(const struct 
cache_entry *ce,
struct dir_struct d;
char *pathbuf;
int cnt = 0;
-   unsigned char sha1[20];
 
-   if (S_ISGITLINK(ce->ce_mode) &&
-   resolve_gitlink_ref(ce->name, "HEAD", sha1) == 0) {
-   /* If we are not going to update the submodule, then
+   if (S_ISGITLINK(ce->ce_mode)) {
+   unsigned char sha1[20];
+   int sub_head = resolve_gitlink_ref(ce->name, "HEAD", sha1);
+   /*
+* If we are not going to update the submodule, then
 * we don't care.
 */
-   if (!hashcmp(sha1, ce->oid.hash))
+   if (!sub_head && !hashcmp(sha1, ce->oid.hash))
return 0;
-   return verify_clean_submodule(ce, error_type, o);
+   return verify_clean_submodule(sub_head ? NULL : 
sha1_to_hex(sha1),
+ ce, error_type, o);
}
 
/*
-- 
2.12.0.rc1.52.ge239d7e709.dirty

[PATCH 17/18] builtin/checkout: add --recurse-submodules switch

This exposes a flag to recurse into submodules
in builtin/checkout making use of the code implemented
in prior patches.

A new failure mode is introduced in the submodule
update library, as the directory/submodule conflict
is not solved in prior patches.

Signed-off-by: Stefan Beller 
---
 Documentation/git-checkout.txt |  7 +++
 builtin/checkout.c | 28 
 t/lib-submodule-update.sh  | 23 +--
 t/t2013-checkout-submodule.sh  |  5 +
 4 files changed, 57 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-checkout.txt b/Documentation/git-checkout.txt
index 8e2c0662dd..d6399c0af8 100644
--- a/Documentation/git-checkout.txt
+++ b/Documentation/git-checkout.txt
@@ -256,6 +256,13 @@ section of linkgit:git-add[1] to learn how to operate the 
`--patch` mode.
out anyway. In other words, the ref can be held by more than one
worktree.
 
+--[no-]recurse-submodules::
+   Using --recurse-submodules will update the content of all initialized
+   submodules according to the commit recorded in the superproject. If
+   local modifications in a submodule would be overwritten the checkout
+   will fail unless `-f` is used. If nothing (or --no-recurse-submodules)
+   is used, the work trees of submodules will not be updated.
+
 ::
Branch to checkout; if it refers to a branch (i.e., a name that,
when prepended with "refs/heads/", is a valid ref), then that
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f174f50303..e9c5fcfaf8 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -21,12 +21,31 @@
 #include "submodule-config.h"
 #include "submodule.h"
 
+static int recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
+
 static const char * const checkout_usage[] = {
N_("git checkout [] "),
N_("git checkout [] [] -- ..."),
NULL,
 };
 
+static int option_parse_recurse_submodules(const struct option *opt,
+  const char *arg, int unset)
+{
+   if (unset) {
+   recurse_submodules = RECURSE_SUBMODULES_OFF;
+   return 0;
+   }
+   if (arg)
+   recurse_submodules =
+   parse_update_recurse_submodules_arg(opt->long_name,
+   arg);
+   else
+   recurse_submodules = RECURSE_SUBMODULES_ON;
+
+   return 0;
+}
+
 struct checkout_opts {
int patch_mode;
int quiet;
@@ -1163,6 +1182,9 @@ int cmd_checkout(int argc, const char **argv, const char 
*prefix)
N_("second guess 'git checkout 
'")),
OPT_BOOL(0, "ignore-other-worktrees", 
_other_worktrees,
 N_("do not check if another worktree is holding the 
given ref")),
+   { OPTION_CALLBACK, 0, "recurse-submodules", _submodules,
+   "checkout", "control recursive updating of 
submodules",
+   PARSE_OPT_OPTARG, option_parse_recurse_submodules },
OPT_BOOL(0, "progress", _progress, N_("force progress 
reporting")),
OPT_END(),
};
@@ -1193,6 +1215,12 @@ int cmd_checkout(int argc, const char **argv, const char 
*prefix)
git_xmerge_config("merge.conflictstyle", conflict_style, NULL);
}
 
+   if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
+   git_config(submodule_config, NULL);
+   if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT)
+   
set_config_update_recurse_submodules(recurse_submodules);
+   }
+
if ((!!opts.new_branch + !!opts.new_branch_force + 
!!opts.new_orphan_branch) > 1)
die(_("-b, -B and --orphan are mutually exclusive"));
 
diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 5b885cfbf8..6a78139f90 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -782,6 +782,11 @@ test_submodule_forced_switch () {
 
 test_submodule_switch_recursing () {
command="$1"
+   RESULT=success
+   if test "$KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS" = 1
+   then
+   RESULT=failure
+   fi
# Appearing submodule #
# Switching to a commit letting a submodule appear checks it out ...
test_expect_success "$command: added submodule is checked out" '
@@ -891,7 +896,7 @@ test_submodule_switch_recursing () {
'
# Replacing a submodule with files in a directory must succeeds
# when the submodule is clean
-   test_expect_success "$command: replace submodule with a directory" '
+   test_expect_$RESULT "$command: replace submodule with a directory" '
prolog &&
reset_work_tree_to_interested add_sub1 &&
(
@@ -903,7 +908,7 @@

[PATCH 01/18] lib-submodule-update.sh: reorder create_lib_submodule_repo

Redraw the ASCII art describing the setup using more space, such that
it is easier to understand.  The leaf commits are now ordered the same
way the actual code is ordered.

Add empty lines to the setup code separating each of the leaf commits,
each starting with a "checkout -b".

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 49 ---
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 915eb4a7c6..5df528ea81 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -15,22 +15,27 @@
 # - Tracked file replaced by submodule (replace_sub1_with_file =>
 #   replace_file_with_sub1)
 #
-#   --O-O
-#  /  ^ replace_directory_with_sub1
-# /   replace_sub1_with_directory
-#/O
-#   / ^
-#  /  modify_sub1
-#  O--O---O
-#  ^  ^\  ^
-#  |  | \ remove_sub1
-#  |  |  -O-O
-#  |  |   \   ^ replace_file_with_sub1
-#  |  |\  replace_sub1_with_file
-#  |   add_sub1 --O-O
-# no_submodule^ valid_sub1
-# invalid_sub1
+# O
+#/^
+#   / remove_sub1
+#  /
+#   add_sub1  /---O
+# |  /^
+# | / modify_sub1
+# v/
+#  O--O---O-O
+#  ^   \  ^ replace_directory_with_sub1
+#  |\ replace_sub1_with_directory
+# no_submodule   \
+# O-O
+#  \  ^ replace_file_with_sub1
+#   \ replace_sub1_with_file
+#\
+# O-O
+# ^ valid_sub1
+# invalid_sub1
 #
+
 create_lib_submodule_repo () {
git init submodule_update_repo &&
(
@@ -49,10 +54,11 @@ create_lib_submodule_repo () {
git config submodule.sub1.ignore all &&
git add .gitmodules &&
git commit -m "Add sub1" &&
-   git checkout -b remove_sub1 &&
+
+   git checkout -b remove_sub1 add_sub1 &&
git revert HEAD &&
 
-   git checkout -b "modify_sub1" "add_sub1" &&
+   git checkout -b modify_sub1 add_sub1 &&
git submodule update &&
(
cd sub1 &&
@@ -67,7 +73,7 @@ create_lib_submodule_repo () {
git add sub1 &&
git commit -m "Modify sub1" &&
 
-   git checkout -b "replace_sub1_with_directory" "add_sub1" &&
+   git checkout -b replace_sub1_with_directory add_sub1 &&
git submodule update &&
git -C sub1 checkout modifications &&
git rm --cached sub1 &&
@@ -75,22 +81,25 @@ create_lib_submodule_repo () {
git config -f .gitmodules --remove-section "submodule.sub1" &&
git add .gitmodules sub1/* &&
git commit -m "Replace sub1 with directory" &&
+
git checkout -b replace_directory_with_sub1 &&
git revert HEAD &&
 
-   git checkout -b "replace_sub1_with_file" "add_sub1" &&
+   git checkout -b replace_sub1_with_file add_sub1 &&
git rm sub1 &&
echo "content" >sub1 &&
git add sub1 &&
git commit -m "Replace sub1 with file" &&
+
git checkout -b replace_file_with_sub1 &&
git revert HEAD &&
 
-   git checkout -b "invalid_sub1" "add_sub1" &&
+   git checkout -b invalid_sub1 add_sub1 &&
git update-index --cacheinfo 16 
0123456789012345678901234567890123456789 sub1 &&
git commit -m "Invalid sub1 commit" &&
git checkout -b valid_sub1 &&
git revert HEAD &&
+
git checkout master
)
 }
-- 
2.12.0.rc1.52.ge239d7e709.dirty

[PATCH 07/18] connect_work_tree_and_git_dir: safely create leading directories

In a later patch we'll use connect_work_tree_and_git_dir when the
directory for the gitlink file doesn't exist yet. This patch makes
connect_work_tree_and_git_dir safe to use for both cases of
either the git dir or the working dir missing.

To do so, we need to call safe_create_leading_directories[_const]
on both directories. However this has to happen before we construct
the absolute paths as real_pathdup assumes the directories to
be there already.

So for both the config file in the git dir as well as the .git link
file we need to
a) construct the name
b) call SCLD
c) get the absolute path
d) once a-c is done for both we can consume the absolute path
   to compute the relative path to each other and store those
   relative paths.

The implementation provided here puts a) and b) for both cases first,
and then performs c and d after.

One of the two users of 'connect_work_tree_and_git_dir' already checked
for the directory being there, so we can loose that check as
connect_work_tree_and_git_dir handles this functionality now.

Signed-off-by: Stefan Beller 
---
 dir.c   | 32 +---
 submodule.c | 11 ++-
 2 files changed, 23 insertions(+), 20 deletions(-)

diff --git a/dir.c b/dir.c
index 4541f9e146..6f52af7abb 100644
--- a/dir.c
+++ b/dir.c
@@ -2728,23 +2728,33 @@ void untracked_cache_add_to_index(struct index_state 
*istate,
 /* Update gitfile and core.worktree setting to connect work tree and git dir */
 void connect_work_tree_and_git_dir(const char *work_tree_, const char 
*git_dir_)
 {
-   struct strbuf file_name = STRBUF_INIT;
+   struct strbuf gitfile_sb = STRBUF_INIT;
+   struct strbuf cfg_sb = STRBUF_INIT;
struct strbuf rel_path = STRBUF_INIT;
-   char *git_dir = real_pathdup(git_dir_);
-   char *work_tree = real_pathdup(work_tree_);
+   char *git_dir, *work_tree;
 
-   /* Update gitfile */
-   strbuf_addf(_name, "%s/.git", work_tree);
-   write_file(file_name.buf, "gitdir: %s",
-  relative_path(git_dir, work_tree, _path));
+   /* Prepare .git file */
+   strbuf_addf(_sb, "%s/.git", work_tree_);
+   if (safe_create_leading_directories_const(gitfile_sb.buf))
+   die(_("could not create directories for %s"), gitfile_sb.buf);
+
+   /* Prepare config file */
+   strbuf_addf(_sb, "%s/config", git_dir_);
+   if (safe_create_leading_directories_const(cfg_sb.buf))
+   die(_("could not create directories for %s"), cfg_sb.buf);
 
+   git_dir = real_pathdup(git_dir_);
+   work_tree = real_pathdup(work_tree_);
+
+   /* Write .git file */
+   write_file(gitfile_sb.buf, "gitdir: %s",
+  relative_path(git_dir, work_tree, _path));
/* Update core.worktree setting */
-   strbuf_reset(_name);
-   strbuf_addf(_name, "%s/config", git_dir);
-   git_config_set_in_file(file_name.buf, "core.worktree",
+   git_config_set_in_file(cfg_sb.buf, "core.worktree",
   relative_path(work_tree, git_dir, _path));
 
-   strbuf_release(_name);
+   strbuf_release(_sb);
+   strbuf_release(_sb);
strbuf_release(_path);
free(work_tree);
free(git_dir);
diff --git a/submodule.c b/submodule.c
index 0e55372f37..04d185738f 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1442,8 +1442,6 @@ void absorb_git_dir_into_superproject(const char *prefix,
 
/* Not populated? */
if (!sub_git_dir) {
-   char *real_new_git_dir;
-   const char *new_git_dir;
const struct submodule *sub;
 
if (err_code == READ_GITFILE_ERR_STAT_FAILED) {
@@ -1466,13 +1464,8 @@ void absorb_git_dir_into_superproject(const char *prefix,
sub = submodule_from_path(null_sha1, path);
if (!sub)
die(_("could not lookup name for submodule '%s'"), 
path);
-   new_git_dir = git_path("modules/%s", sub->name);
-   if (safe_create_leading_directories_const(new_git_dir) < 0)
-   die(_("could not create directory '%s'"), new_git_dir);
-   real_new_git_dir = real_pathdup(new_git_dir);
-   connect_work_tree_and_git_dir(path, real_new_git_dir);
-
-   free(real_new_git_dir);
+   connect_work_tree_and_git_dir(path,
+   git_path("modules/%s", sub->name));
} else {
/* Is it already absorbed into the superprojects git dir? */
char *real_sub_git_dir = real_pathdup(sub_git_dir);
-- 
2.12.0.rc1.52.ge239d7e709.dirty

[PATCH 03/18] lib-submodule-update: teach test_submodule_content the -C flag

Signed-off-by: Stefan Beller 
---
 t/lib-submodule-update.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index c0d6325133..00128f28b5 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -193,6 +193,11 @@ test_superproject_content () {
 # Test that the given submodule at path "$1" contains the content according
 # to the submodule commit recorded in the superproject's commit "$2"
 test_submodule_content () {
+   if test x"$1" = "x-C"
+   then
+   cd "$2"
+   shift; shift;
+   fi
if test $# != 2
then
echo "test_submodule_content needs two arguments"
-- 
2.12.0.rc1.52.ge239d7e709.dirty

[PATCH 10/18] submodules: introduce check to see whether to touch a submodule

In later patches we introduce the --recurse-submodule flag for commands
that modify the working directory, e.g. git-checkout.

It is potentially expensive to check if a submodule needs an update,
because a common theme to interact with submodules is to spawn a child
process for each interaction.

So let's introduce a function that checks if a submodule needs
to be checked for an update before attempting the update.

Signed-off-by: Stefan Beller 
---
 submodule.c | 16 
 submodule.h |  7 +++
 2 files changed, 23 insertions(+)

diff --git a/submodule.c b/submodule.c
index 591f4a694e..8b2c0212be 100644
--- a/submodule.c
+++ b/submodule.c
@@ -548,6 +548,22 @@ void set_config_update_recurse_submodules(int value)
config_update_recurse_submodules = value;
 }
 
+int should_update_submodules(void)
+{
+   return config_update_recurse_submodules == RECURSE_SUBMODULES_ON;
+}
+
+const struct submodule *submodule_from_ce(const struct cache_entry *ce)
+{
+   if (!S_ISGITLINK(ce->ce_mode))
+   return NULL;
+
+   if (!should_update_submodules())
+   return NULL;
+
+   return submodule_from_path(null_sha1, ce->name);
+}
+
 static int has_remote(const char *refname, const struct object_id *oid,
  int flags, void *cb_data)
 {
diff --git a/submodule.h b/submodule.h
index b4e60c08d2..6f3fe85c7c 100644
--- a/submodule.h
+++ b/submodule.h
@@ -65,6 +65,13 @@ extern void show_submodule_inline_diff(FILE *f, const char 
*path,
const struct diff_options *opt);
 extern void set_config_fetch_recurse_submodules(int value);
 extern void set_config_update_recurse_submodules(int value);
+/* Check if we want to update any submodule.*/
+extern int should_update_submodules(void);
+/*
+ * Returns the submodule struct if the given ce entry is a submodule
+ * and it should be updated. Returns NULL otherwise.
+ */
+extern const struct submodule *submodule_from_ce(const struct cache_entry *ce);
 extern void check_for_new_submodule_commits(unsigned char new_sha1[20]);
 extern int fetch_populated_submodules(const struct argv_array *options,
   const char *prefix, int command_line_option,
-- 
2.12.0.rc1.52.ge239d7e709.dirty

Re: [PATCH 03/18] lib-submodule-update: teach test_submodule_content the -C flag

On Mon, Mar 6, 2017 at 12:30 PM, Stefan Beller  wrote:
>
> as well as another fix in git-submodule.sh
>

I spoke too early, that seems like a false positive.

Re: bisect-helper: we do not bisect --objects

"Philip Oakley"  writes:

> The study of human error is quite interesting

Yes ;-)

Re: bisect-helper: we do not bisect --objects

Christian Couder  writes:

> I think I just copy pasted the code from cmd_rev_list() in
> builtin-rev-list.c and probably didn't realize that revs->tree_objects
> would always be false.
>
> Thanks for spotting this and removing the dead code.

Thanks for a quick confirmation. Just FYI I was looking at this code
because I was trying to assess how involved it would be to teach the
code to do a bisection only on the first-parent chain.

Re: [RFC 0/4] Shallow clones with on-demand fetch

On Mon, Mar 6, 2017 at 11:16 AM, Jonathan Tan  wrote:

> [1] <20170113155253.1644-1-benpe...@microsoft.com> (you can search for
> emails by Message ID in online archives like https://public-inbox.org/git if
> you don't already have them)

Not just search, but the immediate lookup is
 https://public-inbox.org/git/
so
 https://public-inbox.org/git/20170113155253.1644-1-benpe...@microsoft.com

> [2] 

and
  https://public-inbox.org/git/cover.1487984670.git.jonathanta...@google.com

Re: RFC: Another proposed hash function transition plan

On 03/06, Linus Torvalds wrote:
> On Mon, Mar 6, 2017 at 10:39 AM, Jonathan Tan  
> wrote:
> >
> > I think "nohash" can be explained in 2 points:
> 
> I do think that that was my least favorite part of the suggestion. Not
> just "nohash", but all the special "hash" lines too.
> 
> I would honestly hope that the design should not be about "other
> hashes". If you plan your expectations around the new hash being
> broken, something is wrong to begin with.
> 
> I do wonder if things wouldn't be simpler if the new format just
> included the SHA1 object name in the new object. Put it in the
> "header" line of the object, so that every time you look up an object,
> you just _see_ the SHA1 of that object. You can even think of it as an
> additional protection.
> 
> Btw, the multi-collision attack referenced earlier does _not_ work for
> an iterated hash that has a bigger internal state than the final hash.
> Which is actually a real argument against sha-256: the internal state
> of sha-256 is 256 bits, so if an attack can find collisions due to
> some weakness, you really can then generate exponential collisions by
> chaining a linear collision search together.
> 
> But for sha3-256 or blake2, the internal hash state is larger than the
> final hash, so now you need to generate collisions not in the 256
> bits, but in the much larger search space of the internal hash space
> if you want to generate those exponential collisions.
> 
> So *if* the new object format uses a git header line like
> 
> "blob  \0"
> 
> then it would inherently contain that mapping from 256-bit hash to the
> SHA1, but it would actually also protect against attacks on the new
> hash. In fact, in particular for objects with internal format that
> differs between the two hashing models (ie trees and commits which to
> some degree are higher-value targets), it would make attacks really
> quite complicated, I suspect.
> 
> And you wouldn't need those "hash" or "nohash" things at all. The old
> SHA1 would simply always be there, and cheap to look up (ie you
> wouldn't have to unpack the whole object).
> 
> Hmm?

I'll agree that the "hash" "nohash" bit isn't my favorite and is really
only there to address the signing of tags/commits in this new non-sha1
world.  I'm inclined to take a closer look at Jeff's suggestion which
simply has a signature for the hash that the signer cares about.

I don't know if keeping around the SHA1 for every object buys you all
that much.  It would add an additional layer of protection but you would
also need to compute the SHA1 for each object indefinitely (assuming you
include the SHA1 in new objects and not just converted objects).  The
hope would be that at some point you could not worry about SHA1 at all.
That may be difficult for projects with long history with commit msgs
which reference SHA1's of other commits (if you wanted to look up the
referenced commit, for example), but projects started in the new
non-sha1 world shouldn't have to ever compute a sha1.

Also, during this transition phase you would still need to maintain the
sha1<->sha256 translation table to make looking up objects by their sha1
name in a sha256 repo fast.  Otherwise I think it would take a
non-trivial amount of time to search a sha256 repo for a sha1 name.  So
if you do include the sha1 in the new object format then you would end
up with some duplicate information, which isn't the end of the world.

-- 
Brandon Williams

[Request for Documentation] Differentiate signed (commits/tags/pushes)

When discussing migrating to a new hashing function, I tried to learn
about the subtleties of the different things that can be gpg-signed in
Git.

What is the difference between signed commits and tags?
(Not from a technical perspective, but for the end user)

Both of them certify that a given tree is the desired
content for your repository, but git-tag seems to be designed for
conveying a trusted state to collaborators, whereas git-commit
is primarily designed to just record a state.

So in which case would I want to use a signed commit?
(which then overlaps with signed pushes)

A signed push can certify that a given payload (consisting
of multiple commits on possibly multiple branches) was transmitted
to a remote, which can be recorded by the remote as e.g. a proof
of work.

The man page of git-commit doesn't discuss gpg signing at all,
git-tag has some discussion on how to properly use tags. Maybe
there we could have a discussion on when not to use tags, but
rather signed commits?

Off list I was told gpg-signed commits are a "checkbox feature",
i.e. no real world workflow would actually use it. (That's a bold
statement, someone has to use it as there was enough interest
to implement it, no?)

Thanks,
Stefan

Re: [RFC 0/4] Shallow clones with on-demand fetch

Mark Thomas  writes:

> This is a proof-of-concept, so it is in no way complete.  It contains a
> few hacks to make it work, but these can be ironed out with a bit more
> work.  What I have so far is sufficient to try out the idea.

Two things that immediately come to mind (which may or may not be
real issues) are 

 (1) What (if any) security model you have in mind.

 From object-confidentiality's point of view, this needs to be
 enabled only on a host that allows
 uploadpack.allowAnySHA1InWant but even riskier.

 From DoS point of view, you can make a short 40-byte request to
 cause the other side emit megabytes of stuff.  I do not think
 it is a new problem (anybody can repeatedly request a clone of
 large stuff), but there may be new ramifications.

 (2) If the interface to ask just one object kills the whole idea
 due to roundtrip latency.

 You may want to be able to say "I want all objects reachable
 from this tree; please give me a packfile of needed objects
 assuming that I have all objects reachable from this other tree
 (or these other trees)".

Re: [RFC 0/4] Shallow clones with on-demand fetch

2017-03-06 Thread Jonathan Tan


On 03/04/2017 11:18 AM, Mark Thomas wrote:

I was inspired a bit by Microsoft's announcement of their Git VFS.  I
saw that people have talked in the past about making git fetch objects
from remotes as they are needed, and decided to give it a try.


For reference, one such conversation is [1]. (cc-ing Ben Peart also)


The patch series adds a "--on-demand" option to git clone, which, when
used in conjunction with the existing shallow clone operations, clones
the full history of the repository's commits, but only the files that
would be included in the shallow clone.

When a file that is missing is required, git requests the file on-demand
from the remote, via a new 'upload-file' service.


A reachability check (of the blob) might be a good idea. The current Git 
implementation already supports fetching a blob (perhaps a bug) but has 
problems with reachability calculations that I tried to fix in [2], but 
found some bugs that weren't easily fixable.


As I said in [2], I think that proper fetching of blobs on demand is a 
prerequisite to any sort of missing object tolerance (like your 
on-demand clones), so I haven't thought much about the topics in the 
rest of your patch set.


[1] <20170113155253.1644-1-benpe...@microsoft.com> (you can search for 
emails by Message ID in online archives like 
https://public-inbox.org/git if you don't already have them)

[2]

Re: RFC: Another proposed hash function transition plan

2017-03-06 Thread Linus Torvalds

On Mon, Mar 6, 2017 at 10:39 AM, Jonathan Tan  wrote:
>
> I think "nohash" can be explained in 2 points:

I do think that that was my least favorite part of the suggestion. Not
just "nohash", but all the special "hash" lines too.

I would honestly hope that the design should not be about "other
hashes". If you plan your expectations around the new hash being
broken, something is wrong to begin with.

I do wonder if things wouldn't be simpler if the new format just
included the SHA1 object name in the new object. Put it in the
"header" line of the object, so that every time you look up an object,
you just _see_ the SHA1 of that object. You can even think of it as an
additional protection.

Btw, the multi-collision attack referenced earlier does _not_ work for
an iterated hash that has a bigger internal state than the final hash.
Which is actually a real argument against sha-256: the internal state
of sha-256 is 256 bits, so if an attack can find collisions due to
some weakness, you really can then generate exponential collisions by
chaining a linear collision search together.

But for sha3-256 or blake2, the internal hash state is larger than the
final hash, so now you need to generate collisions not in the 256
bits, but in the much larger search space of the internal hash space
if you want to generate those exponential collisions.

So *if* the new object format uses a git header line like

"blob  \0"

then it would inherently contain that mapping from 256-bit hash to the
SHA1, but it would actually also protect against attacks on the new
hash. In fact, in particular for objects with internal format that
differs between the two hashing models (ie trees and commits which to
some degree are higher-value targets), it would make attacks really
quite complicated, I suspect.

And you wouldn't need those "hash" or "nohash" things at all. The old
SHA1 would simply always be there, and cheap to look up (ie you
wouldn't have to unpack the whole object).

Hmm?

   Linus

Re: [PATCH v3] Travis: also test on 32-bit Linux

Lars Schneider  writes:

> when I looked into the 32-bit/line-log issue [1], I realized that my
> proposed docker setup is not ideal for local debugging. Here is an
> approach that I think is better. I changed the following:
> - disable sudo as it is not required for the Travis job
> - keep all docker commands in the .travis.yml
> - add option to run commands inside docker with the same UID as the
>   host user to make output files accessible
> - pass environment variables directly to the docker container
>
> Sorry for the back and forth.

Anything that makes further diag easier when a problem is spotted by
the automated machinery is a very welcome change.

Will replace; let's see what others think.

Thanks.

Re: [PATCH 01/10] submodule: decouple url and submodule existence

On 03/01, Stefan Beller wrote:

Sorry I've been slow at rerolling this.  I'll send out a reroll today.

> IIRC most of the series is actually refactoring, i.e.
> providing a central function that answers
> "is this submodule active/initialized?" and then making use of this
> function.
> 
> Maybe it would be easier to reroll these refactorings first without adding
> in the change of behavior.

I agree, when I resend out the series I can drop the first patch so that
the series as a whole just moves to using a standardized function to
check that a submodule is active.  I can then resend out the first patch
in the series on its own so that we can more easily discuss the best
route to take.

> 
> Off the list we talked about different models how to indicate that
> a submodule is active.
> 
> Current situation:
> submodule..URL is used as a boolean flag
> 
> Possible future A:
> 1) If submodule.active exists use that
> 2) otherwise go be individual .URL configs
> 
> submodule.active is a config that contains a pathspec,
> i.e. specifies the group of submodules instead of marking
> each submodule individually.
> 
> How to get to future A:
> * apply this patch series
> 
> Possible future B:
> 1) the boolean submodule..exists is used
> if existent
> 2) If that boolean is missing we'd respect a grouping
>feature such as submodule.active
> 3) fallback to the .URL as a boolean indicator.

Thinking on this, and to maintain simplicity and uniformity, we could
have the per-submodule boolean flag to be "submodule..active"
while the general pathspec based config option would be
"submodule.active".

> 
> How to get to future B:
> 1) possibly take the refactoring part from this series
> 2) implement the .exists flag (that can solve the worktree
>   problem, as then we don't have to abuse the .URL
>   as a boolean indicator any more.)
> 3) implement the grouping feature that takes precedence
>   over .URL, but yields precedence to the .exists flag.
>   (This would solve the grouping problem)
> 
> By having two different series (2) and (3) we'd solve
> just one thing at a time with each series, such that
> this seems easier to understand and review.
> 
> The question is if this future B is also easier to use for
> users and I think it would be, despite being technically more
> complex.
> 
> Most users only have a few submodules. Also the assumption
> is to have either interest in very few specific submodules
> or all of them. For both of these cases the .exists is a good idea
> as it is very explicit, but verbose.
> 
> The grouping feature would help in the case of lots of
> submodules, e.g. to select all of them you would just give
> an easy pathspec "." and that would help going forward
> as by such a submodule spec all new submodules would
> be "auto-on".

And if you wanted all but one you could have submodule.active="." while
submodule.dontwant.active=false instead of having a multi value
submodule.active with an exclusion pathspec.  Both work but one may be
easier for a user to understand.

> 
> Possible future C:
> Similar to B, but with branch specific configuration.
> submodule..active would work as a
> grouping feature. I wonder how it would work with
> the .exists flag.
> 
> Stefan

-- 
Brandon Williams

Re: RFC: Another proposed hash function transition plan

2017-03-06 Thread Jonathan Tan


On 03/06/2017 12:43 AM, Jeff King wrote:

Overall the basics of the conversion seem sound to me. The "nohash"
things seems more complicated than I think it ought to be, which
probably just means I'm missing something.  I left a few related
comments on the google doc, so I won't repeat them here.


I think "nohash" can be explained in 2 points:
 1. When creating signed objects, "nohash" is almost never written. Just
create the object as usual and add "hash" lines for every other hash
function that you want the signature to cover.
 2. When converting from function A to function B, add "nohash B" if
there were no "hash B" lines in the original object.

The "nohash" thing was in the hope of requiring only one signature to 
sign all the hashes (in all the functions) that the user wants, while 
preserving round-tripping ability.


Maybe some examples would help to address the apparent complexity. These 
examples are the same as those in the document. I'll also show future 
compatibility with a hypothetical NEW hash function, and extend the rule 
about signing/verification to 'sign in the earliest supported hash 
function in ({object's hash function} + {functions in "hash" lines} - 
{function in "nohash" line})'.


Example 1 (existing signed commit)

  nohash sha256  nohash new
  hash sha1 ...  hash sha1 ...

This object was probably created in a SHA-1 repository with no knowledge 
that we were going to transition to SHA256 (but there is nothing 
preventing us from creating the middle or right object and then 
translating it to the other functions).


Example 2 (recommended way to sign a commit in a SHA256 repo)

hash sha256 ...   hash sha1 ...  nohash new
 hash sha1 ...
 hash sha256 ...

This is the recommended way to create a SHA256 object in a SHA256 repo. 
The rule about signing/verification (as stated above) is to sign in 
SHA-1, so when signing or verifying, we convert the object to SHA-1 and 
use that as the payload. Note that the signature covers both the SHA-1 
and SHA256 hashes, and that existing Git implementations can verify the 
signature.


Example 3 (a signer that does not care about SHA-1 anymore)

nohash sha1  nohash new
hash sha256 ...  hash sha256 ...

If we were to create a SHA256 object without any mentions of SHA-1, the 
rule about signing/verification (as stated above) states that the 
signature payload is the SHA256 object. This means that existing Git 
implementations cannot verify the signature, but we can still round-trip 
to SHA-1 and back without losing any information (as far as I can tell).

Re: RFC: Another proposed hash function transition plan

Jeff King  writes:

>> You can use the doc URL
>> 
>>  https://goo.gl/gh2Mzc
>
> I'd encourage anybody following along to follow that link. I almost
> didn't, but there are a ton of comments there (I'm not sure how I feel
> about splitting the discussion off the list, though).

I am sure how I feel about it---we should really discourage it,
unless it is an effort to help polishing an early draft for wider
distribution and discussion.

> I don't think we do this right now, but you can actually find the entry
> (and exit) points of a pack during the index-pack step. Basically:

We have code to do the "entry point" computation in index-pack
already, I think, in 81a04b01 ("index-pack: --clone-bundle option",
2016-03-03).

> I don't think using the "want"s as the entry points is unreasonable,
> though. The server _shouldn't_ generally be sending us other cruft.

That's true.

Re: RFC: Another proposed hash function transition plan

On 03/06, brian m. carlson wrote:
> On Sat, Mar 04, 2017 at 06:35:38PM -0800, Linus Torvalds wrote:
> > On Fri, Mar 3, 2017 at 5:12 PM, Jonathan Nieder  wrote:
> > >
> > > This document is still in flux but I thought it best to send it out
> > > early to start getting feedback.
> > 
> > This actually looks very reasonable if you can implement it cleanly
> > enough. In many ways the "convert entirely to a new 256-bit hash" is
> > the cleanest model, and interoperability was at least my personal
> > concern. Maybe your model solves it (devil in the details), in which
> > case I really like it.
> 
> If you think you can do it, I'm all for it.
> 
> > Btw, I do think the particular choice of hash should still be on the
> > table. sha-256 may be the obvious first choice, but there are
> > definitely a few reasons to consider alternatives, especially if it's
> > a complete switch-over like this.
> > 
> > One is large-file behavior - a parallel (or tree) mode could improve
> > on that noticeably. BLAKE2 does have special support for that, for
> > example. And SHA-256 does have known attacks compared to SHA-3-256 or
> > BLAKE2 - whether that is due to age or due to more effort, I can't
> > really judge. But if we're switching away from SHA1 due to known
> > attacks, it does feel like we should be careful.
> 
> I agree with Linus on this.  SHA-256 is the slowest option, and it's the
> one with the most advanced cryptanalysis.  SHA-3-256 is faster on 64-bit
> machines (which, as we've seen on the list, is the overwhelming majority
> of machines using Git), and even BLAKE2b-256 is stronger.
> 
> Doing this all over again in another couple years should also be a
> non-goal.

I agree that when we decide to move to a new algorithm that we should
select one which we plan on using for as long as possible (much longer
than a couple years).  While writing the document we simply used
"sha256" because it was more tangible and easier to reference.

> -- 
> brian m. carlson / brian with sandals: Houston, Texas, US
> +1 832 623 2791 | https://www.crustytoothpaste.net/~bmc | My opinion only
> OpenPGP: https://keybase.io/bk2204



-- 
Brandon Williams

Re: [PATCH v2] http: inform about alternates-as-redirects behavior

On 03/04, Jeff King wrote:
> On Sat, Mar 04, 2017 at 08:36:45AM +, Eric Wong wrote:
> 
> > I also think the security implications for relative alternates
> > on the same host would not matter, since the smart HTTP will
> > take them into account on the server side.
> 
> It depends on the host whether all of the repos on it have the same
> security domain or not. A site like github.com hosts both public and
> private repositories, and you do not want a public repo redirecting to
> the private one to get objects.
> 
> Of course, that depends on untrusted users being able to configure
> server-side alternates, which GitHub certainly would not let you do. I
> would hope other multi-user hosting sites behave similarly (most hosting
> sites do not seem to allow dumb http at all).
> 
> > Perhaps we give http_follow_config ORable flags:
> > 
> > HTTP_FOLLOW_NONE = 0,
> > HTTP_FOLLOW_INITIAL = 0x1,
> > HTTP_FOLLOW_RELATIVE = 0x2,
> > HTTP_FOLLOW_ABSOLUTE = 0x4,
> > HTTP_FOLLOW_ALWAYS = 0x7,
> > 
> > With the default would being: HTTP_FOLLOW_INITIAL|HTTP_FOLLOW_RELATIVE
> > (but I suppose that's a patch for another time)
> 
> I don't have a real problem with breaking it down that way, if somebody
> wants to make a patch. Mostly the reason I didn't do so is that I don't
> think http-alternates are in common use these days, since smart-http is
> much more powerful.
> 
> > --8<---
> > From: Eric Wong 
> > Subject: [PATCH] http: inform about alternates-as-redirects behavior
> 
> This v2 looks fine to me.
> 
> -Peff

I know I'm a little late to the party but v2 looks good to me too.  I
like the change from v1 that only mentions the config option as opposed
to listing a value it should be set to.

-- 
Brandon Williams

Re: git init --separate-git-dir does not update symbolic .git links for submodules

On Sat, Mar 4, 2017 at 6:15 AM, Valery Tolstov  wrote:
> Looking for microproject ideas for GSoC.
> Would this issue be suitable as the microproject?

It would be a good project, but not as 'micro' I would assume. ;)
Why it is not a micro project:

To fix this issue we'd want to fix the .git link files recursively, i.e.
nested submodules would also need fixing. And to recurse into
submodules we normally spawn a child process.

So maybe the plan is as follow:
1) Add the functionality to builtin/submodule--helper.c
it fixes just one submodule, given by name.
2) add an externally visible command in the same file
  See e.g. module_name (the function and the table
  at the very end of the file)
  (1) needs to call this command (see clone_submodule() in
that file how to call further commands)

2) "git init --separate-git-dir ../test" would call the function
  in (1), which then recursively uses (2) to call (1) in nested
  submodules.

After thinking about it, it may be good for a micro project,
as it is smaller than originally assumed.
Specifically if you plan to look into submodules later on.

Thanks,
Stefan

fatal: Could not get current working directory: Permission denied | affected 2.10,2.11,2.12, but not 1.9.5 |

2017-03-06 Thread Zenobiusz Kunegunda

OS: FreeBSD 10.3-STABLE

Story:
I was trying to install openproject using this manual
https://www.openproject.org/open-source/download/manual-installation-guide/

Everything was fine till command
$ bundle install --deployment --without postgres sqlite development test 
therubyracer docker

works witg git version: 
1.9.5 ( branch from repo )
does not work with git version: 
2.10 ( branch from from repo ) 
2.11 ( both from FreeBSD and from git repository) 
2.12 ( branch from repo )

On another server that passed but there was npm problem.

This is error for 
$ bundle install --deployment --without postgres sqlite development test 
therubyracer docker

=== output begin ===
Error message (this is from git 2.11; 2.10 and 2.12 report same error):
$ bundle install --deployment --without postgres sqlite development test 
therubyracer docker 
The git source `git://github.com/oliverguenther/omniauth.git` uses the `git` 
protocol, which transmits data without encryption. Disable this warning with 
`bundle config git.allow_insecure true`, or switch to the `https` protocol to 
keep your data secure.
The git source `git://github.com/finnlabs/awesome_nested_set.git` uses the 
`git` protocol, which transmits data without encryption. Disable this warning 
with `bundle config git.allow_insecure true`, or switch to the `https` protocol 
to keep your data secure.
The git source `git://github.com/why-el/svg-graph.git` uses the `git` protocol, 
which transmits data without encryption. Disable this warning with `bundle 
config git.allow_insecure true`, or switch to the `https` protocol to keep your 
data secure.
The git source `git://github.com/opf/rails-angular-xss.git` uses the `git` 
protocol, which transmits data without encryption. Disable this warning with 
`bundle config git.allow_insecure true`, or switch to the `https` protocol to 
keep your data secure.
The git source `git://github.com/goodwill/capybara-select2.git` uses the `git` 
protocol, which transmits data without encryption. Disable this warning with 
`bundle config git.allow_insecure true`, or switch to the `https` protocol to 
keep your data secure.
The git source `git://github.com/omniauth/omniauth-saml.git` uses the `git` 
protocol, which transmits data without encryption. Disable this warning with 
`bundle config git.allow_insecure true`, or switch to the `https` protocol to 
keep your data secure.
Fetching gem metadata from https://rubygems.org/...
Fetching version metadata from https://rubygems.org/..
Fetching dependency metadata from https://rubygems.org/.
fatal: Could not get current working directory: Permission denied

Retrying `git fetch --force --quiet --tags 
"/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/cache/bundler/git/awesome_nested_set-209215f38dc7f6765d32201897f8688e973f4de7"`
 due to error (2/4): Bundler::Source::Git::GitCommandError Git error: command 
`git fetch --force --quiet --tags 
"/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/cache/bundler/git/awesome_nested_set-209215f38dc7f6765d32201897f8688e973f4de7"`
 in directory 
/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/bundler/gems/awesome_nested_set-7bd473e845e2
 has failed.
If this error persists you could try removing the cache directory 
'/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/cache/bundler/git/awesome_nested_set-209215f38dc7f6765d32201897f8688e973f4de7'fatal:
 Could not get current working directory: Permission denied

Retrying `git fetch --force --quiet --tags 
"/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/cache/bundler/git/awesome_nested_set-209215f38dc7f6765d32201897f8688e973f4de7"`
 due to error (3/4): Bundler::Source::Git::GitCommandError Git error: command 
`git fetch --force --quiet --tags 
"/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/cache/bundler/git/awesome_nested_set-209215f38dc7f6765d32201897f8688e973f4de7"`
 in directory 
/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/bundler/gems/awesome_nested_set-7bd473e845e2
 has failed.
If this error persists you could try removing the cache directory 
'/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/cache/bundler/git/awesome_nested_set-209215f38dc7f6765d32201897f8688e973f4de7'fatal:
 Could not get current working directory: Permission denied

Retrying `git fetch --force --quiet --tags 
"/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/cache/bundler/git/awesome_nested_set-209215f38dc7f6765d32201897f8688e973f4de7"`
 due to error (4/4): Bundler::Source::Git::GitCommandError Git error: command 
`git fetch --force --quiet --tags 
"/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/cache/bundler/git/awesome_nested_set-209215f38dc7f6765d32201897f8688e973f4de7"`
 in directory 
/usr/home/USER/openproject/vendor/bundle/ruby/2.2.0/bundler/gems/awesome_nested_set-7bd473e845e2
 has failed.
If this error persists you could try removing the cache directory

Re: Fwd: HTTP Dumb Push?

2017-03-06 Thread Jeff King

On Mon, Mar 06, 2017 at 09:38:33AM -0600, Christian7573 wrote:

> Is there a way for git to push to a server using the dumb protocol?
> Perhaps using the PUT method to upload files?
> Just wondering cause I just set up a dumb server and pushing to it
> doesn't work.

Git dumb-http push protocol goes over DAV. There are instructions in
Documentation/howto/setup-git-server-over-http.txt of the git
repository. They're quite old now, and I have no idea if they've
bit-rotted over the years. Hardly anybody uses that solution these days
(and it's much less efficient than the smart protocol).

> If that isn't an option, where can I get a smart server?

The CGI ships with the rest of git.  Try "git help http-backend", which
has some sample config for Apache and lighttpd.

-Peff

Fwd: HTTP Dumb Push?

2017-03-06 Thread Christian7573

Is there a way for git to push to a server using the dumb protocol?
Perhaps using the PUT method to upload files?
Just wondering cause I just set up a dumb server and pushing to it
doesn't work. If that isn't an option, where can I get a smart server?

Re: Delta compression not so effective

2017-03-06 Thread Marius Storm-Olsen


On 3/5/2017 19:14, Linus Torvalds wrote:

On Sat, Mar 4, 2017 at 12:27 AM, Marius Storm-Olsen  wrote:
I guess you could do the printout a bit earlier (on the
"to_pack.objects[]" array - to_pack.nr_objects is the count there).
That should show all of them. But the small objects shouldn't matter.

But if you have a file like

   extern/win/FlammableV3/x64/lib/FlameProxyLibD.lib

I would have assumed that it has a size that is > 50. Unless those
"extern" things are placeholders?


No placeholders, the FlameProxyLibD.lib is a debug lib, and probably the 
largest in the whole repo (with a replace count > 5).




I do wonder if your dll data just simply is absolutely horrible for
xdelta. We've also limited the delta finding a bit, simply because it
had some O(m*n) behavior that gets very expensive on some patterns.
Maybe your blobs trigger some of those case.


Ok, but given that the SVN delta compression, which forward-linear only, 
is ~45% better, perhaps that particular search could be done fairly 
cheap? Although, I bet time(stamps) are out of the loop at that point, 
so it's not a factor anymore. Even if it where, I'm not sure it would 
solve anything, if there's other factors also limiting deltafication.




The diff-delta work all goes back to 2005 and 2006, so it's a long time ago.

What I'd ask you to do is try to find if you could make a reposity of
just one of the bigger DLL's with its history, particularly if you can
find some that you don't think is _that_ sensitive.

Looking at it, for example, I see that you have that file

   extern/redhat-5/FlammableV3/x64/plugins/libFlameCUDA-3.0.703.so

that seems to have changed several times, and is a largish blob. Could
you try creating a repository with git fast-import that *only*
contains that file (or pick another one), and see if that delta's
well?


I'll filter-branch to extern/ only, however the whole FlammableV3 needs 
to go too, I'm afaid (extern for that project, but internal to $WORK).

I'll do some rewrites and see what comes up.


And if you find some case that doesn't xdelta well, and that you feel
you could make available outside, we could have a test-case...


I'll try with this repo first, if not, I'll see if I can construct one.

Thanks!


--
.marius

Re: [PATCH v5 23/24] t1405: some basic tests on main ref store

2017-03-06 Thread Duy Nguyen

On Fri, Mar 3, 2017 at 11:43 PM, Michael Haggerty  wrote:
> It's notable that these tests grep around the filesystem, so they won't
> be applicable to future refs backends. Of course, "pack-refs" is
> intrinsically only applicable to the files backend, so for this test
> it's not surprising. But some tests could conceivably be written in a
> generic way, so that they should pass for any refs backend.
>
> Just food for thought; no need to change anything now.

I'm a bit on the fence about this. On one hand I think there is room
to backend-specific tests (and this one is more about files backend,
we just don't have any direct way of getting the backend except
through get_main_ref_store()).

On the other hand, I can see a need for verifying refs behavior across
backends. Submodule backend is unfortunately a bad fit (and probably
worktree backend too in early phase) because it cannot fully replace
files backend. lmdb does. I guess these tests will have some more
restructuring when lmdb joins the party.

I imagine we could have something like "ref_expect_success
[backend,[backend..]]  ", which makes it easier to
exercise a new backend with the same test, or we could add
backend-specific tests as well. Not sure how to do it yet (the devil
will be in the body, I think, like dealing with "git -C sub" for
submodules). Probably won't do in this series anyway.

>> +test_expect_success 'rename_refs(master, new-master)' '
>> + git rev-parse master >expected &&
>> + $RUN rename-ref refs/heads/master refs/heads/new-master &&
>> + git rev-parse new-master >actual &&
>> + test_cmp expected actual &&
>> + test_commit recreate-master
>> +'
>
> Isn't HEAD set to `refs/heads/master` prior to this test? If so, then I
> think it should become detached when you rename `refs/heads/master`. Or
> maybe it should be changed to point at `refs/heads/new-master`, I can't
> remember. In either case, it might be useful for the test to check that
> the behavior matches the status quo, so we notice if the behavior ever
> changes inadvertently.

You had me worried a bit there, that I broke something. Yes we rename
HEAD too. No it's not the backend's job. It's done separately by
builtin/branch.c. Probably a good thing because I don't the backend
should know about HEAD's special semantics. Front-end might though.

>> +test_expect_success 'delete_ref(refs/heads/foo)' '
>> + SHA1=`git rev-parse foo` &&
>> + git checkout --detach &&
>> + $RUN delete-ref refs/heads/foo $SHA1 0 &&
>> + test_must_fail git rev-parse refs/heads/foo --
>> +'
>
> The last two tests have the same name.
>
> You might also want to test these functions when the old oid is
> incorrect or when the reference is missing.

I will. But you probably noticed that I didn't cover refs behavior
extensively. I don't know this area well enough to write a conformant
test suite. But I think that's ok. We have something to start
improving now.
-- 
Duy

[PATCH v5 03/22] config: add git_config_get_split_index()

This new function will be used in a following commit to know
if we want to use the split index feature or not.

Signed-off-by: Christian Couder 
---
 cache.h  |  1 +
 config.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/cache.h b/cache.h
index 80b6372cf7..03de80daae 100644
--- a/cache.h
+++ b/cache.h
@@ -1926,6 +1926,7 @@ extern int git_config_get_bool_or_int(const char *key, 
int *is_bool, int *dest);
 extern int git_config_get_maybe_bool(const char *key, int *dest);
 extern int git_config_get_pathname(const char *key, const char **dest);
 extern int git_config_get_untracked_cache(void);
+extern int git_config_get_split_index(void);
 
 /*
  * This is a hack for test programs like test-dump-untracked-cache to
diff --git a/config.c b/config.c
index 2ac1aa19b0..2a97696be7 100644
--- a/config.c
+++ b/config.c
@@ -1736,6 +1736,16 @@ int git_config_get_untracked_cache(void)
return -1; /* default value */
 }
 
+int git_config_get_split_index(void)
+{
+   int val;
+
+   if (!git_config_get_maybe_bool("core.splitindex", ))
+   return val;
+
+   return -1; /* default value */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 01/22] config: mark an error message up for translation

Signed-off-by: Christian Couder 
---
 config.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/config.c b/config.c
index c6b874a7bf..2ac1aa19b0 100644
--- a/config.c
+++ b/config.c
@@ -1728,8 +1728,8 @@ int git_config_get_untracked_cache(void)
if (!strcasecmp(v, "keep"))
return -1;
 
-   error("unknown core.untrackedCache value '%s'; "
- "using 'keep' default value", v);
+   error(_("unknown core.untrackedCache value '%s'; "
+   "using 'keep' default value"), v);
return -1;
}
 
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 00/22] Add configuration options for split-index

Goal

We want to make it possible to use the split-index feature
automatically by just setting a new "core.splitIndex" configuration
variable to true.

This can be valuable as split-index can help significantly speed up
`git rebase` especially along with the work to libify `git apply`
that has been merged to master
(see https://github.com/git/git/commit/81358dc238372793b1590efa149cc1581d1fbd98)
and is now in v2.11.

Design
~~

The design is similar as the previous work that introduced
"core.untrackedCache".

The new "core.splitIndex" configuration option can be either true,
false or undefined which is the default.

When it is true, the split index is created, if it does not already
exists, when the index is read. When it is false, the split index is
removed if it exists, when the index is read. Otherwise it is left as
is.

Along with this new configuration variable, the two following options
are also introduced:

- splitIndex.maxPercentChange

This is to avoid having too many changes accumulating in the split
index while in split index mode. The git-update-index
documentation says:

If split-index mode is already enabled and `--split-index` is
given again, all changes in $GIT_DIR/index are pushed back to
the shared index file.

but it is probably better to not expect the user to think about it
and to have a mechanism that pushes back all changes to the shared
index file automatically when some threshold is reached.

The default threshold is when the number of entries in the split
index file reaches 20% of the number of entries in the shared
index file. The new "splitIndex.maxPercentChange" config option
lets people tweak this value.

- splitIndex.sharedIndexExpire

To make sure that old sharedindex files are eventually removed
when a new one has been created, we "touch" the shared index file
every time a split index file using the shared index file is
either created or read from. Then we can delete shared indexes
with an mtime older than two weeks (by default), when we create a
new shared index file. The new "splitIndex.sharedIndexExpire"
config option lets people tweak this grace period.

This idea was suggested by Duy in:

https://public-inbox.org/git/cacsjy8bqmfashf5kjguh+bd7xg98cafnyde964vjypxz-em...@mail.gmail.com/

and after some experiments, I agree that it is much simpler than
what I thought could be done during our discussion.

Junio also thinks that we have to do "time-based GC" in:

https://public-inbox.org/git/xmqqeg33ccjj@gitster.mtv.corp.google.com/

Note that this patch series doesn't address a leak when removing a
split-index, but Duy wrote that he has a patch to fix this leak:

https://public-inbox.org/git/CACsJy8AisF2ZVs7JdnVFp5wdskkbVQQQ=dbq5uze1moscfb...@mail.gmail.com/

Highlevel view of the patches in the series
~~~

There are very few and very small differences between this patch
series and the v4 patch series sent a few weeks ago. The small
differences are all in patches 15/22 and 17/22. The other patches are
the same as in v4.

Except for patch 1/22 and 2/22, there are 3 big steps, one for each
new configuration variable introduced.

- Patch 1/22 marks a message for translation and can be applied
separately.

- Patch 2/22 improves the existing indentation style of t1700 by
using different here document style.

Step 1 is:

- Patches 3/22 to 6/22 introduce the functions that are reading
the "core.splitIndex" configuration variable and tweaking the
split index depending on its value.

- Patch 7/22 adds a few tests for the new feature.

- Patches 8/22 and 9/22 add some documentation for the new
feature.

Step 2 is:

- Patches 10/22 and 11/22 introduce the functions that are reading
the "splitIndex.maxPercentChange" configuration variable and
regenerating a new shared index file depending on its value.

- Patch 12/22 adds a few tests for the new feature.

- Patch 13/22 add some documentation for the new feature.

Step 3 is:

- Patches 14/22 to 17/22 introduce the functions that are reading
the "splitIndex.sharedIndexExpire" configuration variable and
expiring old shared index files depending on its value.

In patch 15/22 a warning message starts with "could" instead of
"Could" as suggested by Junio.

In patch 17/22, can_delete_shared_index() has been renamed
should_delete_shared_index() and error_errno() has been changed
to warning_errno() as suggested by Junio.

- Patch 18/22 adds a few tests for the new feature.

- Patches 19/22 and 20/22 update the mtime of the shared index
file when a split index based on the shared index file is read
from. 19/22 is a refactoring to make the actual change in 20/22
easier.

- Patches 21/22 and 22/22 add some documentation for

[PATCH v5 05/22] read-cache: add and then use tweak_split_index()

This will make us use the split-index feature or not depending
on the value of the "core.splitIndex" config variable.

Signed-off-by: Christian Couder 
---
 read-cache.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 9054369dd0..99bc274b8d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1558,10 +1558,27 @@ static void tweak_untracked_cache(struct index_state 
*istate)
}
 }
 
+static void tweak_split_index(struct index_state *istate)
+{
+   switch (git_config_get_split_index()) {
+   case -1: /* unset: do nothing */
+   break;
+   case 0: /* false */
+   remove_split_index(istate);
+   break;
+   case 1: /* true */
+   add_split_index(istate);
+   break;
+   default: /* unknown value: do nothing */
+   break;
+   }
+}
+
 static void post_read_index_from(struct index_state *istate)
 {
check_ce_order(istate);
tweak_untracked_cache(istate);
+   tweak_split_index(istate);
 }
 
 /* remember to discard_cache() before reading a different cache! */
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 12/22] t1700: add tests for splitIndex.maxPercentChange

Signed-off-by: Christian Couder 
---
 t/t1700-split-index.sh | 72 ++
 1 file changed, 72 insertions(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 9d7c01c3e1..00a64bed97 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -254,4 +254,76 @@ test_expect_success 'set core.splitIndex config variable 
to false' '
test_cmp expect actual
 '
 
+test_expect_success 'set core.splitIndex config variable to true' '
+   git config core.splitIndex true &&
+   : >three &&
+   git update-index --add three &&
+   BASE=$(test-dump-split-index .git/index | grep "^base") &&
+   test-dump-split-index .git/index | sed "/^own/d" >actual &&
+   cat >expect <<-EOF &&
+   $BASE
+   replacements:
+   deletions:
+   EOF
+   test_cmp expect actual &&
+   : >four &&
+   git update-index --add four &&
+   test-dump-split-index .git/index | sed "/^own/d" >actual &&
+   cat >expect <<-EOF &&
+   $BASE
+   100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   four
+   replacements:
+   deletions:
+   EOF
+   test_cmp expect actual
+'
+
+test_expect_success 'check behavior with splitIndex.maxPercentChange unset' '
+   git config --unset splitIndex.maxPercentChange &&
+   : >five &&
+   git update-index --add five &&
+   BASE=$(test-dump-split-index .git/index | grep "^base") &&
+   test-dump-split-index .git/index | sed "/^own/d" >actual &&
+   cat >expect <<-EOF &&
+   $BASE
+   replacements:
+   deletions:
+   EOF
+   test_cmp expect actual &&
+   : >six &&
+   git update-index --add six &&
+   test-dump-split-index .git/index | sed "/^own/d" >actual &&
+   cat >expect <<-EOF &&
+   $BASE
+   100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   six
+   replacements:
+   deletions:
+   EOF
+   test_cmp expect actual
+'
+
+test_expect_success 'check splitIndex.maxPercentChange set to 0' '
+   git config splitIndex.maxPercentChange 0 &&
+   : >seven &&
+   git update-index --add seven &&
+   BASE=$(test-dump-split-index .git/index | grep "^base") &&
+   test-dump-split-index .git/index | sed "/^own/d" >actual &&
+   cat >expect <<-EOF &&
+   $BASE
+   replacements:
+   deletions:
+   EOF
+   test_cmp expect actual &&
+   : >eight &&
+   git update-index --add eight &&
+   BASE=$(test-dump-split-index .git/index | grep "^base") &&
+   test-dump-split-index .git/index | sed "/^own/d" >actual &&
+   cat >expect <<-EOF &&
+   $BASE
+   replacements:
+   deletions:
+   EOF
+   test_cmp expect actual
+'
+
 test_done
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 16/22] config: add git_config_get_expiry() from gc.c

This function will be used in a following commit to get the expiration
time of the shared index files from the config, and it is generic
enough to be put in "config.c".

Signed-off-by: Christian Couder 
---
 builtin/gc.c | 18 +++---
 cache.h  |  3 +++
 config.c | 13 +
 3 files changed, 19 insertions(+), 15 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index a2b9e8924e..56ab74f6ba 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -64,17 +64,6 @@ static void report_pack_garbage(unsigned seen_bits, const 
char *path)
string_list_append(_garbage, path);
 }
 
-static void git_config_date_string(const char *key, const char **output)
-{
-   if (git_config_get_string_const(key, output))
-   return;
-   if (strcmp(*output, "now")) {
-   unsigned long now = approxidate("now");
-   if (approxidate(*output) >= now)
-   git_die_config(key, _("Invalid %s: '%s'"), key, 
*output);
-   }
-}
-
 static void process_log_file(void)
 {
struct stat st;
@@ -131,10 +120,9 @@ static void gc_config(void)
git_config_get_int("gc.auto", _auto_threshold);
git_config_get_int("gc.autopacklimit", _auto_pack_limit);
git_config_get_bool("gc.autodetach", _auto);
-   git_config_date_string("gc.pruneexpire", _expire);
-   git_config_date_string("gc.worktreepruneexpire", 
_worktrees_expire);
-   git_config_date_string("gc.logexpiry", _log_expire);
-
+   git_config_get_expiry("gc.pruneexpire", _expire);
+   git_config_get_expiry("gc.worktreepruneexpire", 
_worktrees_expire);
+   git_config_get_expiry("gc.logexpiry", _log_expire);
git_config(git_default_config, NULL);
 }
 
diff --git a/cache.h b/cache.h
index a35e9d5187..65ab507a76 100644
--- a/cache.h
+++ b/cache.h
@@ -1932,6 +1932,9 @@ extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 
+/* This dies if the configured or default date is in the future */
+extern int git_config_get_expiry(const char *key, const char **output);
+
 /*
  * This is a hack for test programs like test-dump-untracked-cache to
  * ensure that they do not modify the untracked cache when reading it.
diff --git a/config.c b/config.c
index 35b6f02960..f20d7d88f7 100644
--- a/config.c
+++ b/config.c
@@ -1712,6 +1712,19 @@ int git_config_get_pathname(const char *key, const char 
**dest)
return ret;
 }
 
+int git_config_get_expiry(const char *key, const char **output)
+{
+   int ret = git_config_get_string_const(key, output);
+   if (ret)
+   return ret;
+   if (strcmp(*output, "now")) {
+   unsigned long now = approxidate("now");
+   if (approxidate(*output) >= now)
+   git_die_config(key, _("Invalid %s: '%s'"), key, 
*output);
+   }
+   return ret;
+}
+
 int git_config_get_untracked_cache(void)
 {
int val = -1;
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 07/22] t1700: add tests for core.splitIndex

Signed-off-by: Christian Couder 
---
 t/t1700-split-index.sh | 37 +
 1 file changed, 37 insertions(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 5ea227e6a1..aa2aff1778 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -216,4 +216,41 @@ test_expect_success 'rev-parse --shared-index-path' '
)
 '
 
+test_expect_success 'set core.splitIndex config variable to true' '
+   git config core.splitIndex true &&
+   : >three &&
+   git update-index --add three &&
+   git ls-files --stage >ls-files.actual &&
+   cat >ls-files.expect <<-EOF &&
+   100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   one
+   100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   three
+   100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   two
+   EOF
+   test_cmp ls-files.expect ls-files.actual &&
+   BASE=$(test-dump-split-index .git/index | grep "^base") &&
+   test-dump-split-index .git/index | sed "/^own/d" >actual &&
+   cat >expect <<-EOF &&
+   $BASE
+   replacements:
+   deletions:
+   EOF
+   test_cmp expect actual
+'
+
+test_expect_success 'set core.splitIndex config variable to false' '
+   git config core.splitIndex false &&
+   git update-index --force-remove three &&
+   git ls-files --stage >ls-files.actual &&
+   cat >ls-files.expect <<-EOF &&
+   100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   one
+   100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0   two
+   EOF
+   test_cmp ls-files.expect ls-files.actual &&
+   test-dump-split-index .git/index | sed "/^own/d" >actual &&
+   cat >expect <<-EOF &&
+   not a split index
+   EOF
+   test_cmp expect actual
+'
+
 test_done
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 08/22] Documentation/config: add information for core.splitIndex

Signed-off-by: Christian Couder 
---
 Documentation/config.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 47603f5484..f102879261 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -334,6 +334,10 @@ core.trustctime::
crawlers and some backup systems).
See linkgit:git-update-index[1]. True by default.
 
+core.splitIndex::
+   If true, the split-index feature of the index will be used.
+   See linkgit:git-update-index[1]. False by default.
+
 core.untrackedCache::
Determines what to do about the untracked cache feature of the
index. It will be kept, if this variable is unset or set to
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 21/22] Documentation/config: add splitIndex.sharedIndexExpire

Signed-off-by: Christian Couder 
---
 Documentation/config.txt | 12 
 1 file changed, 12 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index b64aa7db2d..2afd5d982b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2864,6 +2864,18 @@ splitIndex.maxPercentChange::
than 20 percent of the total number of entries.
See linkgit:git-update-index[1].
 
+splitIndex.sharedIndexExpire::
+   When the split index feature is used, shared index files that
+   were not modified since the time this variable specifies will
+   be removed when a new shared index file is created. The value
+   "now" expires all entries immediately, and "never" suppresses
+   expiration altogether.
+   The default value is "2.weeks.ago".
+   Note that a shared index file is considered modified (for the
+   purpose of expiration) each time a new split-index file is
+   created based on it.
+   See linkgit:git-update-index[1].
+
 status.relativePaths::
By default, linkgit:git-status[1] shows paths relative to the
current directory. Setting this variable to `false` shows paths
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 02/22] t1700: change here document style

This improves test indentation by getting rid of the outdated
here document style.

Signed-off-by: Christian Couder 
---
 t/t1700-split-index.sh | 170 -
 1 file changed, 85 insertions(+), 85 deletions(-)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 6096f2c630..5ea227e6a1 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -19,12 +19,12 @@ test_expect_success 'enable split index' '
own=8299b0bcd1ac364e5f1d7768efb62fa2da79a339
base=39d890139ee5356c7ef572216cebcd27aa41f9df
fi &&
-   cat >expect one &&
git update-index --add one &&
git ls-files --stage >ls-files.actual &&
-   cat >ls-files.expect actual &&
-   cat >expect ls-files.actual &&
-   cat >ls-files.expect actual &&
-   cat >expect ls-files.actual &&
-   cat >ls-files.expect actual &&
-   cat >expect one &&
git update-index one &&
git ls-files --stage >ls-files.actual &&
-   cat >ls-files.expect actual &&
-   q_to_tab >expect two &&
git update-index --add two &&
git ls-files --stage >ls-files.actual &&
-   cat >ls-files.expect actual &&
-   q_to_tab >expect ls-files.actual &&
-   cat >ls-files.expect actual &&
-   q_to_tab >expect ls-files.actual &&
-   cat >ls-files.expect actual &&
-   cat >expect

[PATCH v5 04/22] split-index: add {add,remove}_split_index() functions

Also use the functions in cmd_update_index() in
builtin/update-index.c.

These functions will be used in a following commit to tweak
our use of the split-index feature depending on the setting
of a configuration variable.

Signed-off-by: Christian Couder 
---
 builtin/update-index.c | 18 ++
 split-index.c  | 22 ++
 split-index.h  |  2 ++
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index d530e89368..24fdadfa4b 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1099,18 +1099,12 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
}
 
if (split_index > 0) {
-   init_split_index(_index);
-   the_index.cache_changed |= SPLIT_INDEX_ORDERED;
-   } else if (!split_index && the_index.split_index) {
-   /*
-* can't discard_split_index(_index); because that
-* will destroy split_index->base->cache[], which may
-* be shared with the_index.cache[]. So yeah we're
-* leaking a bit here.
-*/
-   the_index.split_index = NULL;
-   the_index.cache_changed |= SOMETHING_CHANGED;
-   }
+   if (the_index.split_index)
+   the_index.cache_changed |= SPLIT_INDEX_ORDERED;
+   else
+   add_split_index(_index);
+   } else if (!split_index)
+   remove_split_index(_index);
 
switch (untracked_cache) {
case UC_UNSPECIFIED:
diff --git a/split-index.c b/split-index.c
index 615f4cac05..f519e60f87 100644
--- a/split-index.c
+++ b/split-index.c
@@ -317,3 +317,25 @@ void replace_index_entry_in_base(struct index_state 
*istate,
istate->split_index->base->cache[new->index - 1] = new;
}
 }
+
+void add_split_index(struct index_state *istate)
+{
+   if (!istate->split_index) {
+   init_split_index(istate);
+   istate->cache_changed |= SPLIT_INDEX_ORDERED;
+   }
+}
+
+void remove_split_index(struct index_state *istate)
+{
+   if (istate->split_index) {
+   /*
+* can't discard_split_index(_index); because that
+* will destroy split_index->base->cache[], which may
+* be shared with the_index.cache[]. So yeah we're
+* leaking a bit here.
+*/
+   istate->split_index = NULL;
+   istate->cache_changed |= SOMETHING_CHANGED;
+   }
+}
diff --git a/split-index.h b/split-index.h
index c1324f521a..df91c1bda8 100644
--- a/split-index.h
+++ b/split-index.h
@@ -31,5 +31,7 @@ void merge_base_index(struct index_state *istate);
 void prepare_to_write_split_index(struct index_state *istate);
 void finish_writing_split_index(struct index_state *istate);
 void discard_split_index(struct index_state *istate);
+void add_split_index(struct index_state *istate);
+void remove_split_index(struct index_state *istate);
 
 #endif
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 15/22] read-cache: touch shared index files when used

When a split-index file is created, let's update the mtime of the
shared index file that the split-index file is referencing.

In a following commit we will make shared index file expire
depending on their mtime, so updating the mtime makes sure that
the shared index file will not be deleted soon.

Signed-off-by: Christian Couder 
---
 read-cache.c | 29 ++---
 1 file changed, 26 insertions(+), 3 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index aeb413a508..13375fa0ff 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1674,6 +1674,19 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
die("index file corrupt");
 }
 
+/*
+ * Signal that the shared index is used by updating its mtime.
+ *
+ * This way, shared index can be removed if they have not been used
+ * for some time.
+ */
+static void freshen_shared_index(char *base_sha1_hex, int warn)
+{
+   const char *shared_index = git_path("sharedindex.%s", base_sha1_hex);
+   if (!check_and_freshen_file(shared_index, 1) && warn)
+   warning("could not freshen shared index '%s'", shared_index);
+}
+
 int read_index_from(struct index_state *istate, const char *path)
 {
struct split_index *split_index;
@@ -2245,6 +2258,7 @@ static int too_many_not_shared_entries(struct index_state 
*istate)
 int write_locked_index(struct index_state *istate, struct lock_file *lock,
   unsigned flags)
 {
+   int new_shared_index, ret;
struct split_index *si = istate->split_index;
 
if (!si || alternate_index_output ||
@@ -2261,13 +2275,22 @@ int write_locked_index(struct index_state *istate, 
struct lock_file *lock,
}
if (too_many_not_shared_entries(istate))
istate->cache_changed |= SPLIT_INDEX_ORDERED;
-   if (istate->cache_changed & SPLIT_INDEX_ORDERED) {
-   int ret = write_shared_index(istate, lock, flags);
+
+   new_shared_index = istate->cache_changed & SPLIT_INDEX_ORDERED;
+
+   if (new_shared_index) {
+   ret = write_shared_index(istate, lock, flags);
if (ret)
return ret;
}
 
-   return write_split_index(istate, lock, flags);
+   ret = write_split_index(istate, lock, flags);
+
+   /* Freshen the shared index only if the split-index was written */
+   if (!ret && !new_shared_index)
+   freshen_shared_index(sha1_to_hex(si->base_sha1), 1);
+
+   return ret;
 }
 
 /*
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 06/22] update-index: warn in case of split-index incoherency

When users are using `git update-index --(no-)split-index`, they
may expect the split-index feature to be used or not according to
the option they just used, but this might not be the case if the
new "core.splitIndex" config variable has been set. In this case
let's warn about what will happen and why.

Signed-off-by: Christian Couder 
---
 builtin/update-index.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 24fdadfa4b..d74d72cc7f 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1099,12 +1099,21 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
}
 
if (split_index > 0) {
+   if (git_config_get_split_index() == 0)
+   warning(_("core.splitIndex is set to false; "
+ "remove or change it, if you really want to "
+ "enable split index"));
if (the_index.split_index)
the_index.cache_changed |= SPLIT_INDEX_ORDERED;
else
add_split_index(_index);
-   } else if (!split_index)
+   } else if (!split_index) {
+   if (git_config_get_split_index() == 1)
+   warning(_("core.splitIndex is set to true; "
+ "remove or change it, if you really want to "
+ "disable split index"));
remove_split_index(_index);
+   }
 
switch (untracked_cache) {
case UC_UNSPECIFIED:
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 18/22] t1700: test shared index file expiration

Signed-off-by: Christian Couder 
---
 t/t1700-split-index.sh | 44 
 1 file changed, 44 insertions(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 00a64bed97..f5a95a6c28 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -326,4 +326,48 @@ test_expect_success 'check splitIndex.maxPercentChange set 
to 0' '
test_cmp expect actual
 '
 
+test_expect_success 'shared index files expire after 2 weeks by default' '
+   : >ten &&
+   git update-index --add ten &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   just_under_2_weeks_ago=$((5-14*86400)) &&
+   test-chmtime =$just_under_2_weeks_ago .git/sharedindex.* &&
+   : >eleven &&
+   git update-index --add eleven &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   just_over_2_weeks_ago=$((-1-14*86400)) &&
+   test-chmtime =$just_over_2_weeks_ago .git/sharedindex.* &&
+   : >twelve &&
+   git update-index --add twelve &&
+   test $(ls .git/sharedindex.* | wc -l) = 1
+'
+
+test_expect_success 'check splitIndex.sharedIndexExpire set to 16 days' '
+   git config splitIndex.sharedIndexExpire "16.days.ago" &&
+   test-chmtime =$just_over_2_weeks_ago .git/sharedindex.* &&
+   : >thirteen &&
+   git update-index --add thirteen &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   just_over_16_days_ago=$((-1-16*86400)) &&
+   test-chmtime =$just_over_16_days_ago .git/sharedindex.* &&
+   : >fourteen &&
+   git update-index --add fourteen &&
+   test $(ls .git/sharedindex.* | wc -l) = 1
+'
+
+test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and 
"now"' '
+   git config splitIndex.sharedIndexExpire never &&
+   just_10_years_ago=$((-365*10*86400)) &&
+   test-chmtime =$just_10_years_ago .git/sharedindex.* &&
+   : >fifteen &&
+   git update-index --add fifteen &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   git config splitIndex.sharedIndexExpire now &&
+   just_1_second_ago=-1 &&
+   test-chmtime =$just_1_second_ago .git/sharedindex.* &&
+   : >sixteen &&
+   git update-index --add sixteen &&
+   test $(ls .git/sharedindex.* | wc -l) = 1
+'
+
 test_done
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 20/22] read-cache: use freshen_shared_index() in read_index_from()

This way a share index file will not be garbage collected if
we still read from an index it is based from.

As we need to read the current index before creating a new
one, the tests have to be adjusted, so that we don't expect
an old shared index file to be deleted right away when we
create a new one.

Signed-off-by: Christian Couder 
---
 read-cache.c   |  1 +
 t/t1700-split-index.sh | 14 +++---
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 89c95d59b3..e447751823 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1719,6 +1719,7 @@ int read_index_from(struct index_state *istate, const 
char *path)
base_sha1_hex, base_path,
sha1_to_hex(split_index->base->sha1));
 
+   freshen_shared_index(base_sha1_hex, 0);
merge_base_index(istate);
post_read_index_from(istate);
return ret;
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index f5a95a6c28..af3ec0da5a 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -329,17 +329,17 @@ test_expect_success 'check splitIndex.maxPercentChange 
set to 0' '
 test_expect_success 'shared index files expire after 2 weeks by default' '
: >ten &&
git update-index --add ten &&
-   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
just_under_2_weeks_ago=$((5-14*86400)) &&
test-chmtime =$just_under_2_weeks_ago .git/sharedindex.* &&
: >eleven &&
git update-index --add eleven &&
-   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
just_over_2_weeks_ago=$((-1-14*86400)) &&
test-chmtime =$just_over_2_weeks_ago .git/sharedindex.* &&
: >twelve &&
git update-index --add twelve &&
-   test $(ls .git/sharedindex.* | wc -l) = 1
+   test $(ls .git/sharedindex.* | wc -l) -le 2
 '
 
 test_expect_success 'check splitIndex.sharedIndexExpire set to 16 days' '
@@ -347,12 +347,12 @@ test_expect_success 'check splitIndex.sharedIndexExpire 
set to 16 days' '
test-chmtime =$just_over_2_weeks_ago .git/sharedindex.* &&
: >thirteen &&
git update-index --add thirteen &&
-   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
just_over_16_days_ago=$((-1-16*86400)) &&
test-chmtime =$just_over_16_days_ago .git/sharedindex.* &&
: >fourteen &&
git update-index --add fourteen &&
-   test $(ls .git/sharedindex.* | wc -l) = 1
+   test $(ls .git/sharedindex.* | wc -l) -le 2
 '
 
 test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and 
"now"' '
@@ -361,13 +361,13 @@ test_expect_success 'check splitIndex.sharedIndexExpire 
set to "never" and "now"
test-chmtime =$just_10_years_ago .git/sharedindex.* &&
: >fifteen &&
git update-index --add fifteen &&
-   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 2 &&
git config splitIndex.sharedIndexExpire now &&
just_1_second_ago=-1 &&
test-chmtime =$just_1_second_ago .git/sharedindex.* &&
: >sixteen &&
git update-index --add sixteen &&
-   test $(ls .git/sharedindex.* | wc -l) = 1
+   test $(ls .git/sharedindex.* | wc -l) -le 2
 '
 
 test_done
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 19/22] read-cache: refactor read_index_from()

It looks better and is simpler to review when we don't compute
the same things many times in the function.

It will also help make the following commit simpler.

Signed-off-by: Christian Couder 
---
 read-cache.c | 14 --
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 16c05f359b..89c95d59b3 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1691,6 +1691,8 @@ int read_index_from(struct index_state *istate, const 
char *path)
 {
struct split_index *split_index;
int ret;
+   char *base_sha1_hex;
+   const char *base_path;
 
/* istate->initialized covers both .git/index and .git/sharedindex.xxx 
*/
if (istate->initialized)
@@ -1708,15 +1710,15 @@ int read_index_from(struct index_state *istate, const 
char *path)
discard_index(split_index->base);
else
split_index->base = xcalloc(1, sizeof(*split_index->base));
-   ret = do_read_index(split_index->base,
-   git_path("sharedindex.%s",
-sha1_to_hex(split_index->base_sha1)), 1);
+
+   base_sha1_hex = sha1_to_hex(split_index->base_sha1);
+   base_path = git_path("sharedindex.%s", base_sha1_hex);
+   ret = do_read_index(split_index->base, base_path, 1);
if (hashcmp(split_index->base_sha1, split_index->base->sha1))
die("broken index, expect %s in %s, got %s",
-   sha1_to_hex(split_index->base_sha1),
-   git_path("sharedindex.%s",
-sha1_to_hex(split_index->base_sha1)),
+   base_sha1_hex, base_path,
sha1_to_hex(split_index->base->sha1));
+
merge_base_index(istate);
post_read_index_from(istate);
return ret;
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 10/22] config: add git_config_get_max_percent_split_change()

This new function will be used in a following commit to get the
value of the "splitIndex.maxPercentChange" config variable.

Signed-off-by: Christian Couder 
---
 cache.h  |  1 +
 config.c | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/cache.h b/cache.h
index 03de80daae..0bb9adcd31 100644
--- a/cache.h
+++ b/cache.h
@@ -1927,6 +1927,7 @@ extern int git_config_get_maybe_bool(const char *key, int 
*dest);
 extern int git_config_get_pathname(const char *key, const char **dest);
 extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
+extern int git_config_get_max_percent_split_change(void);
 
 /*
  * This is a hack for test programs like test-dump-untracked-cache to
diff --git a/config.c b/config.c
index 2a97696be7..35b6f02960 100644
--- a/config.c
+++ b/config.c
@@ -1746,6 +1746,21 @@ int git_config_get_split_index(void)
return -1; /* default value */
 }
 
+int git_config_get_max_percent_split_change(void)
+{
+   int val = -1;
+
+   if (!git_config_get_int("splitindex.maxpercentchange", )) {
+   if (0 <= val && val <= 100)
+   return val;
+
+   return error(_("splitIndex.maxPercentChange value '%d' "
+  "should be between 0 and 100"), val);
+   }
+
+   return -1; /* default value */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 13/22] Documentation/config: add splitIndex.maxPercentChange

Signed-off-by: Christian Couder 
---
 Documentation/config.txt | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index f102879261..b64aa7db2d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2851,6 +2851,19 @@ showbranch.default::
The default set of branches for linkgit:git-show-branch[1].
See linkgit:git-show-branch[1].
 
+splitIndex.maxPercentChange::
+   When the split index feature is used, this specifies the
+   percent of entries the split index can contain compared to the
+   total number of entries in both the split index and the shared
+   index before a new shared index is written.
+   The value should be between 0 and 100. If the value is 0 then
+   a new shared index is always written, if it is 100 a new
+   shared index is never written.
+   By default the value is 20, so a new shared index is written
+   if the number of entries in the split index would be greater
+   than 20 percent of the total number of entries.
+   See linkgit:git-update-index[1].
+
 status.relativePaths::
By default, linkgit:git-status[1] shows paths relative to the
current directory. Setting this variable to `false` shows paths
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 09/22] Documentation/git-update-index: talk about core.splitIndex config var

Signed-off-by: Christian Couder 
---
 Documentation/git-update-index.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/git-update-index.txt 
b/Documentation/git-update-index.txt
index 7386c93162..e091b2a409 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -171,6 +171,12 @@ may not support it yet.
given again, all changes in $GIT_DIR/index are pushed back to
the shared index file. This mode is designed for very large
indexes that take a significant amount of time to read or write.
++
+These options take effect whatever the value of the `core.splitIndex`
+configuration variable (see linkgit:git-config[1]). But a warning is
+emitted when the change goes against the configured value, as the
+configured value will take effect next time the index is read and this
+will remove the intended effect of the option.
 
 --untracked-cache::
 --no-untracked-cache::
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 17/22] read-cache: unlink old sharedindex files

Everytime split index is turned on, it creates a "sharedindex."
file in the git directory. This change makes sure that shared index
files that haven't been used for a long time are removed when a new
shared index file is created.

The new "splitIndex.sharedIndexExpire" config variable is created
to tell the delay after which an unused shared index file can be
deleted. It defaults to "2.weeks.ago".

A previous commit made sure that each time a split index file is
created the mtime of the shared index file it references is updated.
This makes sure that recently used shared index file will not be
deleted.

Signed-off-by: Christian Couder 
---
 read-cache.c | 64 +++-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 13375fa0ff..16c05f359b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2199,6 +2199,65 @@ static int write_split_index(struct index_state *istate,
return ret;
 }
 
+static const char *shared_index_expire = "2.weeks.ago";
+
+static unsigned long get_shared_index_expire_date(void)
+{
+   static unsigned long shared_index_expire_date;
+   static int shared_index_expire_date_prepared;
+
+   if (!shared_index_expire_date_prepared) {
+   git_config_get_expiry("splitindex.sharedindexexpire",
+ _index_expire);
+   shared_index_expire_date = approxidate(shared_index_expire);
+   shared_index_expire_date_prepared = 1;
+   }
+
+   return shared_index_expire_date;
+}
+
+static int should_delete_shared_index(const char *shared_index_path)
+{
+   struct stat st;
+   unsigned long expiration;
+
+   /* Check timestamp */
+   expiration = get_shared_index_expire_date();
+   if (!expiration)
+   return 0;
+   if (stat(shared_index_path, ))
+   return error_errno(_("could not stat '%s"), shared_index_path);
+   if (st.st_mtime > expiration)
+   return 0;
+
+   return 1;
+}
+
+static int clean_shared_index_files(const char *current_hex)
+{
+   struct dirent *de;
+   DIR *dir = opendir(get_git_dir());
+
+   if (!dir)
+   return error_errno(_("unable to open git dir: %s"), 
get_git_dir());
+
+   while ((de = readdir(dir)) != NULL) {
+   const char *sha1_hex;
+   const char *shared_index_path;
+   if (!skip_prefix(de->d_name, "sharedindex.", _hex))
+   continue;
+   if (!strcmp(sha1_hex, current_hex))
+   continue;
+   shared_index_path = git_path("%s", de->d_name);
+   if (should_delete_shared_index(shared_index_path) > 0 &&
+   unlink(shared_index_path))
+   warning_errno(_("unable to unlink: %s"), 
shared_index_path);
+   }
+   closedir(dir);
+
+   return 0;
+}
+
 static struct tempfile temporary_sharedindex;
 
 static int write_shared_index(struct index_state *istate,
@@ -2220,8 +2279,11 @@ static int write_shared_index(struct index_state *istate,
}
ret = rename_tempfile(_sharedindex,
  git_path("sharedindex.%s", 
sha1_to_hex(si->base->sha1)));
-   if (!ret)
+   if (!ret) {
hashcpy(si->base_sha1, si->base->sha1);
+   clean_shared_index_files(sha1_to_hex(si->base->sha1));
+   }
+
return ret;
 }
 
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 22/22] Documentation/git-update-index: explain splitIndex.*

Signed-off-by: Christian Couder 
---
 Documentation/config.txt   |  2 +-
 Documentation/git-update-index.txt | 37 +
 2 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2afd5d982b..24ed9c476d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2873,7 +2873,7 @@ splitIndex.sharedIndexExpire::
The default value is "2.weeks.ago".
Note that a shared index file is considered modified (for the
purpose of expiration) each time a new split-index file is
-   created based on it.
+   either created based on it or read from it.
See linkgit:git-update-index[1].
 
 status.relativePaths::
diff --git a/Documentation/git-update-index.txt 
b/Documentation/git-update-index.txt
index e091b2a409..1579abf3c3 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -163,14 +163,10 @@ may not support it yet.
 
 --split-index::
 --no-split-index::
-   Enable or disable split index mode. If enabled, the index is
-   split into two files, $GIT_DIR/index and $GIT_DIR/sharedindex..
-   Changes are accumulated in $GIT_DIR/index while the shared
-   index file contains all index entries stays unchanged. If
-   split-index mode is already enabled and `--split-index` is
-   given again, all changes in $GIT_DIR/index are pushed back to
-   the shared index file. This mode is designed for very large
-   indexes that take a significant amount of time to read or write.
+   Enable or disable split index mode. If split-index mode is
+   already enabled and `--split-index` is given again, all
+   changes in $GIT_DIR/index are pushed back to the shared index
+   file.
 +
 These options take effect whatever the value of the `core.splitIndex`
 configuration variable (see linkgit:git-config[1]). But a warning is
@@ -394,6 +390,31 @@ Although this bit looks similar to assume-unchanged bit, 
its goal is
 different from assume-unchanged bit's. Skip-worktree also takes
 precedence over assume-unchanged bit when both are set.
 
+Split index
+---
+
+This mode is designed for repositories with very large indexes, and
+aims at reducing the time it takes to repeatedly write these indexes.
+
+In this mode, the index is split into two files, $GIT_DIR/index and
+$GIT_DIR/sharedindex.. Changes are accumulated in
+$GIT_DIR/index, the split index, while the shared index file contains
+all index entries and stays unchanged.
+
+All changes in the split index are pushed back to the shared index
+file when the number of entries in the split index reaches a level
+specified by the splitIndex.maxPercentChange config variable (see
+linkgit:git-config[1]).
+
+Each time a new shared index file is created, the old shared index
+files are deleted if their modification time is older than what is
+specified by the splitIndex.sharedIndexExpire config variable (see
+linkgit:git-config[1]).
+
+To avoid deleting a shared index file that is still used, its
+modification time is updated to the current time everytime a new split
+index based on the shared index file is either created or read from.
+
 Untracked cache
 ---
 
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 14/22] sha1_file: make check_and_freshen_file() non static

This function will be used in a commit soon, so let's make
it available globally.

Signed-off-by: Christian Couder 
---
 cache.h | 3 +++
 sha1_file.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 0bb9adcd31..a35e9d5187 100644
--- a/cache.h
+++ b/cache.h
@@ -1273,6 +1273,9 @@ extern int has_pack_index(const unsigned char *sha1);
 
 extern void assert_sha1_type(const unsigned char *sha1, enum object_type 
expect);
 
+/* Helper to check and "touch" a file */
+extern int check_and_freshen_file(const char *fn, int freshen);
+
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
 {
diff --git a/sha1_file.c b/sha1_file.c
index 6628f06da3..f02f10189f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -667,7 +667,7 @@ static int freshen_file(const char *fn)
  * either does not exist on disk, or has a stale mtime and may be subject to
  * pruning).
  */
-static int check_and_freshen_file(const char *fn, int freshen)
+int check_and_freshen_file(const char *fn, int freshen)
 {
if (access(fn, F_OK))
return 0;
-- 
2.12.0.206.g74921e51d6.dirty

[PATCH v5 11/22] read-cache: regenerate shared index if necessary