[RFC] git checkout $tree -- $path always rewrites files

Jeff King Fri, 07 Nov 2014 00:14:21 -0800

I noticed that "git checkout $tree -- $path" will _always_ unlink and
write a new copy of each matching path, even if they are up-to-date with
the index and the content in $tree is the same.


Here's a simple reproduction:

    # some repo with a file in it
    git init
    echo content >foo && git add foo && git commit -m foo
    
    # give the file a known timestamp (it doesn't matter what)
    perl -e 'utime(123, 123, @ARGV)' foo
    git update-index --refresh
    
    # now check out an identical copy
    git checkout HEAD -- foo
    
    # and check whether it was updated
    perl -le 'print ((stat)[9]) for @ARGV' foo

which yields a modern timestamp, showing that the file was rewritten.

If you use

    git checkout -- foo

instead to checkout from the index, that works fine (the final line
prints 123). Similarly, if you load the new index and checkout
separately, like:

    git read-tree -m $tree
    git checkout -- foo

that also works. Of course it is not quite the same; read-tree loads the
whole index from $tree, whereas checkout would only touch a subset of
the entries. And that's the culprit; checkout does not use unpack-trees
to only touch the entries that need updating. It uses read_tree_recursive
with a pathspec, and just replaces any index entries that match.

Below is a patch which makes it do what I expect by silently copying
flags and stat-data from the old entry, but I feel like it is going in
the wrong direction. Besides causing a few tests to fail (which I
suspect is because I implemented it at the wrong layer), it really feels
like this should be using unpack-trees or something similar.

After all, that is what "git reset $tree -- foo" does, and it gets this
case right (in fact, you can replace the read-tree above with it). It
looks like it actually does a pathspec-limited index-diff rather than
unpack-trees, but the end effect is the same (and while checkout could
just pass "update" to unpack-trees, we need to handle checking out
directly from the index anyway, so I do not think it is a big deal to
keep the index update as a separate step).

Is there a reason that we don't use this diff technique for checkout?

Anyway, here is the (wrong) patch I came up with initially.

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 29b0f6d..802e556 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -273,19 +273,13 @@ static int checkout_paths(const struct checkout_opts 
*opts,
                ce->ce_flags &= ~CE_MATCHED;
                if (!opts->ignore_skipworktree && ce_skip_worktree(ce))
                        continue;
-               if (opts->source_tree && !(ce->ce_flags & CE_UPDATE))
-                       /*
-                        * "git checkout tree-ish -- path", but this entry
-                        * is in the original index; it will not be checked
-                        * out to the working tree and it does not matter
-                        * if pathspec matched this entry.  We will not do
-                        * anything to this entry at all.
-                        */
-                       continue;
+
                /*
-                * Either this entry came from the tree-ish we are
-                * checking the paths out of, or we are checking out
-                * of the index.
+                * If we are checking out from a tree-ish, we already
+                * did our pathspec matching.  However, since we then
+                * drop the CE_UPDATE flag from any paths that do not
+                * need updating, we don't know which ones were not
+                * mentioned and which ones were simply up-to-date.
                 *
                 * If it comes from the tree-ish, we already know it
                 * matches the pathspec and could just stamp
diff --git a/read-cache.c b/read-cache.c
index 8f3e9eb..fb2240b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -962,8 +962,13 @@ static int add_index_entry_with_check(struct index_state 
*istate, struct cache_e
 
        /* existing match? Just replace it. */
        if (pos >= 0) {
-               if (!new_only)
+               if (!new_only) {
+                       struct cache_entry *old = istate->cache[pos];
+                       if (!is_null_sha1(ce->sha1) &&
+                           !hashcmp(old->sha1, ce->sha1))
+                               copy_cache_entry(ce, old);
                        replace_index_entry(istate, pos, ce);
+               }
                return 0;
        }
        pos = -pos-1;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[RFC] git checkout $tree -- $path always rewrites files

Reply via email to