Duy Nguyen <pclo...@gmail.com> writes:
> I wonder if we need to update_main_cache_tree() so many times in this
> function. If I read the code correctly, all roads must lead to
> update_main_cache_tree(0) in prepare_to_commit().
I think prepare-to-commit is too late; it does not want to know if
the index it was given to create a tree out of is the one that the
user will keep using after this invocation of "git commit" or just a
temporary one used for partial commit. The cache-tree rebuilt there
is what is recorded with commit_tree_extended() in cmd_commit(), but
if you are doing a partial commit, these generic code paths are
given a temporary index file on the filesystem to work on, and
cache-tree in that will not help user's later operation.
For three main uses of "git commit", prepare_index() does these:
- "git commit -a" and "git commit -i $paths" update the index with
the new contents from the working tree, fully builds cache-tree
in-core to write out the tree objects, and writes the index file
to the filesystem. Because this index is the one used after this
invocation of "git commit" returns, we have a fully populated
cache-tree after this happens. This code path is perfect and
there is no need to change.
- "git commit" commits the contents of the index as-is, so
technically there is no reason for it to update the index on the
filesystem at all, but it refreshes the cached stat data to help
the "status" part of the command, and if it finds that stat data
was stale, updates the index on the filesystem because it is
wasteful not to do so. As we would be spending I/O cycles to
update the index file in that case anyway, we also rebuild the
cache-tree and include that in the updated index.
When the cached stat data was already up-to-date, however, we do
not update the index on the filesystem, so the series under
discussion will change the trade-off by doing one more I/O to
write out a new index to the filesystem only to update cache-tree.
- "git commit $paths" updates the "real" index with given $paths
and writes it out to the filesystem first. This is the index the
user will use after "git commit" finishes; traditionally our
trade-off was "populate cache-tree only when we do not have to
spend any cycle only to do so (i.e. when we are writing trees
anyway, or when we read from a tree), and let it degrade as paths
are added, removed and modified" and we avoided populating
cache-tree in this codepath. The series under discussion will
change the trade-off here, too.
After it updates this "real" index, it builds another temporary
index that represents the tree state to be committed (starting
from HEAD and updates with the given $paths), but that will be
discarded and we do not want to spend any extra cycle to do
anything only to make its later use more efficient (like writing
updated cache-tree to it).
> If we find out that
> we have incomplete cache-tree before that call, we could write the
> index one more time after that call,
and that will make an extra I/O only to write out cache-tree to the
temporary index that we will discard immediately after for a partial
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html