On Thu, May 22, 2014 at 02:08:16PM -0400, David Turner wrote:

> On Thu, 2014-05-22 at 12:46 -0400, Jeff King wrote:
> > On Thu, May 22, 2014 at 12:22:43PM -0400, David Turner wrote:
> >
> > > If I have a git repository with a clean working tree, and I delete the
> > > index, then I can use git reset (with no arguments) to recreate it.
> > > However, when I do recreate it, it doesn't come back the same.  I have
> > > not analyzed this in detail, but the effect is that commands like git
> > > status take much longer because they must read objects out of a pack
> > > file.  In other words, the index seems to not realize that the index (or
> > > at least most of it) represents the same state as HEAD.  If I do git
> > > reset --hard, the index is restored to the original state (it's
> > > byte-for-byte identical), and the pack file is no longer read.
> >
> > Are you sure it's reading a packfile?
> Well, it's calling inflate(), and strace says it is reading
> e.g. .git/objects/pack/pack-....{idx,pack}.
> So, I would say so.

That seems odd that we would be spending extra time there. We do
inflate() the trees in order to diff the index against HEAD, but we
shouldn't need to inflate any blobs.

Here it is for me (on linux.git):

  [before, warm cache]
  $ time perf record -q git status >/dev/null
  real    0m0.192s
  user    0m0.080s
  sys     0m0.108s

  $ perf report | grep -v '#' | head -5
     7.46%      git  [kernel.kallsyms]   [k] __d_lookup_rcu
     4.55%      git  libz.so.1.2.8       [.] inflate
     3.53%      git  libc-2.18.so        [.] __memcmp_sse4_1
     3.46%      git  [kernel.kallsyms]   [k] security_inode_getattr
     3.29%      git  git                 [.] memihash

  $ time git reset
  real    0m0.080s
  user    0m0.036s
  sys     0m0.040s

So status is pretty quick, and the time is going to lstat in the kernel,
and some tree inflation. Reset is fast, because it has nothing much to
do. Now let's kill off the index's stat cache:

  $ rm .git/index
  $ time perf record -q git reset
  real    0m0.967s
  user    0m0.780s
  sys     0m0.180s

That took a while. What was it doing?

  $ perf report | grep -v '#' | head -5
     3.23%      git  [kernel.kallsyms]   [k] copy_user_enhanced_fast_string
     1.74%      git  libcrypto.so.1.0.0  [.] 0x000000000007e010            
     1.60%      git  [kernel.kallsyms]   [k] __d_lookup_rcu                
     1.51%      git  [kernel.kallsyms]   [k] page_fault                    
     1.44%      git  libc-2.18.so        [.] __memcmp_sse4_1               

Reading files and sha1. We hash the working-tree files here (reset
doesn't technically need to refresh the index from the working tree to
copy entries from HEAD into the index, but it does it so it can do fancy
things like tell you about which files are now out-of-date).

Now how does stat fare after this?

  $ time perf record -q git status >/dev/null
  real    0m0.189s
  user    0m0.088s
  sys     0m0.096s

Looks about the same as before to me.

Note that if you use "read-tree" instead of "reset", it _just_ loads the
index, and doesn't touch the working tree. If you then run "git status",
then _that_ command has to refresh the index, and it will pay the
hashing cost. Like:

  $ rm .git/index
  $ time git read-tree HEAD
  real    0m0.084s
  user    0m0.064s
  sys     0m0.016s
  $ time git status >/dev/null
  real    0m0.833s
  user    0m0.712s
  sys     0m0.112s

All of this is behaving as I would expect. Can you show us a set of
commands that deviate from this?

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to