Refreshing index timestamps without reading content

Quentin Casasnovas Thu, 05 Jan 2017 03:19:54 -0800

Hi guys,

Apologies if this is documented somewhere, I have fairly bad search voodoo
skills.


I'm looking for a way to cause a full refresh of the index without reading
any of the files, basically telling git "trust me, all worktree files
match the index, but their stat information has changed".  I have read
about the update-index --assume-unchanged and --skip-worktree flags in the
documentation, but these do not cause any index refresh - rather, they
pretend that the respective worktree files match the index until you
remove those assume-unchanged/skip-worktree bits.

This might sound like a really weird thing to do, but I do have a use case
for it - we have a build farm setup where the objects resulting from a
compilation are stored on a shared server.  The source files are not stored
on the shared server, but locally on each of the build servers (so as to
decrease network load and make good use of local storage as a cache).

We then use an onion filesystem to mount the compiled objects on top of the
local sources - and change the modification time of the source to be older
than the object files, so that on subsequent builds, make does not rebuild
the whole world.

This works fine except for one thing: after changing the mtime of the
source files, the first subsequent git command that needs to compare the
tree with the index takes a LONG time, since it re-reads the content of
every file:

  cd linux-2.6

  # Less than a second when the index is up to date
  time git status > /dev/null
  git status 0.06s user 0.09s system 172% cpu 0.087 total
                                              ~~~~~~~~~~~

  # Change the mtime..
  git ls-tree -r --name-only HEAD | xargs -n 1024 touch

  # Now 30s..
  time git status > /dev/null
  git status  2.73s user 1.79s system 13% cpu 32.453 total
                                              ~~~~~~~~~~~~

The timing information above was captured on my laptop SSD, and the penalty
is obviously much higher on spinning disks - especially when this operation
is done on *hundreds* of different work trees in parallel, all hosted on the
same filesystem (it can take tens of minutes!).
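
The same effect can be reproduced on a small scale with a throwaway
repository. The following is only a sketch: the file name file.txt, the
temporary directory and the fixed 2001 timestamp are arbitrary choices for
illustration. It uses git ls-files --debug to print the stat data cached in
the index, and git update-index --refresh to bring that data back in line
with the work tree, which git can only do by re-reading any file whose stat
data no longer matches.

```shell
#!/bin/sh
set -e

# Scratch repository in a temporary directory (hypothetical layout).
repo=$(mktemp -d)
cd "$repo"
git init -q .
echo hello > file.txt
git add file.txt

# mtime currently cached in the index for file.txt ("seconds:nanoseconds").
before=$(git ls-files --debug file.txt | sed -n 's/^ *mtime: //p')

# Change only the mtime; the content is untouched.
touch -t 200101010000 file.txt

# Refresh the cached stat data. Because the stat data no longer matches,
# git has to read and hash the file before it can accept the new mtime.
git update-index -q --refresh

after=$(git ls-files --debug file.txt | sed -n 's/^ *mtime: //p')

echo "cached mtime before: $before"
echo "cached mtime after:  $after"
```

On a single small file the extra read is invisible; it only becomes a
problem when every file in a large tree has to be re-read at once.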

Is there any way to tell git, after the touch command above, to refresh
its cached stat information and trust us that the file content has not
changed, so as to avoid any useless file reads (though it will obviously
have to stat all of them, but that's not something we can really avoid)?

If not, I am willing to implement an --assume-content-unchanged option for
git update-index if you guys don't see something fundamentally wrong with
this approach.

Thanks for any hints you can give! :)

Q
