On Wed, 2014-05-14 at 17:36 +0700, Duy Nguyen wrote:
> >>  With that in
> >> mind, I think you don't need to keep a fs cache on disk at all. All
> >> you need to store (in the index) is the clock value from watchman.
> >> After we parse the index, we perform a "since" query to get path names
> >> (and perhaps "exists" and "mode" for later). Then we set CE_VALID bit
> >> on entries that are _not_ in the query result. The remaining items
> >> will be lstat'd by git (see [1] and read-cache.c changes on the next
> >> few patches). Assuming the number of updated files is reasonably
> >> small, we won't  be punished by lookup time.
> >
> > I considered this, but rejected it for a few reasons:
> > 1. We still need to know about the untracked files.  I guess we could
> > do an index extension for just that, like your untracked cache.
> Yes. But consider that the number of untracked files is usually small
> (otherwise 'git status' would look very messy). And your fscache would
> need to store excluded file list too, which could be a lot bigger (one
> pair of  .[ch] -> one .o). _But_ yours would make 'git status
> --ignored' work. I don't consider that a major use case for
> optimization, but people may have different opinions.

I don't think --ignored is a major use case.  But I do think it's nice
to get as much mileage as possible out of a change like this.  

> > 2. That doesn't eliminate opendir/readdir, I think.  Those are a major
> > cost. I did not thoroughly review your patches from January/February,
> > but I didn't notice anything in there about this part of dir.c.
> > Also the cost of open(nonexistent .gitignore) to do ignore processing.
> Assuming untracked cache is in use, opendir/readdir is replaced with
> lstat. And cheap lstat can be solved with watchman without storing
> anything extra. I solve open(.gitignore) too by checking its stat data
> with the one in index. If matches, I reuse the version in tree. This
> does not necessarily make it cheaper, but it increases locality so it
> might be. _Modified_ .gitignore files have to go through
> open(.gitignore), but people don't update .gitignore often.

Interesting -- if all that actually works, then it's probably the right

> > 3. Changing a file in the index (e.g. git add) requires clearing
> > the CE_VALID bit; this means that third-party libraries (e.g. jgit)
> > that change the git repo need to understand this extension for correct
> > operation.  Maybe that's just the nature of extensions, but it's not
> > something that my present patch set requires.
> I don't store CE_VALID on disk. Instead I store a new bit CE_WATCHED.
> Only after loading and validating against watchman that I turn
> CE_WATCHED to CE_VALID in memory. So JGit, libgit2 are not confused.
> I assume you won't change your mind about this. Which is fine to me. I
> will still try out my approach with your libwatchman though. Just
> curious about its performance and complexity, compared to your
> approach.

I am open-minded here. This code is really the first time I have looked
at git's internals, and I accept that your way might be better.  If
you're going to try the watchman version of your approach, then we do a
direct comparison.  Let me know if there is something I can do to help
out on that.

> A bit off topic, but msys fork has another "fscache" in compat/win32.
> If you could come up with a different name, maybe it'll be less
> confusing for them after merging. But this is not a big deal, as this
> fscache won't work on windows anyway.

Does wtcache sounds like a better name?

To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to