On Fri, Dec 31, 2010 at 4:27 AM, Wilson Snyder <wsny...@wsnyder.org> wrote:
> I also think this is a good approach, though having been
> down the road before, mtime isn't always enough as you
> noted, but including the size also makes it *almost*
> perfect. Most edits change the size.
>
> Note several tools like scons use this technique, and some
> store the hashes in a single hash file inside each source
> directory. That has the nice advantage of allowing sharing,
> though the downside of polluting the source areas so I don't
> really like it. I think putting it into the ccache
> infrastructure is nicer; but you may still want multiple
> hashes to be stored under a hash of the directory name,
> instead of a hash of the filename, because that allows
> reading fewer files. (Otherwise reading the hundreds of
> hash files will become the new bottleneck.)

I actually see 3 different variants being discussed in this thread:

A) index based on hash of file name + attributes instead of hash of
   file contents

B) index based on hash of file contents, but have ccache maintain a
   database of (file name + attributes) -> (hash of file contents) pairs

C) index based on hash of file contents, and use the git index for
   looking up (file name + attributes) -> (hash of file contents) pairs

A is simplest, and would probably work well enough for system include
files. Not so much for project files though, especially if we want to
support CCACHE_BASEDIR (ctime/mtime probably won't match across checked
out versions).

B could work pretty well, I think. There is the question of where to
store that new database, but it's probably doable - the database is only
a cache, so it's always OK to expire entries if it grows too much.

C benefits people who frequently switch their git workspace between
multiple branches. When switching back to a previously compiled branch,
the file mtimes will be updated, but the git index shows that the
contents haven't changed. This type of operation is the source of many
ccache hits for me (after all, the compiler wouldn't even get invoked by
make if no mtimes had changed).

Making C work seems complicated, as we'd need to be able to read the git
index. OTOH, this also nicely solves the problem of expiring database
entries: git is in charge of maintaining the index so we don't need to
care about it for project files, and out-of-project files such as system
headers shouldn't change nearly as often so we'd hardly ever need to
expire them from the ccache database. We could even avoid any problems of
concurrent database updates by just never having ccache update any
(file name + attributes) -> (hash of file contents) database - git would
be in charge of updating its index for in-project files, and we could
have an out-of-line ccache option to do it for infrequently-modified
system files...
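
To make variant B a bit more concrete, here is a rough sketch in Python (not ccache code, and not a proposal for the actual on-disk format). Everything in it is an assumption made for illustration: the names StatCache, stat_key and hash_contents are hypothetical, the database is just an in-memory dict, SHA-256 stands in for whatever hash ccache really uses, and "attributes" is taken to mean size + mtime + ctime.

```python
import hashlib
import os


def stat_key(path):
    """Key built from file name + attributes (size, mtime, ctime)."""
    st = os.stat(path)
    return (os.path.abspath(path), st.st_size, st.st_mtime_ns, st.st_ctime_ns)


def hash_contents(path):
    """Hash the file contents: the expensive step the cache tries to skip."""
    h = hashlib.sha256()  # stand-in hash, chosen only for the sketch
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


class StatCache:
    """Hypothetical (file name + attributes) -> (hash of contents) cache."""

    def __init__(self, max_entries=10000):
        self.entries = {}
        self.max_entries = max_entries
        self.hits = 0
        self.misses = 0

    def content_hash(self, path):
        key = stat_key(path)
        digest = self.entries.get(key)
        if digest is not None:
            # Name and attributes unchanged: reuse the stored content hash.
            self.hits += 1
            return digest
        self.misses += 1
        digest = hash_contents(path)
        if len(self.entries) >= self.max_entries:
            # It is only a cache, so dropping the oldest entry is always safe.
            self.entries.pop(next(iter(self.entries)))
        self.entries[key] = digest
        return digest
```

The final index is still the hash of the file contents, so a stale or expired entry only costs a re-read of the file. The sketch ignores persistence and concurrent updates, which are the hard parts mentioned above.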

-- 
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.