On 2010-11-07 22:13, Justin Lebar wrote:
> This patch is a follow-up to the discussion in "Questions about two hot functions in ccache".
Splendid! Thanks. The speedup factor on my machine is about 1.5.
> I suspect we could use the fast_hash function for preprocessor mode without much work. I also suspect that switching to a smarter algorithm for searching for "#include" would decrease the cost of cache misses. But I haven't profiled either of these cases.
Yes, that would be interesting to investigate.
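As a hypothetical illustration of what a cheaper "#include" search could look like (not the patch and not ccache's actual code), the scan can let memchr() jump straight to each '#' and compare the following bytes only there, instead of testing every position against the whole string. The function name is made up for this sketch, and it deliberately ignores line starts, comments and string literals.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count "#include" directives in buf[0..len).
 * memchr() finds each candidate '#'; only then are the next bytes
 * compared. Spaces and tabs between '#' and "include" are allowed.
 * Simplification: it does not check that '#' starts a line, nor does
 * it skip comments or string literals.
 */
static size_t count_include_directives(const char *buf, size_t len)
{
    const char *p = buf;
    const char *end = buf + len;
    size_t count = 0;

    while (p < end) {
        const char *hash = memchr(p, '#', (size_t)(end - p));
        if (hash == NULL)
            break; /* no more '#' in the buffer */

        /* Skip optional horizontal whitespace after '#'. */
        const char *q = hash + 1;
        while (q < end && (*q == ' ' || *q == '\t'))
            q++;

        /* Bounds-checked comparison against "include" (7 bytes). */
        if ((size_t)(end - q) >= 7 && memcmp(q, "include", 7) == 0)
            count++;

        p = hash + 1; /* resume scanning after this '#' */
    }
    return count;
}
```

Whether this actually helps would, as noted, need profiling on real cache misses.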
> I'm a bit concerned about the fact that I had to change the reported file lengths in the manifest test (in test.sh). I'm not sure what's going on here; I may well have messed something up. Hopefully not. :)
Those sizes are not file lengths but the sizes of the hashed content, so a change is expected since you changed the number of MD4-hashed bytes.
The improved search for __DATE__ and __TIME__ is uncontroversial, so it can be applied right away. However, I would like to make the LFG-based digest opt-in, at least for now, since I think we need time to test it and to collect opinions from hash-savvy people.
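For reference, one well-known way to speed up a search for two fixed 8-byte needles is a Boyer-Moore-Horspool-style scan with a shared skip table. The sketch below is only an illustration of that technique under that assumption; it is not necessarily what the patch does, and `has_temporal_macro` is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the patch's code): return 1 if buf contains
 * "__DATE__" or "__TIME__", else 0. Both needles are 8 bytes long, so one
 * Horspool skip table serves both: for each byte value, keep the smallest
 * safe shift over the two needles.
 */
static int has_temporal_macro(const char *buf, size_t len)
{
    static const char *const needles[2] = {"__DATE__", "__TIME__"};
    enum { M = 8 };
    size_t skip[256];
    size_t i, j, k;

    /* Bytes that occur in neither needle allow a full-length shift. */
    for (i = 0; i < 256; i++)
        skip[i] = M;
    /* Otherwise shift so the byte's last occurrence (excluding the final
       position) lines up with the window's last byte. */
    for (k = 0; k < 2; k++) {
        for (j = 0; j < M - 1; j++) {
            unsigned char c = (unsigned char)needles[k][j];
            if (M - 1 - j < skip[c])
                skip[c] = M - 1 - j;
        }
    }

    /* i is the index of the last byte of the current 8-byte window. */
    for (i = M - 1; i < len; i += skip[(unsigned char)buf[i]]) {
        const char *w = buf + i - (M - 1);
        /* Cheap checks on bytes both needles share before full compares. */
        if (w[M - 1] == '_' && w[5] == 'E' &&
            (memcmp(w, needles[0], M) == 0 || memcmp(w, needles[1], M) == 0))
            return 1;
    }
    return 0;
}
```

Most bytes of ordinary source code appear in neither needle, so the scan typically advances 8 bytes at a time. In a real implementation the skip table would be computed once rather than on every call.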
By the way, can you point to a reference explaining why an LFG (lagged Fibonacci generator), with the properties you chose, would work well as a digest for ccache's purposes? What's the expected collision rate? Or in other words: how well can we sleep at night, knowing that we haven't messed up people's builds, if we introduce the LFG-based algorithm? :-)
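As a rough yardstick for the collision question (not an answer about the LFG digest itself), the birthday bound gives the probability of at least one accidental collision among n cached results for an ideal b-bit hash. Whether the LFG-based digest behaves like an ideal hash on real source files is exactly what would need to be shown.

```latex
p_{\text{collision}} \approx 1 - e^{-n(n-1)/2^{b+1}} \le \frac{n(n-1)}{2^{b+1}}

% Example: n = 10^6 cached results, b = 128 (MD4's output size)
\frac{10^6\,(10^6 - 1)}{2^{129}} \approx 1.5 \times 10^{-27}
```

A digest with fewer effective output bits, or with non-uniform output on structured input such as source code, would do correspondingly worse.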
My plan for ccache 3.2 is to work on configurability by introducing a config file. (I will post some thoughts on this to the list later.) It would also be nice to make it possible to choose the hash algorithm, say between MD4 and your LFG-based digest (and other alternatives people want to implement).
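Purely to illustrate the idea of a selectable hash algorithm, a table of named algorithms could be looked up by a value read from the config file. Everything here is hypothetical: `digest_algo`, `find_digest_algo` and the "hash_algorithm" setting are made-up names. FNV-1a and djb2 appear only as short, self-contained stand-ins for MD4 and the LFG-based digest; neither is suitable as ccache's digest.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical interface for a pluggable digest. A real one would carry
   streaming state (init/update/final) and a wider result than 64 bits. */
struct digest_algo {
    const char *name;                               /* name used in the config */
    uint64_t (*hash)(const void *data, size_t len); /* one-shot hash of a buffer */
};

/* Stand-in #1: 64-bit FNV-1a (NOT MD4; used only to keep the sketch short). */
static uint64_t fnv1a64(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = UINT64_C(0xcbf29ce484222325); /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= UINT64_C(0x100000001b3); /* FNV prime */
    }
    return h;
}

/* Stand-in #2: djb2 (NOT the LFG-based digest). */
static uint64_t djb2(const void *data, size_t len)
{
    const unsigned char *p = data;
    uint64_t h = 5381;
    for (size_t i = 0; i < len; i++)
        h = h * 33 + p[i];
    return h;
}

static const struct digest_algo algos[] = {
    {"fnv1a", fnv1a64},
    {"djb2", djb2},
};

/* Look up an algorithm by name, e.g. from a hypothetical "hash_algorithm"
   setting. Returns NULL for unknown names. */
static const struct digest_algo *find_digest_algo(const char *name)
{
    for (size_t i = 0; i < sizeof algos / sizeof algos[0]; i++)
        if (strcmp(algos[i].name, name) == 0)
            return &algos[i];
    return NULL;
}
```

One detail such a design would probably need: the selected algorithm's name should itself be mixed into the cache key, so that results computed with different algorithms can never be mistaken for each other.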
-- 
Joel

_______________________________________________
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache