Hi, all. I ran ccache through |perf| on my x64 Linux box today. In my testcase (|make clean && perf record -g make| within a subdirectory of the Firefox tree), there are only four functions that see more than 2% of the samples:

    25.39%  c++  ccache             [.] hash_source_code_string
    10.15%  c++  ccache             [.] mdfour64
     4.04%  c++  [kernel.kallsyms]  [k] copy_user_generic_string
     3.14%  c++  ccache             [.] mdfour_update

So it appears that 13% of my CPU time is spent computing md4 hashes, while another 25% is spent in hash_source_code_string but outside the MD4 code.

To someone new to the code like me, it appears that there's some room for optimization here.

* hash_source_code_string is doing twice as much work as anything else in ccache, but only to catch edge cases (comments and special macros). If it could be simplified, the speed gains might offset the cost of additional false positives. If all we really care about is finding the strings "__DATE__" and "__TIME__", there are faster algorithms than a character-by-character search. (Note also that the current implementation copies the whole file into hashbuf one character at a time. Again, do the benefits of stripping out comments really offset this?)

* Why does ccache still use MD4? Surely there's a better / faster hash out there. I noticed that ccache includes murmurhash, but it doesn't seem like it's used in too many places. There's probably a good reason for this, but it's not too apparent to me.

You all probably know better than I if ccache should use a secure hash function, or if something like murmurhash is sufficient -- a secure hash function seems like overkill to me, fwiw. But either way, is MD4, which on the one hand is no longer a secure hash function, and which on the other hand I'd imagine is nowhere near as fast as something like murmurhash, the right function to use?

I'm curious what you all think about this.

Regards,
-Justin

_______________________________________________
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache