Re: [ccache] Caching failed compilations

Andrew Stubbs Tue, 07 Jul 2015 02:00:08 -0700

On 06/07/15 21:44, Joel Rosdahl wrote:

That sounds like a reasonable idea, but I have occasionally seen empty
object files in large and busy caches (it could be due to filesystem
failure, hardware failure or hard system reset), so I'm afraid that
using zero-length object files won't work out in practice. See also
https://bugzilla.samba.org/show_bug.cgi?id=9972. But maybe writing some
special content to the object file would be OK?

OK, fair enough, but I'd say that once you've opened the file and
checked the magic data then you've already killed performance. How about
a magic length that can be observed in the stat data?

A failure can be confirmed by a read, if and only if the length matches,
but a compile success will remain on the quick path.

A cache-hit for a compile failure need not be the *most* efficient code
path; it will likely end the build process. As long as it's faster than
the "slow" compile failures the OP cares about then all is well.
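
To make the magic-length idea concrete, here is a minimal sketch in Python (ccache itself is C, so this only illustrates the logic). The marker content, its length and the function names are all made up for the example; nothing here is existing ccache behaviour.

```python
import os

# Hypothetical marker: both the content and the resulting length are
# assumptions made for this sketch, not anything ccache defines.
FAILURE_MARKER = b"CCACHE-COMPILE-FAILED\n"
FAILURE_LENGTH = len(FAILURE_MARKER)


def write_failure_marker(path):
    """Record a failed compilation by writing the marker as the 'object'."""
    with open(path, "wb") as f:
        f.write(FAILURE_MARKER)


def is_cached_failure(path):
    """Return True only if the file has the magic length AND the magic content."""
    # Quick path: stat only. An object file whose size differs from the
    # magic length is never opened here.
    if os.stat(path).st_size != FAILURE_LENGTH:
        return False
    # Slow path: the length matches, so confirm with a read.
    with open(path, "rb") as f:
        return f.read(FAILURE_LENGTH + 1) == FAILURE_MARKER
```

A zero-length file left behind by a crash fails the length test, and a real object that happens to have the magic length fails the content test, so neither is mistaken for a cached failure.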

    Sorry, I don't see any advantage in this scheme. You might save a
    few bytes of disk space, and maybe a few inodes, but I've not seen
    any evidence that those are a problem. You'll also add extra file
    copies to every cache miss, and those are already expensive enough.


My primary motivation for considering the mentioned scheme is to reduce
disk seeks, not disk space. If you have a cold disk cache (on a rotating
device), every new i-node that needs to be visited potentially/likely
needs a new disk seek, which is slow. If all parts of the result are
stored in one contiguous file, it should likely be quicker to retrieve.
But as mentioned earlier, I have no data to back up this theory yet.

My understanding is that when a disk read occurs the kernel reads the
entire page into the memory cache. Subsequent inode reads will likely
hit that cache, so reading two inodes is nearly as cheap as reading one.
The system call overhead is constant, however.

A secondary motivation for the scheme is that various code paths in
ccache need to handle multiple files for a single result. There can now
be between two (stderr, object) and six (stderr, object, dependency,
coverage, diagnostics, split dwarf) files for each cached result. If one
of those files is missing, then the result should be invalid. This is
quite painful and there are most likely some lurking bugs related to this.

OK, that's quite a lot of files. Hopefully it does not look for a file
unless it really ought to be there? I worry that you'll hurt the common
case (just two files) in order to help the uncommon case, when the
common case is already about as good as it can be (especially with
hard-links).
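
As a rough illustration of "only look for a file if it ought to be there", a Python sketch follows. The suffix names and the idea of passing in the expected set are assumptions for the example, not ccache's actual cache layout.

```python
import os


def missing_parts(base, expected_suffixes):
    """Return the expected result files that are absent from the cache.

    `expected_suffixes` lists only the parts this compilation should have
    produced, so the common two-file case costs two lookups and no more.
    """
    return [s for s in expected_suffixes
            if not os.path.exists(f"{base}.{s}")]


def result_is_complete(base, expected_suffixes):
    """A cached result is valid only if every expected part is present."""
    return not missing_parts(base, expected_suffixes)
```

The common case would call `result_is_complete(base, ("stderr", "o"))`; a build that also emits dependency files would add a third (hypothetical) suffix such as `"d"`.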

A third motivation is that it would be easier to include a check sum of
the cached data to detect corruption so that ccache won't repeatedly
deliver a bad object file (due to hardware error or whatnot).

Any checksum had better be very fast. Profiling ccache already shows
that it spends more time doing MD4 than anything else.
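
For what a cheap integrity check might look like, here is a sketch using CRC32 from Python's zlib module. CRC32 is my choice for the example only; it catches accidental corruption but is not a cryptographic hash, and where the expected value would be stored is left open.

```python
import zlib


def checksum_file(path, chunk_size=65536):
    """Compute a CRC32 over the file contents, reading in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def is_intact(path, expected_crc):
    """Compare the stored checksum with one recomputed from the file."""
    return checksum_file(path) == expected_crc
```

The checksum would be recorded when the result is stored and verified on a hit, so a corrupted object is rejected instead of being delivered repeatedly.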


Andrew
_______________________________________________
ccache mailing list
[email protected]
https://lists.samba.org/mailman/listinfo/ccache
