On 06/07/15 21:44, Joel Rosdahl wrote:
That sounds like a reasonable idea, but I have occasionally seen empty
object files in large and busy caches (it could be due to filesystem
failure, hardware failure or hard system reset), so I'm afraid that
using zero-length object files won't work out in practice. See also
https://bugzilla.samba.org/show_bug.cgi?id=9972. But maybe writing some
special content to the object file would be OK?
OK, fair enough, but I'd say that once you've opened the file and
checked the magic data then you've already killed performance. How about
a magic length that can be observed in the stat data?
A failure can be confirmed by a read, if and only if the length matches,
but a compile success will remain on the quick path.
A cache-hit for a compile failure need not be the *most* efficient code
path; it will likely end the build process. As long as it's faster than
the "slow" compile failures the OP cares about then all is well.
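Here is a minimal sketch of the stat-first check. It assumes a hypothetical marker string, FAILURE_MAGIC, whose length serves as the magic length. The names and the marker content are illustrative only and are not ccache code. Only a file whose size exactly matches the marker is ever opened, so a normal object file costs a single stat() call. The sketch also treats a zero-length file as a miss, given the empty object files reported above.

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical marker written in place of an object file when the
 * compiler failed. Its length is the "magic length" seen via stat(). */
static const char FAILURE_MAGIC[] = "ccache-compile-failure\n";
#define FAILURE_MAGIC_LEN (sizeof(FAILURE_MAGIC) - 1)

enum cached_kind { CACHED_MISSING, CACHED_OBJECT, CACHED_FAILURE };

/* Classify a cached "object file" using stat() first; the file is only
 * opened and read when its size equals the magic length. */
static enum cached_kind classify_cached_object(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0) {
        return CACHED_MISSING;
    }
    if (st.st_size == 0) {
        /* Empty files have been seen after crashes; treat as a miss. */
        return CACHED_MISSING;
    }
    if ((size_t)st.st_size != FAILURE_MAGIC_LEN) {
        /* Quick path: a normal object file never pays for a read. */
        return CACHED_OBJECT;
    }

    /* Slow path: confirm the failure marker by reading the content. */
    FILE *f = fopen(path, "rb");
    if (f == NULL) {
        return CACHED_MISSING;
    }
    char buf[FAILURE_MAGIC_LEN];
    size_t n = fread(buf, 1, sizeof(buf), f);
    fclose(f);
    if (n == FAILURE_MAGIC_LEN &&
        memcmp(buf, FAILURE_MAGIC, FAILURE_MAGIC_LEN) == 0) {
        return CACHED_FAILURE;
    }
    /* Same length by coincidence: treat it as a real object file. */
    return CACHED_OBJECT;
}
```

An object file that happens to have exactly the magic length pays for one extra read, which is the trade-off described above.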
Sorry, I don't see any advantage in this scheme. You might save a
few bytes of disk space, and maybe a few inodes, but I've not seen
any evidence that those are a problem. You'll also add extra file
copies to every cache miss, and those are already expensive enough.
My primary motivation for considering the mentioned scheme is to reduce
disk seeks, not disk space. If you have a cold disk cache (on a rotating
device), every new inode that needs to be visited is likely to need a new
disk seek, which is slow. If all parts of the result are stored in one
contiguous file, it should be quicker to retrieve.
But as mentioned earlier, I have no data to back up this theory yet.
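The one-file idea could look something like the sketch below. The format and every name in it are hypothetical and do not describe ccache's actual on-disk layout. Each section is stored as a one-byte tag, then a four-byte little-endian length, then the payload. If a section is missing or truncated, read_section returns -1, which maps naturally onto "the whole result is invalid".

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical section tags for the parts of one cached result. */
enum section_type { SEC_STDERR = 1, SEC_OBJECT = 2, SEC_DEP = 3 };

/* Append one section: a 1-byte tag, a 4-byte little-endian length,
 * then the payload. Returns 0 on success, -1 on a short write. */
static int write_section(FILE *f, uint8_t type, const void *data, uint32_t len)
{
    uint8_t hdr[5] = { type, (uint8_t)len, (uint8_t)(len >> 8),
                       (uint8_t)(len >> 16), (uint8_t)(len >> 24) };
    if (fwrite(hdr, 1, sizeof(hdr), f) != sizeof(hdr)) return -1;
    if (len > 0 && fwrite(data, 1, len, f) != len) return -1;
    return 0;
}

/* Scan from the start of the file for a section of the given type and
 * copy its payload into buf. Returns the payload length, or -1 if the
 * section is absent, truncated, or larger than cap. */
static long read_section(FILE *f, uint8_t type, void *buf, size_t cap)
{
    uint8_t hdr[5];
    rewind(f);
    while (fread(hdr, 1, sizeof(hdr), f) == sizeof(hdr)) {
        uint32_t len = (uint32_t)hdr[1] | ((uint32_t)hdr[2] << 8) |
                       ((uint32_t)hdr[3] << 16) | ((uint32_t)hdr[4] << 24);
        if (hdr[0] == type) {
            if (len > cap || fread(buf, 1, len, f) != len) return -1;
            return (long)len;
        }
        /* Skip over a section we are not interested in. */
        if (fseek(f, (long)len, SEEK_CUR) != 0) return -1;
    }
    return -1;
}
```

Whether this actually saves seeks on a cold cache is exactly the unmeasured question raised above.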
My understanding is that when a disk read occurs the kernel reads the
entire page into the memory cache. Subsequent inode reads will likely
hit that cache, so reading two inodes is nearly as cheap as reading one.
The system call overhead is constant, however.
A secondary motivation for the scheme is that various code paths in
ccache need to handle multiple files for a single result. There can now
be between two (stderr, object) and six (stderr, object, dependency,
coverage, diagnostics, split dwarf) files for each cached result. If one
of those files is missing, then the result should be invalid. This is
quite painful and there are most likely some lurking bugs related to this.
OK, that's quite a lot of files. Hopefully it does not look for a file
unless it really ought to be there? I worry that you'll hurt the common
case (just two files) in order to help the uncommon case, when the common
case is already about as good as it can be (especially with hard-links).
A third motivation is that it would be easier to include a checksum of
the cached data to detect corruption so that ccache won't repeatedly
deliver a bad object file (due to hardware error or whatnot).
Any checksum had better be very fast. Profiling ccache already shows
that it spends more time doing MD4 than anything else.
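One cheap option is Adler-32, the checksum used by the zlib format (RFC 1950). This is a swapped-in choice for illustration, not something ccache uses. It is much weaker than a cryptographic hash, but it only has to catch accidental corruption, and each byte costs just two additions.

```c
#include <stddef.h>
#include <stdint.h>

/* Adler-32 (RFC 1950): two running sums modulo 65521. */
static uint32_t adler32(const unsigned char *data, size_t len)
{
    const uint32_t MOD_ADLER = 65521;
    uint32_t a = 1, b = 0;
    while (len > 0) {
        /* 5552 is the largest block for which b cannot overflow 32 bits
         * before the modulo is applied (the same bound zlib uses). */
        size_t block = len < 5552 ? len : 5552;
        len -= block;
        while (block--) {
            a += *data++;
            b += a;
        }
        a %= MOD_ADLER;
        b %= MOD_ADLER;
    }
    return (b << 16) | a;
}
```

For example, `adler32((const unsigned char *)"Wikipedia", 9)` returns `0x11E60398`. The checksum could be stored alongside the result when it is cached, then recomputed and compared on a hit. A mismatch would be treated as a miss.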
Andrew
_______________________________________________
ccache mailing list
https://lists.samba.org/mailman/listinfo/ccache