Re: [ccache] direct mode design bug

Andrew Stubbs Thu, 08 Nov 2012 06:04:52 -0800

On 07/11/12 19:19, Joel Rosdahl wrote:

It would be nice if ccache were only used and enabled by conscious users
who have read and understood the documentation, but in reality that
doesn't happen in many cases. For instance, Linux distributions like
Fedora install and enable ccache by default (masquerading the system
compiler), at least when installing the development environment or
similar. That's not surprising given that ccache works very well for
most people and that it is advertised as being very safe.

Hmm, I was not aware Fedora did that, but then I don't use Fedora much, and when I have, ccache is transparent enough that I wouldn't necessarily notice. :)

I am aware that Yocto uses it by default, and certainly their users could stumble over this problem, but again, only rarely.

    A similar issue, albeit not so interesting, perhaps, is what happens
    when a user changes some part of the toolchain, but does not alter
    the "gcc" binary. Ccache won't notice a new back-end compiler, a new
    assembler, a new linker, a new default specs file or anything like
    that. Chances are that any differences in the output are harmless,
    but the cached objects are technically invalid.


Right. However, isn't the fact that ccache may be affected by
toolchain changes much less surprising than the fact that ccache may
fail to pick up header files correctly?


That's why it's less interesting.

    [In fact, I have a use-case in which I have multiple users sharing a
    cache, and I wanted to be able to uniquely identify the same
    toolchain across all the installations. The mtime etc. varies from
    machine to machine, as might the exact tool mix, so I have some
    local patches to do a much deeper hash of the toolchain binaries,
    and include those in the object hashes. Even then, in the interests
    of performance, those toolchain IDs are cached according to the
    location and mtime, so changing the binutils will cause temporarily
    wrong toolchain hashes. Would you be interested in such a feature
    upstream?]


Perhaps, it depends on how intrusive it is and how toolchain-specific it is.

Basically, it first does the same as CCACHE_COMPILERCHECK=mtime, and uses that to look for a <hash>.toolid file in the cache. If the tool-id is cached, it reads it from that file and uses that ID to calculate the object hashes as usual. If the tool-id is not cached, it runs "gcc -print-prog-name=..." a few times, hashes the binaries it finds, and caches the result for next time. CCACHE_COMPILERCHECK=content causes the ID to be re-cached, and =none and =<command> are unaltered.

By this means, cached files can be shared across machines whose toolchains really are the same (all the way to the bottom) but happen to have different installation times: they are recognised as the same, and hashed as the same, without having to re-hash the binaries every time.

An interesting side-effect is that binaries cached in CCACHE_COMPILERCHECK=mtime mode are now compatible with those cached in CCACHE_COMPILERCHECK=content mode, although those cached in the other modes remain incompatible.
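
A rough sketch of the lookup described above, in Python rather than ccache's C, for illustration only. The names mtime_key, deep_tool_id and tool_id are hypothetical, SHA-256 is an arbitrary choice of hash, and the caller is assumed to supply the list of tool binaries (for example as found via "gcc -print-prog-name=...") in a fixed order. The force flag stands in for the =content behaviour of re-caching the ID.

```python
import hashlib
import os

def mtime_key(compiler_path):
    """Cheap key from the driver's location, size and mtime."""
    st = os.stat(compiler_path)
    raw = f"{compiler_path}:{st.st_size}:{st.st_mtime_ns}"
    return hashlib.sha256(raw.encode()).hexdigest()

def deep_tool_id(tool_paths):
    """Expensive ID: hash the contents of each tool binary, in the given order."""
    h = hashlib.sha256()
    for path in tool_paths:
        with open(path, "rb") as f:
            h.update(f.read())
    return h.hexdigest()

def tool_id(compiler_path, tool_paths, cache_dir, force=False):
    """Return the toolchain ID, reusing <key>.toolid in cache_dir if present."""
    id_file = os.path.join(cache_dir, mtime_key(compiler_path) + ".toolid")
    if not force and os.path.exists(id_file):
        with open(id_file) as f:
            return f.read().strip()
    ident = deep_tool_id(tool_paths)  # slow path, taken once per key
    with open(id_file, "w") as f:
        f.write(ident)
    return ident
```

Two installations with identical binaries get the same ID whatever their paths and mtimes. As noted earlier, replacing a tool without touching the driver leaves a stale ID until it is re-cached.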


My implementation is currently GCC specific.

Not sure about that. I may be overlooking something, but ccache would "only"
have to follow all #include statements and note all header files that
don't exist in the include path list. (When #include is used with a
#defined token for the filename, fall back to the real compiler.) When
considering a potential cache hit, reject it if any of the header files
that didn't exist then exist now.


I was thinking of cases like:

#ifdef SOMETHING_NOT_DEFINED
#include "mystery-header.h"
#endif

Presumably you mean that it will note all the *directories* in which a particular header file was not found, on the way to finding it?

        Anybody got other ideas?


    Running the compiler with -v prints the header search directories.
    You could use that to do your own scan.


To use the directories from "cpp -v" (plus directories from the command
line) to do some optimistic validation was my first thought as well, but
after thinking more about it I came to the conclusion that it wouldn't
buy much safety because no subdirectories will be checked, and you can't
tell which subdirectories to check unless you have parsed the #include
statements. Also, it would trigger many false negatives.

Yes, false negatives would happen, especially if there are include directories within the project source tree. :(

The problem is that I've not been able to think of a way that both solves your bug and doesn't have a serious time impact on either a direct-mode lookup or a cache-miss.

As it happens, I've been thinking of ways to speed up adding things into the cache. I've been profiling the code, and found that, on a cache-miss, it spends a significant portion of its runtime between the compiler exiting and ccache exiting. It has occurred to me that if we were to return the compiler's results to the user straight away, it could then fork into the background and spend as much time as it likes populating the cache, without slowing the build time noticeably. Compilations of the exact same source are unlikely to occur close together, so there's no urgent deadline for these.
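
To make the fork-into-the-background idea concrete, here is a minimal POSIX-only sketch in Python (not ccache code). The function name store_in_background is hypothetical, and the "cache" is just a file path. It uses the usual double fork so that the caller returns at once and the worker is not left as a zombie.

```python
import os
import shutil

def store_in_background(obj_path, cache_path):
    """Return immediately; copy obj_path into the cache from a detached child."""
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)  # reap the short-lived intermediate child
        return
    # Intermediate child: fork again so the worker is re-parented to init.
    try:
        if os.fork() == 0:
            tmp = f"{cache_path}.tmp.{os.getpid()}"
            shutil.copyfile(obj_path, tmp)
            os.replace(tmp, cache_path)  # publish atomically
    finally:
        os._exit(0)  # neither child may fall back into the caller's code
```

One caveat with this approach: the object file has to stay unchanged until the background copy has finished, so a real implementation would probably want to take a link or private copy first.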

Given relaxed time constraints, we could certainly do a little more work calculating data to store in the manifest file that could then be processed lightning fast on a cache lookup.

So, for each include file, we need to know the list of directories it could be found in, and which one it was actually found in. This means we need to know what names were used in the original code (a user may have specified an absolute path), whether they were included with <xxx.h> or "xxx.h", and what directories were in the compiler's search path, and be aware of #include_next directives.

Knowing the compiler's search path could be done with '-v' every time, or we could cache the default ones and then "know" what the command-line parameters mean, or we could cache the search path for each set of input options each time.
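
For the '-v' route, GCC prints the search list between the "search starts here:" markers and "End of search list.". A small Python sketch of pulling the directories out of that text follows. The function name parse_search_dirs is hypothetical, and only GCC-style output is assumed; other toolchains may print something different or nothing at all.

```python
def parse_search_dirs(verbose_output):
    """Split GCC -v output into (quote_dirs, angle_dirs) search lists."""
    quote_dirs, angle_dirs = [], []
    current = None
    for line in verbose_output.splitlines():
        if line.startswith('#include "..." search starts here:'):
            current = quote_dirs
        elif line.startswith("#include <...> search starts here:"):
            current = angle_dirs
        elif line.startswith("End of search list."):
            break
        elif current is not None and line.startswith(" "):
            # Directory entries are indented by one space.
            current.append(line.strip())
    return quote_dirs, angle_dirs
```

The input would typically be the stderr of something like "gcc -v -E -x c /dev/null -o /dev/null", with the project's own -I and -iquote options added.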

[Do all the supported toolchains even provide a means to learn the search path? If we're getting into ptrace territory then architecture/OS specific code will be required.]

Then, at direct-mode cache-lookup time, we do exactly as now, but also have a list of locations where stat should return ENOENT.
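
The extra lookup-time check could be as simple as the following Python sketch. The helper names shadow_candidates, is_absent and manifest_still_valid are hypothetical, and the sketch ignores the "xxx.h" versus <xxx.h> distinction and #include_next. At store time we record every earlier search directory where the header was not found; at lookup time any of those paths existing means a miss.

```python
import os

def shadow_candidates(name, search_dirs, found_in):
    """Paths earlier in the search order where `name` was absent at compile time."""
    candidates = []
    for d in search_dirs:
        if d == found_in:
            break
        candidates.append(os.path.join(d, name))
    return candidates

def is_absent(path):
    """True only if stat fails with ENOENT; other errors count as 'not sure'."""
    try:
        os.stat(path)
    except FileNotFoundError:
        return True
    except OSError:
        return False
    return False

def manifest_still_valid(must_not_exist):
    """Reject the cached result if any previously missing header now exists."""
    return all(is_absent(p) for p in must_not_exist)
```

This costs one stat per recorded location, on top of the existing per-header checks.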

    BTW, gcc has an option "--trace-includes" that might be faster than
    scanning the preprocessor output, although the compiler still has to
    do all the same work. Like this: "gcc -E hello.c -o /dev/null".


How do you use --trace-includes? It doesn't seem to be documented and
nothing happens when I try it.


Maybe it was introduced recently?

$ gcc --trace-includes -c ~/hello.c -o /dev/null
. /usr/include/stdio.h
.. /usr/include/features.h
... /usr/include/x86_64-linux-gnu/bits/predefs.h
... /usr/include/x86_64-linux-gnu/sys/cdefs.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/gnu/stubs.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.... /usr/include/x86_64-linux-gnu/gnu/stubs-64.h
.. /usr/lib/gcc/x86_64-linux-gnu/4.7/include/stddef.h
.. /usr/include/x86_64-linux-gnu/bits/types.h
... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/bits/typesizes.h
.. /usr/include/libio.h
... /usr/include/_G_config.h
.... /usr/lib/gcc/x86_64-linux-gnu/4.7/include/stddef.h
.... /usr/include/wchar.h
... /usr/lib/gcc/x86_64-linux-gnu/4.7/include/stdarg.h
.. /usr/include/x86_64-linux-gnu/bits/stdio_lim.h
.. /usr/include/x86_64-linux-gnu/bits/sys_errlist.h
Multiple include guards may be useful for:
/usr/include/wchar.h
/usr/include/x86_64-linux-gnu/bits/predefs.h
/usr/include/x86_64-linux-gnu/bits/stdio_lim.h
/usr/include/x86_64-linux-gnu/bits/sys_errlist.h
/usr/include/x86_64-linux-gnu/bits/typesizes.h
/usr/include/x86_64-linux-gnu/gnu/stubs-64.h
/usr/include/x86_64-linux-gnu/gnu/stubs.h

$ gcc --version
gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2
Copyright © 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
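
For what it's worth, the trace format above is easy to consume: the number of leading dots is the nesting depth. A small Python sketch follows; the name parse_include_trace is hypothetical, and it assumes the output looks exactly like the transcript above.

```python
import re

# One or more dots, a space, then the header path.
_ENTRY = re.compile(r"^(\.+) (.+)$")

def parse_include_trace(text):
    """Return (depth, path) pairs, skipping the include-guard summary lines."""
    entries = []
    for line in text.splitlines():
        m = _ENTRY.match(line)
        if m:
            entries.append((len(m.group(1)), m.group(2)))
    return entries
```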

Andrew
_______________________________________________
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache
