On 07/11/12 19:19, Joel Rosdahl wrote:
It would be nice if ccache were only used and enabled by conscious users
who have read and understood the documentation, but in reality that
doesn't happen in many cases. For instance, Linux distributions like
Fedora install and enable ccache by default (masquerading the system
compiler), at least when installing the development environment or
similar. That's not surprising given that ccache works very well for
most people and that it is advertised as being very safe.

Hmm, I was not aware Fedora did that, but then I don't use Fedora much, and when I have, ccache has been transparent enough that I wouldn't necessarily have noticed. :)

I am aware that Yocto uses it by default, and certainly their users could stumble on this problem, but again, only rarely.

    A similar issue, albeit not so interesting, perhaps, is what happens
    when a user changes some part of the toolchain, but does not alter
    the "gcc" binary. Ccache won't notice a new back-end compiler, a new
    assembler, a new linker, a new default specs file or anything like
    that. Chances are that any differences in the output are harmless,
    but the cached objects are technically invalid.


Right. However, isn't the fact that ccache may be affected by
toolchain changes much less surprising than the fact that ccache may
fail to pick up header files correctly?

That's why it's less interesting.

    [In fact, I have a use-case in which I have multiple users sharing a
    cache, and I wanted to be able to uniquely identify the same
    toolchain across all the installations. The mtime etc. varies from
    machine to machine, as might the exact tool mix, so I have some
    local patches to do a much deeper hash of the toolchain binaries,
    and include those in the object hashes. Even then, in the interests
    of performance, those toolchain IDs are cached according to the
    location and mtime, so changing the binutils will cause temporarily
    wrong toolchain hashes. Would you be interested in such a feature
    upstream?]


Perhaps, it depends on how intrusive it is and how toolchain-specific it is.

Basically, it first does the same as CCACHE_COMPILERCHECK=mtime and uses the result to look for a <hash>.toolid file in the cache. If the tool ID is cached, it reads it from that file and uses it to calculate the object hashes as usual. If the tool ID is not cached, it runs "gcc -print-prog-name=..." a few times, hashes the binaries it finds, and caches the result for next time. CCACHE_COMPILERCHECK=content causes the ID to be re-cached, and =none and =<command> are unaltered.

This way, cached files can be shared across machines whose toolchains really are the same (all the way to the bottom) but happen to have different installation times: they are recognised as the same and hashed as the same, without having to re-hash the binaries every time.

An interesting side-effect is that binaries cached in CCACHE_COMPILERCHECK=mtime mode are now compatible with those cached in CCACHE_COMPILERCHECK=content mode, although those cached in the other modes remain incompatible.

My implementation is currently GCC specific.
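
To make the lookup order concrete, here is a rough Python sketch of the idea. It is not the actual patch (that lives in ccache's C code); the function names, the SHA-1 choice, the list of sub-programs, and the exact contents of the mtime key are all assumptions for illustration only.

```python
import hashlib
import os
import subprocess

# Hypothetical list of sub-programs to ask gcc about; the real patch may differ.
PROGS = ("cc1", "as", "ld")

def mtime_key(compiler_path):
    """Cheap key in the spirit of CCACHE_COMPILERCHECK=mtime (path, size, mtime)."""
    st = os.stat(compiler_path)
    raw = "%s\0%d\0%d" % (os.path.abspath(compiler_path), st.st_size, st.st_mtime_ns)
    return hashlib.sha1(raw.encode()).hexdigest()

def deep_hash(compiler_path, progs=PROGS):
    """Expensive ID: hash the contents of the driver and the tools it reports."""
    paths = [compiler_path]
    for prog in progs:
        out = subprocess.run([compiler_path, "-print-prog-name=" + prog],
                             capture_output=True, text=True, check=True).stdout.strip()
        # gcc prints the bare name back when it has no specific path for the
        # program; this sketch simply skips those instead of searching PATH.
        if os.path.isabs(out) and os.path.isfile(out):
            paths.append(out)
    h = hashlib.sha1()
    for p in paths:
        with open(p, "rb") as f:
            h.update(f.read())
    return h.hexdigest()

def toolchain_id(compiler_path, cache_dir, force=False, compute=deep_hash):
    """Return the cached tool ID, computing and storing it on a miss.

    force=True stands in for the "=content" behaviour of re-caching the ID.
    """
    id_file = os.path.join(cache_dir, mtime_key(compiler_path) + ".toolid")
    if not force and os.path.exists(id_file):
        with open(id_file) as f:
            return f.read().strip()
    tool_id = compute(compiler_path)
    with open(id_file, "w") as f:
        f.write(tool_id)
    return tool_id
```

Because the .toolid file is keyed by location and mtime, the sketch shares the weakness mentioned earlier: swapping binutils without touching the gcc driver leaves a stale ID until it is re-cached.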

Not sure about that. I may be overlooking something, but ccache would "only"
have to follow all #include statements and note all header files that
don't exist in the include path list. (When #include is used with a
#defined token for the filename, fall back to the real compiler.) When
considering a potential cache hit, reject it if any of the header files
that didn't exist then exist now.

I was thinking of cases like:

#ifdef SOMETHING_NOT_DEFINED
#include "mystery-header.h"
#endif

Presumably you mean that it will note all the *directories* in which a particular header file was not found, on the way to finding it?

        Anybody got other ideas?


    Running the compiler with -v prints the header search directories.
    You could use that to do your own scan.


To use the directories from "cpp -v" (plus directories from the command
line) to do some optimistic validation was my first thought as well, but
after thinking more about it I came to the conclusion that it wouldn't
buy much safety because no subdirectories will be checked, and you can't
tell which subdirectories to check unless you have parsed the #include
statements. Also, it would trigger many false negatives.

Yes, false negatives would happen, especially if there are include directories within the project source tree. :(

The problem is that I've not been able to think of a way that both solves your bug, and doesn't have a serious time-impact on either a direct-mode lookup, or a cache-miss.

As it happens, I've been thinking of ways to speed up adding things to the cache. I've been profiling the code, and found that, on a cache miss, it spends a significant portion of its runtime between the compiler exiting and ccache exiting. It has occurred to me that if we were to return the compiler's results to the user straight away, ccache could then fork into the background and spend as much time as it likes populating the cache, without slowing the build noticeably. Compilations of the exact same source are unlikely to occur close together, so there's no urgent deadline for these.
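
A minimal sketch of that "answer first, store later" idea, in Python rather than ccache's C and for POSIX systems only. The function name and the store callback are hypothetical; nothing like this exists in ccache as far as this message is concerned.

```python
import os
import sys

def deliver_then_store(compiler_stdout, store):
    """Hand the compiler's output to the caller, then populate the cache in a child.

    Returns the child's pid in the parent. `store` is a hypothetical callback
    that does the slow work (hashing, compressing, writing the cache entry).
    """
    sys.stdout.write(compiler_stdout)
    sys.stdout.flush()  # flush before fork so the child cannot re-emit buffered output
    pid = os.fork()
    if pid == 0:
        # Child: nobody is waiting for us, so slow work is acceptable here.
        status = 0
        try:
            store()
        except Exception:
            status = 1
        finally:
            os._exit(status)  # never fall back into the parent's code path
    return pid  # parent: return at once; a real wrapper would simply exit here
```

A real implementation would presumably also have to close or redirect the inherited stdout/stderr in the child, since a build tool reading the wrapper's output through a pipe would otherwise wait until the child finishes too.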

Given relaxed time constraints, we could certainly do a little more work calculating data to store in the manifest file that could then be processed lightning fast on a cache lookup.

So, for each include file, we need to know the list of directories it could be found in, and which one it was actually found in. This means we need to know what names were used in the original code (a user may have specified an absolute path), whether they were included with <xxx.h> or "xxx.h", and what directories were in the compiler's search path, and be aware of #include_next directives.
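
As a sketch of what would have to be recorded per include, assuming GCC-style lookup (includer's directory first for "xxx.h", then the quote directories, then the <xxx.h> chain): the function below returns where a header was found plus every earlier candidate that was tried and missed. All names are hypothetical, and the #include_next handling is deliberately simplified.

```python
import os

def resolve_include(name, quoted, includer_dir, quote_dirs, angle_dirs,
                    exists=os.path.exists, next_after=None):
    """Return (found_path, misses).

    misses lists every candidate path that was tried before the header was
    found; these are the locations that must stay absent for a cached result
    to remain valid.
    """
    if os.path.isabs(name):
        # An absolute path involves no search, so there is nothing to record.
        return (name if exists(name) else None), []
    chain = list(angle_dirs)
    if quoted:
        chain = [includer_dir] + list(quote_dirs) + chain
    if next_after is not None and next_after in chain:
        # #include_next: resume after the directory of the current header.
        chain = chain[chain.index(next_after) + 1:]
    misses = []
    for d in chain:
        candidate = os.path.join(d, name)
        if exists(candidate):
            return candidate, misses
        misses.append(candidate)
    return None, misses
```

For example, with search path /proj/inc then /usr/include, finding <stdio.h> in /usr/include records /proj/inc/stdio.h as a path that must not appear later.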

Knowing the compiler's search path could be done with '-v' every time, or we could cache the default ones, and then "know" what the command-line parameters mean, or we could cache the search path for each set of input options each time.
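
For the '-v' route, GCC prints the two search lists between fixed marker lines, so extracting them is straightforward. A sketch, assuming the text has already been captured from the compiler's verbose output (the function name is made up, and annotations such as the "(framework directory)" suffix seen on some platforms are not handled):

```python
def parse_search_dirs(verbose_output):
    """Extract the quote and angle-bracket search lists from `gcc -v` text."""
    quote_dirs, angle_dirs = [], []
    current = None
    for line in verbose_output.splitlines():
        if line.startswith('#include "..." search starts here:'):
            current = quote_dirs
        elif line.startswith("#include <...> search starts here:"):
            current = angle_dirs
        elif line.startswith("End of search list."):
            current = None
        elif current is not None and line.startswith(" "):
            # Directory entries are indented by one space.
            current.append(line.strip())
    return quote_dirs, angle_dirs
```

Caching the result per set of input options, as suggested above, would avoid running the compiler an extra time on every invocation.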

[Do all the supported toolchains even provide a means to learn the search path? If we're getting into ptrace territory then architecture/OS specific code will be required.]

Then, at direct-mode cache-lookup time, we do exactly as now, but also have a list of locations where stat should return ENOENT.
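
The extra lookup-time check could be as small as the following sketch. The function name is hypothetical, and the list of paths is assumed to have been stored in the manifest when the entry was created.

```python
import os

def negative_entries_still_absent(must_not_exist):
    """True if every recorded 'header was not here' path is still missing."""
    for path in must_not_exist:
        try:
            os.stat(path)
        except FileNotFoundError:
            continue      # ENOENT, as recorded: this entry is still valid
        except NotADirectoryError:
            continue      # ENOTDIR: a path component is a file, so no header here
        except OSError:
            return False  # e.g. EACCES: absence cannot be proven, treat as a miss
        return False      # the path exists now, so a different header could be picked up
    return True
```

The cost is one stat per recorded location, which is why keeping that list short matters for direct-mode lookup time.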

    BTW, gcc has an option "--trace-includes" that might be faster than
    scanning the preprocessor output, although the compiler still has to
    do all the same work. Like this: "gcc -E hello.c -o /dev/null".


How do you use --trace-includes? It doesn't seem to be documented and
nothing happens when I try it.

Maybe it was introduced recently?

$ gcc --trace-includes -c ~/hello.c -o /dev/null
. /usr/include/stdio.h
.. /usr/include/features.h
... /usr/include/x86_64-linux-gnu/bits/predefs.h
... /usr/include/x86_64-linux-gnu/sys/cdefs.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/gnu/stubs.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.... /usr/include/x86_64-linux-gnu/gnu/stubs-64.h
.. /usr/lib/gcc/x86_64-linux-gnu/4.7/include/stddef.h
.. /usr/include/x86_64-linux-gnu/bits/types.h
... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/bits/typesizes.h
.. /usr/include/libio.h
... /usr/include/_G_config.h
.... /usr/lib/gcc/x86_64-linux-gnu/4.7/include/stddef.h
.... /usr/include/wchar.h
... /usr/lib/gcc/x86_64-linux-gnu/4.7/include/stdarg.h
.. /usr/include/x86_64-linux-gnu/bits/stdio_lim.h
.. /usr/include/x86_64-linux-gnu/bits/sys_errlist.h
Multiple include guards may be useful for:
/usr/include/wchar.h
/usr/include/x86_64-linux-gnu/bits/predefs.h
/usr/include/x86_64-linux-gnu/bits/stdio_lim.h
/usr/include/x86_64-linux-gnu/bits/sys_errlist.h
/usr/include/x86_64-linux-gnu/bits/typesizes.h
/usr/include/x86_64-linux-gnu/gnu/stubs-64.h
/usr/include/x86_64-linux-gnu/gnu/stubs.h

$ gcc --version
gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2
Copyright © 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
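
If that trace were to be consumed by a tool, the dotted lines are easy to pick apart: the number of dots is the nesting depth and the rest is the path. A small sketch (the function name is made up, and the text is assumed to have been captured from the compiler's diagnostic output):

```python
import re

# One or more dots, a space, then the header path.
_TRACE_LINE = re.compile(r"^(\.+) (.+)$")

def parse_trace_includes(trace_text):
    """Return (depth, path) pairs from the dotted lines of the include trace."""
    entries = []
    for line in trace_text.splitlines():
        m = _TRACE_LINE.match(line)
        if m:
            entries.append((len(m.group(1)), m.group(2)))
    return entries
```

Lines without leading dots, such as the "Multiple include guards may be useful for:" list, are ignored.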

Andrew
_______________________________________________
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache
