On 07/11/12 19:19, Joel Rosdahl wrote:
It would be nice if ccache were only used and enabled by conscious users
who have read and understood the documentation, but in reality that
doesn't happen in many cases. For instance, Linux distributions like
Fedora install and enable ccache by default (masquerading the system
compiler), at least when installing the development environment or
similar. That's not surprising given that ccache works very well for
most people and that it is advertised as being very safe.
Hmm, I was not aware Fedora did that, but then I don't use Fedora much,
and when I have, ccache is transparent enough that I wouldn't
necessarily notice. :)
I am aware that Yocto uses it, by default, and certainly their users
could stumble on this problem, but again, only rarely.
A similar issue, albeit not so interesting, perhaps, is what happens
when a user changes some part of the toolchain, but does not alter
the "gcc" binary. Ccache won't notice a new back-end compiler, a new
assembler, a new linker, a new default specs file or anything like
that. Chances are that any differences in the output are harmless,
but the cached objects are technically invalid.
Right. However, isn't the fact that ccache may be affected by
toolchain changes much less surprising than the fact that ccache may
fail to pick up header files correctly?
That's why it's less interesting.
[In fact, I have a use-case in which I have multiple users sharing a
cache, and I wanted to be able to uniquely identify the same
toolchain across all the installations. The mtime etc. varies from
machine to machine, as might the exact tool mix, so I have some
local patches to do a much deeper hash of the toolchain binaries,
and include those in the object hashes. Even then, in the interests
of performance, those toolchain IDs are cached according to the
location and mtime, so changing the binutils will cause temporarily
wrong toolchain hashes. Would you be interested in such a feature
upstream?]
Perhaps, it depends on how intrusive it is and how toolchain-specific it is.
Basically, it first does the same as CCACHE_COMPILERCHECK=mtime, and
uses that to look for a <hash>.toolid file in the cache. If the tool-id
is cached it reads it from that file, and uses that ID to calculate the
object hashes as usual. If the tool-id is not cached then it runs "gcc
-print-prog-name=..." a few times, hashes the binaries it finds, and
caches the result for next time. CCACHE_COMPILERCHECK=content causes the
ID to be re-cached, and =none and =<command> are unaltered.
By this means, toolchains that really are the same (all the way to the
bottom) but happen to have different installation times are recognised
as the same, and hashed as the same, so the cached files can be shared
across machines without having to re-hash the binaries every time.
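
To make the scheme concrete, here is a rough sketch in Python. It is not the actual patch (that is C inside ccache); the names `mtime_key`, `deep_hash`, `find_tools` and `tool_id`, and the choice of which tools to ask "-print-prog-name" about, are all assumptions made for illustration.

```python
import hashlib
import os
import shutil
import subprocess


def mtime_key(compiler_path):
    """Cheap per-installation key (path, size, mtime), in the spirit of =mtime."""
    st = os.stat(compiler_path)
    raw = f"{compiler_path}\0{st.st_size}\0{st.st_mtime_ns}"
    return hashlib.sha1(raw.encode()).hexdigest()


def deep_hash(paths):
    """Expensive ID: depends only on file contents, not on paths or mtimes."""
    h = hashlib.sha1()
    for p in paths:
        with open(p, "rb") as f:
            h.update(hashlib.sha1(f.read()).digest())
    return h.hexdigest()


def find_tools(compiler_path, names=("cc1", "as", "ld")):
    """Ask the compiler driver where its helper programs live (GCC specific)."""
    tools = [compiler_path]
    for name in names:
        out = subprocess.run(
            [compiler_path, f"-print-prog-name={name}"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
        resolved = shutil.which(out)  # a bare name means "look it up in PATH"
        if resolved:
            tools.append(resolved)
    return tools


def tool_id(compiler_path, cache_dir, locate=find_tools):
    """Return the toolchain ID, reading <hash>.toolid if it is already cached."""
    id_file = os.path.join(cache_dir, mtime_key(compiler_path) + ".toolid")
    try:
        with open(id_file) as f:
            return f.read().strip()
    except FileNotFoundError:
        pass
    ident = deep_hash(locate(compiler_path))
    with open(id_file, "w") as f:
        f.write(ident)
    return ident
```

Two installations with identical binaries but different paths or mtimes get different .toolid files and the same ID inside them, which is what lets the object hashes match across machines.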
An interesting side-effect is that binaries cached in
CCACHE_COMPILERCHECK=mtime mode are now compatible with those cached in
CCACHE_COMPILERCHECK=content mode, although those cached in the other
modes remain incompatible.
My implementation is currently GCC specific.
Not sure about that. I may be overlooking something, but ccache would "only"
have to follow all #include statements and note all header files that
don't exist in the include path list. (When #include is used with a
#defined token for the filename, fall back to the real compiler.) When
considering a potential cache hit, reject it if any of the header files
that didn't exist then exist now.
I was thinking of cases like:
#ifdef SOMETHING_NOT_DEFINED
#include "mystery-header.h"
#endif
Presumably you mean that it will note all the *directories* in which a
particular header file was not found, on the way to finding it?
Anybody got other ideas?
Running the compiler with -v prints the header search directories.
You could use that to do your own scan.
To use the directories from "cpp -v" (plus directories from the command
line) to do some optimistic validation was my first thought as well, but
after thinking more about it I came to the conclusion that it wouldn't
buy much safety because no subdirectories will be checked, and you can't
tell which subdirectories to check unless you have parsed the #include
statements. Also, it would trigger many false negatives.
Yes, false negatives would happen, especially if there are include
directories within the project source tree. :(
The problem is that I've not been able to think of a way that both
solves your bug, and doesn't have a serious time-impact on either a
direct-mode lookup, or a cache-miss.
As it happens, I've been thinking of ways to speed up adding things into
the cache. I've been profiling the code, and found that, on a
cache-miss, it spends a significant portion of its runtime between the
compiler exiting and ccache exiting. It has occurred to me that if we
were to return the compiler's results to the user straight away, it
could then fork into the background and spend as much time as it likes
populating the cache, without slowing the build time noticeably.
Compilations of the exact same source are unlikely to occur close
together, so there's no urgent deadline for these.
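
Roughly what I have in mind, as a sketch only: it is Python rather than ccache's C, it is POSIX-only because it relies on fork(), and `store_in_background` and its arguments are hypothetical names. The parent returns as soon as the child exists; the child copies the object into the cache and exits without touching the parent's stdio buffers.

```python
import os
import shutil


def store_in_background(object_path, cache_path):
    """Fork; the child populates the cache, the parent returns at once.

    Returns the child's pid so a caller *may* wait for it; a real
    implementation would simply exit and leave the child to finish.
    """
    pid = os.fork()
    if pid != 0:
        return pid  # parent: hand the compiler's result back immediately

    # Child: take as long as needed, then leave without running the
    # parent's exit handlers or flushing its inherited buffers.
    status = 1
    try:
        tmp = f"{cache_path}.tmp.{os.getpid()}"
        shutil.copyfile(object_path, tmp)
        os.replace(tmp, cache_path)  # atomic rename into place
        status = 0
    finally:
        os._exit(status)
```

Writing to a temporary name and renaming means a concurrent lookup never sees a half-written cache entry, which matters more once the store is no longer finished by the time ccache returns.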
Given relaxed time constraints, we could certainly do a little more work
calculating data to store in the manifest file that could then be
processed lightning fast on a cache lookup.
So, for each include file, we need to know the list of directories it
could be found in, and which one it was actually found in. This means we
need to know what names were used in the original code (a user may have
specified an absolute path), whether they were included with <xxx.h> or
"xxx.h", and what directories were in the compiler's search path, and be
aware of #include_next directives.
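
As a sketch of that bookkeeping (Python for brevity; `search_order` and `should_not_exist` are made-up names, and #include_next is deliberately not handled here): given how a header was named, which directories were searched and where it was found, the locations that were tried and missed are the ones worth recording.

```python
import os


def search_order(quoted, includer_dir, quote_dirs, angle_dirs):
    """Directories tried for one #include, in order.

    "xxx.h" looks in the including file's directory first, then the
    quote-only directories, then the <xxx.h> directories.
    """
    if quoted:
        return [includer_dir] + list(quote_dirs) + list(angle_dirs)
    return list(angle_dirs)


def should_not_exist(name, found_dir, search_dirs):
    """Paths that were tried and missed before `name` was found in `found_dir`."""
    if os.path.isabs(name):
        return []  # an absolute name bypasses the search path entirely
    misses = []
    for d in search_dirs:
        if d == found_dir:
            return misses
        misses.append(os.path.join(d, name))
    raise ValueError(f"{found_dir} is not in the search path for {name}")
```

For example, a <stddef.h> found in the compiler's own include directory, with a project directory listed ahead of it, yields one path inside the project tree that must stay absent for the cached result to remain valid.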
Knowing the compiler's search path could be done with '-v' every time,
or we could cache the default ones, and then "know" what the
command-line parameters mean, or we could cache the search path for each
set of input options each time.
[Do all the supported toolchains even provide a means to learn the
search path? If we're getting into ptrace territory then architecture/OS
specific code will be required.]
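
For GCC at least, the '-v' route could be sketched like this (Python, illustrative only; `parse_search_dirs` and `default_search_dirs` are hypothetical names). It relies on the "search starts here" and "End of search list." markers that GCC prints on stderr, and forces the C locale on the assumption that those messages may otherwise be translated.

```python
import os
import subprocess


def parse_search_dirs(verbose_output):
    """Split the compiler's -v output into ("..." dirs, <...> dirs)."""
    quote_dirs, angle_dirs = [], []
    current = None
    for line in verbose_output.splitlines():
        if line.startswith('#include "..." search starts here:'):
            current = quote_dirs
        elif line.startswith("#include <...> search starts here:"):
            current = angle_dirs
        elif line.startswith("End of search list."):
            current = None
        elif current is not None and line.startswith(" "):
            current.append(line.strip())
    return quote_dirs, angle_dirs


def default_search_dirs(compiler="gcc", extra_args=()):
    """Preprocess an empty input with -v and parse what lands on stderr."""
    proc = subprocess.run(
        [compiler, *extra_args, "-E", "-v", "-x", "c", "-"],
        stdin=subprocess.DEVNULL,
        capture_output=True,
        text=True,
        env=dict(os.environ, LC_ALL="C"),
    )
    return parse_search_dirs(proc.stderr)
```

Passing the build's own -I/-isystem options through `extra_args` gives the search path for that particular command line, which is the "cache the search path for each set of input options" variant.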
Then, at direct-mode cache-lookup time, we do exactly as now, but also
have a list of locations where stat should return ENOENT.
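
The extra lookup-time check is then just a handful of stat calls. A minimal sketch, assuming the manifest carries a plain list of paths (the name `manifest_still_valid` is invented for illustration):

```python
import os


def manifest_still_valid(expected_missing):
    """True only if every recorded location is still absent."""
    for path in expected_missing:
        try:
            os.stat(path)
        except (FileNotFoundError, NotADirectoryError):
            continue  # ENOENT / ENOTDIR: still missing, as recorded
        except OSError:
            return False  # e.g. EACCES: cannot tell, so treat it as a miss
        return False  # a header has appeared where there was none
    return True
```

Anything other than a clean "not there" is treated as a cache miss, on the grounds that falling back to the real compiler is always safe.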
BTW, gcc has an option "--trace-includes" that might be faster than
scanning the preprocessor output, although the compiler still has to
do all the same work. Like this: "gcc -E hello.c -o /dev/null".
How do you use --trace-includes? It doesn't seem to be documented and
nothing happens when I try it.
Maybe it was introduced recently?
$ gcc --trace-includes -c ~/hello.c -o /dev/null
. /usr/include/stdio.h
.. /usr/include/features.h
... /usr/include/x86_64-linux-gnu/bits/predefs.h
... /usr/include/x86_64-linux-gnu/sys/cdefs.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/gnu/stubs.h
.... /usr/include/x86_64-linux-gnu/bits/wordsize.h
.... /usr/include/x86_64-linux-gnu/gnu/stubs-64.h
.. /usr/lib/gcc/x86_64-linux-gnu/4.7/include/stddef.h
.. /usr/include/x86_64-linux-gnu/bits/types.h
... /usr/include/x86_64-linux-gnu/bits/wordsize.h
... /usr/include/x86_64-linux-gnu/bits/typesizes.h
.. /usr/include/libio.h
... /usr/include/_G_config.h
.... /usr/lib/gcc/x86_64-linux-gnu/4.7/include/stddef.h
.... /usr/include/wchar.h
... /usr/lib/gcc/x86_64-linux-gnu/4.7/include/stdarg.h
.. /usr/include/x86_64-linux-gnu/bits/stdio_lim.h
.. /usr/include/x86_64-linux-gnu/bits/sys_errlist.h
Multiple include guards may be useful for:
/usr/include/wchar.h
/usr/include/x86_64-linux-gnu/bits/predefs.h
/usr/include/x86_64-linux-gnu/bits/stdio_lim.h
/usr/include/x86_64-linux-gnu/bits/sys_errlist.h
/usr/include/x86_64-linux-gnu/bits/typesizes.h
/usr/include/x86_64-linux-gnu/gnu/stubs-64.h
/usr/include/x86_64-linux-gnu/gnu/stubs.h
$ gcc --version
gcc (Ubuntu/Linaro 4.7.2-2ubuntu1) 4.7.2
Copyright © 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
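
That output is easy to consume mechanically. A small sketch in Python (`parse_trace` is a made-up name): each header line is a run of dots giving the nesting depth, a space, then the path, and the "Multiple include guards" trailer is skipped because its lines have no dots. As far as I know, --trace-includes is just the long spelling of the documented -H option, so the same parser should apply to "gcc -H".

```python
import re

# One or more dots (nesting depth), a space, then the header path.
_TRACE_LINE = re.compile(r"^(\.+) (.+)$")


def parse_trace(stderr_text):
    """Return [(depth, path), ...] in the order the headers were opened."""
    entries = []
    for line in stderr_text.splitlines():
        m = _TRACE_LINE.match(line)
        if m:
            entries.append((len(m.group(1)), m.group(2)))
    return entries
```

Taking the set of paths from the result gives the headers that need hashing, without having to scan the preprocessed output for line markers.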
Andrew
_______________________________________________
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache