I don't want to rain on anyone's parade here, because ccache is a great product that has real benefits, but I do want to share some of our findings regarding the use of ccache in our very large product -- we were surprised by them, and you may be as well. These findings apply specifically to *large products*. In our case, the total source code file size is on the order of 3 gigabytes (which includes not only C/C++ but also Java source files, a couple hundred thousand lines of makefiles, etc). It's the Android mobile phone OS, fwiw: a full build produces something like 1-2 gigabytes of .o files from C/C++, and also does a ton of Java compilation, resource compilation, Dalvik compilation, etc.
Very short version: if your 'make' dependencies or equivalent are well-written, using ccache will almost always *increase* your incremental build times. This wasn't immediately obvious to us but makes sense in hindsight: if your dependencies are well constructed, then when you run 'make' it won't try to build something unless it really has changed. We see *very* few ccache hits over time when doing incremental builds.

Slightly longer version: even if your dependencies are slightly inefficient (i.e. you're getting some unnecessary compilation on a regular basis, but not tons), ccache may well still be slowing you down overall on incremental builds unless your computers have lots of RAM relative to the size of your project. It turns out that fitting the build inputs *and* outputs into the VM/filesystem buffer cache usually provides much more build-time benefit than ccache. (Unless your project is very large, you're almost certainly fitting everything in RAM as a matter of course, and so ccache is a fine idea.) If you're regularly doing something equivalent to a 'clean' build from make's perspective, but with a hot ccache, then ccache is solving exactly your problem and you definitely want to use it.

Long answer, only applicable to very large projects: the issue is VM/file system buffer cache management. If you're using ccache, then you'll effectively be doubling the number of .o files that are paged into memory during the course of a build. If the extra VM paging winds up pushing source code out of the VM buffer cache, that is going to be a significant hit when your build system actually needs to build that source file -- it'll have to go to the disk for it rather than just reading it out of memory.
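As a back-of-the-envelope check, you can compare the combined size of sources, intermediates, and ccache contents against installed RAM before deciding whether ccache will help. The numbers below are illustrative placeholders loosely inspired by the figures in this thread, not measurements; substitute your own (e.g. from `du -sm`):

```shell
# Illustrative sizes in MB -- replace with real numbers from your project.
SRC_MB=3000        # source tree (~3 GB in the case described above)
OBJ_MB=2000        # .o and other intermediate build outputs
CCACHE_MB=2000     # ccache roughly doubles the .o footprint
OTHER_MB=6000      # Java/Dalvik outputs, linker RSS, etc. (rough guess)
RAM_MB=12288       # a 12 GB build machine

NEED_MB=$((SRC_MB + OBJ_MB + CCACHE_MB + OTHER_MB))
echo "working set: ${NEED_MB} MB, RAM: ${RAM_MB} MB"
if [ "$NEED_MB" -gt "$RAM_MB" ]; then
  echo "working set exceeds RAM: ccache may slow incremental builds"
else
  echo "working set fits in RAM: ccache should be a win"
fi
```

With these placeholder numbers the working set (13000 MB) exceeds 12 GB of RAM, matching the observation below that 12 gigs wasn't enough for Android.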
Ideally your build machines will have enough RAM to hold in the buffer cache the entire source base, plus the entire set of .o / .class / etc intermediate build output files, *plus* the entire ccache contents for the project. That's on top of the actual memory usage of the compilers etc that run during the course of the build. As soon as things are being kicked out of the buffer cache during the course of a build, you'll take a speed hit that will more than eradicate the benefits of using ccache.

Our product may be a bit pathological in this way: a lot of the RAM-hungry tools and source files are not C/C++, which means a heavy demand on RAM in ways that ccache can't help with. The ordering of work in a typical build makes things worse, in fact: the basic C/C++ compilation, during which ccache pushes source files and .o contents through the buffer cache, typically happens earlier in the build; later on, things like linkers run, which are RAM-hungry themselves and wind up pushing the sources out of the buffer cache on memory-constrained machines. Then when you want to run an incremental build, the sources have to be paged right back in from disk, and there goes your time benefit.

Android's ccache footprint is one or two gigabytes(!) of .o files. We've found that on computers with less than around 20-24 gigs of RAM (!!), ccache tends to increase build times -- 12 gigs isn't enough to hold everything in the buffer cache; 24 is. Once everything is in the buffer cache, ccache's ability to avoid running gcc is a win, especially for clean builds with a hot cache.

-- chris tate
android framework engineer

On Wed, Nov 10, 2010 at 2:54 PM, Paul Smith <p...@mad-scientist.net> wrote:

> Hi all; I've been considering for a long time enabling ccache support in our build environment. However, our environment has a few unusual aspects, and I wondered if anyone had any thoughts about steps we might take to ensure things are working properly.
> The documentation I've found is just not _quite_ precise enough about exactly what is added to the hash.
>
> Very briefly, we have multiple "workspaces" per user, mostly stored on their own systems. These workspaces are typically pointing to different lines of development, and in those some files are the same and some are different (pretty basic). What I'd like to do is have one ccache per user per host, so that all the workspaces for a given user on a given host share the cache (rather than, for example, one cache per workspace, or sharing caches between users and/or hosts--that could come later). Again, pretty straightforward.
>
> The first interesting bit is that in our build environment we have a set of (multiple different) cross-compilers, along with completely encapsulated environments (usr/include, usr/lib, etc.) for different targets. These compilers and environments are packed up into tarballs which are kept in our source tree, and unpacked by our build system when our build starts. We do not use the native compiler at all.
>
> The second interesting bit is that the actual file that is invoked is not the actual compiler, but a symlink to a shell script wrapper that invokes the real compiler with a set of extra command-line arguments. So we invoke a command like "i686-rhel4-linux-glibc-g++", which is a symlink to a generic shell script like "toolchain-wrapper.sh", which unpacks the symlink name ("i686-rhel4-linux-glibc-g++") to determine that we want to run the g++ compiler to generate 32bit code compiled against a Red Hat EL 4/GNU libc environment, then invokes a real compiler with the right options to make that happen. A different command (say "x86_64-rhel5-linux-glibc-gcc") is a symlink to the same "toolchain-wrapper.sh" file, but you get very different results.
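A minimal sketch of the symlink-name decoding the quoted wrapper performs; the function name, naming scheme breakdown, and final `exec` path are illustrative assumptions, not taken from the actual toolchain-wrapper.sh:

```shell
# Sketch: decode target and tool from a symlink name shaped like
# <arch>-<distro>-linux-glibc-<tool> (hypothetical layout).
decode() {
  name=$1
  arch=${name%%-*}          # leading component: i686 or x86_64
  tool=${name##*-}          # trailing component: gcc or g++
  case "$arch" in
    i686)   bits=-m32 ;;
    x86_64) bits=-m64 ;;
  esac
  echo "$tool $bits"
}

decode i686-rhel4-linux-glibc-g++      # g++ -m32
decode x86_64-rhel5-linux-glibc-gcc    # gcc -m64
# A real wrapper would end with something like:
#   exec "$TOOLCHAIN_DIR/$name/bin/$tool" $bits "$@"
```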
> The final interesting thing is that when we unpack these compiler tarballs we use the -m option, so that all the files we unpack have their times set to "now" rather than the times they had when they were packed up. Thinking about this, I believe we could remove this in this case, so the timestamps would be preserved, if that would be useful.
>
> So, a few things: first, the default mtime/size check to determine if compilers have changed won't work well for us. Every time I do a clean build and my compilers are unpacked again, the timestamp on them will change (due to tar -m), so I won't get any cache hits (right?)
>
> If I remove the -m so that the timestamps in the tarball are preserved, then the timestamps will always be identical, unless I load up a new compiler build. So that's actually nice.
>
> What about the script wrapper? Loading a new compiler will change the timestamp (at least) on the script wrapper as well, but here I worry about incorrect duplication in the same build. For example, suppose I build a file into two objects like this:
>
>   i686-rhel4-linux-glibc-g++ -o 32bit/foo.o -c foo.c
>   x86_64-rhel4-linux-glibc-g++ -o 64bit/foo.o -c foo.c
>
> Now both of these are symlinks to the same wrapper script, so ccache will cache the same mtime/size for both compilers. Also, they have the same flags at this level. Underneath, of course, the wrapper script will invoke completely different compilers with different flags, but that's not visible to ccache. Suppose the preprocessor output was the same in both cases, so that's not an issue: it's just that the compiler generated a 32bit .o for the first and a 64bit .o for the second.
>
> So, my question is: is the NAME of the compiler part of the hash as well as the mtime/size, so that ccache won't consider the second one to be a copy of the first?
> Of course I can always resort to CCACHE_COMPILERCHECK='%compiler% -v', which will tell me what I want to know (that these compilers are very different). But it would be nice to avoid that overhead if possible.
>
> Also, if I DO go with CCACHE_COMPILERCHECK, is ONLY that output part of the hash? Or is the mtime/size for the compiler also part of the hash?
>
> It would be nice for debugging/etc. if there was a way to see exactly what elements are going into the hash for a given target.
>
> Sorry for the long email; thanks for any pointers or tips!
>
> _______________________________________________
> ccache mailing list
> email@example.com
> https://lists.samba.org/mailman/listinfo/ccache
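For reference, the compiler-check setting discussed in the quoted question is just an environment variable; hashing the real compiler's version output is what lets ccache tell identical wrapper symlinks apart. (The behavior described in the comments is my reading of the situation in the quote, worth verifying against your ccache version's documentation.)

```shell
# Make ccache hash the compiler's version output instead of relying on
# the wrapper script's mtime/size (identical for every symlink to it).
export CCACHE_COMPILERCHECK='%compiler% -v'
# With the wrapper setup quoted above, "i686-rhel4-linux-glibc-g++ -v"
# and "x86_64-rhel5-linux-glibc-gcc -v" forward to different underlying
# compilers and so produce different output, keeping their cache entries
# distinct even though both names point at the same toolchain-wrapper.sh.
echo "$CCACHE_COMPILERCHECK"
```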