On Tue, Apr 12, 2005 at 07:59:55PM +0200, Minto van der Sluis wrote:
> I wonder how scalable ccache is. In other words, are very large cache
> trees (multiple GB) still efficient?
>
> The reason I ask? I am working on T2 ( http://www.t2-project.org ). This
> is a Linux distribution build environment. Every package in a
> distribution is built from source if possible. Currently for every
> distribution being built we have a separate cache. I wonder about the
> possibility to have a single cache for every distribution being built.
I do not recommend this. We're building a distribution, too. We have
approximately 1000 source packages, built for only one architecture, with
only one set of optimization flags. When we used to have one huge cache
pool, the whole common cache was around 10-20 GB. At that time only one
machine was used to build all our packages, and we saw no drawbacks from
the huge size.

Later, when we decentralized our system and implemented distributed
builds, we realized that this was the wrong way to go, and we switched to
a separate cache pool for each package. Note that we do not use distcc or
a similar system: a particular build of a package is done by only one
host, which fetches the source and the ccache pool from the server at the
beginning and puts the result back at the end. (The server distributes
builds and gives jobs to the clients, based on the build dependencies
amongst the packages, controlled by a Makefile. One client handles one or
two, rarely three (due to a race condition :-)) builds at a time.
However, all this in parentheses is irrelevant to the current topic.)

There were two main reasons why we switched to a per-package ccache pool:

1) Speed. Not bandwidth, but rather round-trip time. If you have ccache
over NFS, and each and every ccache query goes over NFS, compiling a
normal application (e.g. bash) gets even slower than without ccache at
all. However, fetching bash-ccache.tar.gz at the beginning (either over
NFS or with scp) and putting the new version of this file back at the end
is much faster; the transfer time is negligible compared to the build
time of a package.

2) Maintenance of the ccache pool. If you have one giant pool, it just
keeps growing and growing, and it's really hard to keep it clean, that
is, to remove the files that will most likely not be used anymore. After
an upgrade of a core component, such as gcc, glibc, or sometimes ccache
itself, it's quite likely that you'll get no more hits, so you can
manually clear the whole cache. But if you upgrade a piece of software
such as glib, then glib applications will most likely not get many hits
due to the changed glib headers, while other applications still will.
This time it's quite hard to keep track of the ccache files that are no
longer needed.

In our new system, the build procedure of a package remembers the
timestamp when the build started, and if the build was successful, then
at the end it removes all files from its ccache pool whose access time
stamp is older than the start of this build (with a little trick so that
.stderr files are removed if and only if the main file is removed too).
(If the build failed, however, it keeps all the files in the pool.) Then
it compresses this tree (using "gzip -1", which seems to give the best
combined compress+upload time) and puts it back to the server.
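For illustration, here is a minimal sh sketch of the per-package cache
cycle described above. The server name, the pool path and the
build-package step are made-up placeholders, not our actual scripts:

  #!/bin/sh
  # Hypothetical sketch of the per-package ccache cycle.
  pkg=$1
  server=build-server.example.com

  # Fetch this package's private ccache pool, if one exists yet.
  if scp "$server:/pools/$pkg-ccache.tar.gz" .; then
      gzip -dc "$pkg-ccache.tar.gz" | tar xf -
  fi

  # Point ccache at the unpacked per-package pool and build.
  CCACHE_DIR=$PWD/ccache
  export CCACHE_DIR
  build-package "$pkg"    # placeholder for the real build step

  # Compress with "gzip -1" (best combined compress+upload time for
  # us) and put the updated pool back on the server.
  tar cf - ccache | gzip -1 > "$pkg-ccache.tar.gz"
  scp "$pkg-ccache.tar.gz" "$server:/pools/"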
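And a sketch of the access-time based cleanup (the little trick
mentioned above). It assumes GNU find's -anewer test, a filesystem that
updates atimes (i.e. not mounted noatime), and a made-up stamp file
name:

  #!/bin/sh
  # Hypothetical sketch; CCACHE_DIR points at the per-package pool.
  # The stamp file is touched just before the build begins.
  touch build-started

  # ... the build runs here; every cache hit updates the atime of the
  # files it reads ...

  # On success: remove every cached object that was not accessed since
  # the build started.  .stderr files are excluded as candidates and
  # removed together with their main file, so a .stderr is deleted if
  # and only if its main file is.
  find "$CCACHE_DIR" -type f ! -name '*.stderr' \
       ! -anewer build-started -print | while read -r f; do
      rm -f "$f" "$f.stderr"
  done

--
Egmont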
