Joel Rosdahl <j...@rosdahl.net> wrote:
> On 19 December 2017 at 02:16, Scott Bennett via ccache <
> firstname.lastname@example.org> wrote:
>

Hi Joel,

Sorry about the delay in responding. I've been off-line for about a week and a half and may be again shortly.
> > I set "limit_multiple = 0.95" in ccache.conf and "max_size = 30.0G"
> > in ccache.conf, but cleanups are triggered when space usage reaches 24 GB,
> > which is the default of 0.8. Why is this happening with ccache 3.3.4?
> >
>
> The ccache manual is not very good at describing what actually happens at
> cleanup. I'll try to improve it.
>
> Here's how cleanup works: After a cache miss, ccache stores the object file
> in (a subdirectory of) one of the 16 top level directories in the cache
> (0-9, a-f). It then checks if that top level directory holds more than
> max_cache_size/16 bytes (and similar for max_files). If yes, ccache removes
> files from that top level directory until it contains at most
> limit_multiple*max_cache_size/16 bytes. This means that if limit_multiple

The design problem is that no centralized index of cache entries' paths, sizes, and timestamps is maintained, which necessitates a full walk of the directory trees. This very time-consuming task should only be required when a ccache user determines that the cache is somehow internally inconsistent, e.g., because it has one or more damaged entries, has erroneous statistics, or is out of step with the index. It should not be part of an ordinary cache eviction procedure. A command to run a consistency check/repair should not do any cache evictions based upon space, which would be done by the next actual use of ccache anyway; it should evict files only if they are part(s) of a damaged cache entry. The overhead of maintaining the index should be minor, especially when compared to the current cleanups, which can take over half an hour to run and hammer a hard drive mercilessly. (A centralized index should also include the total space in use.) The lack of a centralized index can also result in cache evictions that are not actually LRU. The kludge of using 16 caches instead of a single, unified cache would also be unnecessary with a centralized index.
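To make the centralized-index idea concrete, here is a minimal Python sketch. It is not ccache code: the class CacheIndex, its methods, and the example paths are hypothetical, entries are held in memory as path -> (size, last-use timestamp), and the on-disk persistence and locking a real index would need are left out. Entries are kept in least-recently-used order (by when they were recorded or touched) and the total size is updated on every change, so eviction just pops the oldest entries until the total fits, without walking any directory tree.

```python
from __future__ import annotations

from collections import OrderedDict


class CacheIndex:
    """Hypothetical centralized index of cache entries (not part of ccache).

    Maps each entry's path to (size, last-use timestamp), keeps entries in
    least-recently-used order, and maintains the total size incrementally,
    so eviction never needs a directory tree walk.
    """

    def __init__(self) -> None:
        self._entries: OrderedDict[str, tuple[int, float]] = OrderedDict()
        self.total_size = 0

    def record(self, path: str, size: int, timestamp: float) -> None:
        """Add a newly stored entry (or replace an old one) as most recently used."""
        old = self._entries.pop(path, None)
        if old is not None:
            self.total_size -= old[0]
        self._entries[path] = (size, timestamp)
        self.total_size += size

    def touch(self, path: str, timestamp: float) -> None:
        """Mark an existing entry as used, e.g. on a cache hit (KeyError if absent)."""
        size, _ = self._entries[path]
        self._entries[path] = (size, timestamp)
        self._entries.move_to_end(path)

    def evict_to(self, target_size: int) -> list[str]:
        """Remove least-recently-used entries until total_size <= target_size.

        Returns the removed paths; a real cache would unlink exactly these files.
        """
        victims: list[str] = []
        while self.total_size > target_size and self._entries:
            path, (size, _) = self._entries.popitem(last=False)
            self.total_size -= size
            victims.append(path)
        return victims


if __name__ == "__main__":
    index = CacheIndex()
    index.record("a/1.o", 400, 1.0)
    index.record("b/2.o", 300, 2.0)
    index.record("c/3.o", 500, 3.0)
    index.touch("a/1.o", 4.0)     # a/1.o becomes the most recently used entry
    print(index.evict_to(900))    # prints ['b/2.o']
```

Because one index covers the whole cache, the eviction order here is globally LRU rather than LRU within whichever of the 16 directories happened to receive the new entry.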
The index would be used to go directly to each file to be deleted without the need for a directory tree search. Cleanups ought to be much faster. Note that some sort of short-term lock would need to be used for updating the index, too, but the same is already true for the $CCACHE_DIR/[0-9a-f]/stats files.

> is 0.8, the total cache size is expected to hover around 0.9*max_cache_size
> when it has filled up. But due to the pseudo-randomness of the hash

Where does the hysteresis of (0.9 - 0.8)*max_size = 0.1*max_size come from?

> algorithm, the cache size can be closer to 0.8*max_cache_size or
> 1.0*max_cache_size.
>
> The above should be true for any serial usage of ccache. However, ccache is
> of course very often called in parallel, and then there is a race condition
> since several ccache processes that have stored an object to the same top
> level directory may start the cleanup process simultaneously. Since
> performing cleanup in a large cache with a low limit_multiple can take a
> lot of time, more ccache processes may start to perform cleanup of the same
> directory. The race can lead to the final cache size being below
> limit_multiple*max_cache_size, perhaps very much so. This is a known
> problem. We have had some ideas to improve the admittedly naive cleanup
> logic, but nothing has been done yet.

That problem, at least, seems relatively straightforward to fix. First, only one cleanup needs to be done in such situations, so a lock should be tested and set by the first ccache process that decides a cleanup is necessary. All later comers should be delayed until that cleanup completes, and then they should proceed without also doing cleanups. Their decisions in favor of a cleanup are out of date once the cleanup run completes, so they should either skip any cleanups themselves or at least retest the size of what they need to store plus the current cache size against max_size to make a fresh decision.
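A minimal Python sketch of that fix, under loudly stated assumptions: CacheDir, store, and demo are hypothetical names, threads stand in for concurrent ccache processes, a threading.Lock stands in for whatever cross-process lock (for example a lock file) a real implementation would use, and "cleanup" merely resets the recorded size to limit_multiple*max_size instead of deleting files. The point it shows is that the cleanup decision is re-made after the lock is acquired, so callers that were waiting see the already-cleaned size and skip a second cleanup, and the new entry is stored only after any cleanup has finished.

```python
import threading


class CacheDir:
    """Hypothetical in-memory model of one top-level cache directory.

    Sizes are plain integers (e.g. bytes). Cleanup is simulated by resetting
    the recorded size instead of deleting real files.
    """

    def __init__(self, size: int, max_size: int, limit_multiple: float) -> None:
        self.size = size
        self.max_size = max_size
        self.limit_multiple = limit_multiple
        self.lock = threading.Lock()  # stands in for a cross-process lock
        self.cleanups_run = 0

    def _cleanup(self) -> None:
        # Simulated eviction down to limit_multiple * max_size.
        self.size = int(self.limit_multiple * self.max_size)
        self.cleanups_run += 1

    def store(self, entry_size: int) -> None:
        """Store one entry; concurrent callers never repeat a finished cleanup."""
        with self.lock:  # later comers wait here while a cleanup runs
            # Fresh decision under the lock: if another caller just cleaned up,
            # the size seen here already reflects it, so no second cleanup runs.
            if self.size + entry_size > self.max_size:
                self._cleanup()
            # The entry is added only after any cleanup has finished, so it does
            # not push the directory past max_size (assuming one entry is smaller
            # than the space a cleanup frees).
            self.size += entry_size


def demo() -> CacheDir:
    d = CacheDir(size=990, max_size=1000, limit_multiple=0.8)
    workers = [threading.Thread(target=d.store, args=(20,)) for _ in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return d


if __name__ == "__main__":
    d = demo()
    print(d.cleanups_run, d.size)  # prints "1 960"
```

Whichever thread gets the lock first finds 990 + 20 > 1000 and cleans down to 800; every later thread re-tests, finds enough room, and just adds its 20, so exactly one cleanup runs regardless of scheduling.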
> Maybe the above described problem is why you get a 24 GB cache size?

See discussion below.

> Or maybe you ran "ccache -c"? Unlike what the manual indicates, "ccache -c"

No, it was automatically triggered.

> will delete files until each top level directory holds at most
> limit_multiple*max_size/16...
>
> > why is limit_multiple ignored?
> >
> It isn't. Or don't you see a difference if you e.g. set it to 0.5?
>

I haven't tried that. The caches I have represent a lot of CPU time and elapsed time, especially given that I have compression turned on, so I'm not thrilled at the idea of throwing nearly half a cache away just to try it out. What I've seen is that cleanups are usually triggered at 0.8*max_size, and that does not change when I set limit_multiple = 0.95.

0.95*max_size is 28.5 GB, which is the threshold at which a cleanup should stop. max_size is 30 GB, so 30 GB - size.of.entry.to.be.stored is the threshold at which a cleanup should be triggered. Storing the triggering cache entry should be delayed until the cleanup completes in order to prevent the cache from exceeding max_size. 30 GB - 24 GB = 6 GB of compressed cache space seems a bit of a stretch to me for a single cache entry to be stored if that single entry is supposedly the trigger for starting a cleanup. The same is true for the current alleged algorithm's calculation (i.e., 28.5 GB - 24 GB = 4.5 GB).

Anyway, thank you for your response. It clarified a point or two I must have missed while reading the code a couple of years ago.

Scott Bennett, Comm. ASMELG, CFIAG
**********************************************************************
* Internet: bennett at sdf.org *xor* bennett at freeshell.org *
*--------------------------------------------------------------------*
* "A well regulated and disciplined militia, is at all times a good *
* objection to the introduction of that bane of all free governments *
* -- a standing army." *
* -- Gov. John Hancock, New York Journal, 28 January 1790 *
**********************************************************************

_______________________________________________
ccache mailing list
email@example.com
https://lists.samba.org/mailman/listinfo/ccache