[ccache] effect upon ccache of changes to cache_dir_levels

2019-11-11 Thread Scott Bennett via ccache
 What does ccache do to an existing cache tree if one increases
or decreases the value of cache_dir_levels in ccache.conf?  Are cache
entries moved?  Are directories added or deleted to accommodate the
change?

  Scott Bennett, Comm. ASMELG, CFIAG
**
* Internet:   bennett at sdf.org   *xor*   bennett at freeshell.org  *
**
* "A well regulated and disciplined militia, is at all times a good  *
* objection to the introduction of that bane of all free governments *
* -- a standing army."   *
*-- Gov. John Hancock, New York Journal, 28 January 1790 *
**

___
ccache mailing list
ccache@lists.samba.org
https://lists.samba.org/mailman/listinfo/ccache


Re: [ccache] why is limit_multiple ignored?

2018-01-07 Thread Scott Bennett via ccache
 Joel Rosdahl <j...@rosdahl.net> wrote:

> On 19 December 2017 at 02:16, Scott Bennett via ccache <
> ccache@lists.samba.org> wrote:
>
Hi Joel,
 Sorry about the delay in responding.  I've been off-line for about a
week and a half and may be again shortly.

> >  I set "limit_multiple = 0.95" in ccache.conf and "max_size = 30.0G"
> > in ccache.conf, but cleanups are triggered when space usage reaches 24 GB,
> > which is the default of 0.8.  Why is this happening with ccache 3.3.4?
> >
>
> The ccache manual is not very good at describing what actually happens at
> cleanup. I'll try to improve it.
>
> Here's how cleanup works: After a cache miss, ccache stores the object file
> in (a subdirectory of) one of the 16 top level directories in the cache
> (0-9, a-f). It then checks if that top level directory holds more than
> max_cache_size/16 bytes (and similar for max_files). If yes, ccache removes
> files from that top level directory until it contains at most
> limit_multiple*max_cache_size/16 bytes. This means that if limit_multiple

 The design problem is that there is no centralized index maintained of
cache entries' paths, their sizes, and their timestamps, necessitating the
plumbing of the directory trees.  This very time-consuming task should only
be required when a ccache user determines that the cache is internally
inconsistent somehow, e.g., by having one or more damaged entries, having
erroneous statistics, or by being out of step with the index.  It should not
be part of an ordinary cache eviction procedure.  A command to run a
consistency check/repair should not do any cache evictions based upon space,
which would be done by the next actual use of ccache anyway, but rather only
if the files involved are part(s) of a damaged cache entry.  The overhead of
maintaining the index should be minor, especially when compared to the
current cleanups that can take over a half hour to run and hammer a hard
drive mercilessly.  (A centralized index should also include the total space
in use.)  The lack of a centralized index can also result in cache evictions
that are not actually LRU.  The kludge of using 16 caches instead of a
single, unified cache would be unnecessary with a centralized index as well.
The index would be used to go directly to each file to be deleted without
the need for a directory tree search.  Cleanups ought to be much faster.
Note that some sort of short-term lock would need to be used for updating
the index, too, but the same is already true for the
$CCACHE_DIR/[0-9a-f]/stats files.
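
 As a rough illustration of the index idea above, here is a minimal
Python sketch.  It is not ccache code; the class name CacheIndex and its
methods are hypothetical, and a real index would live on disk and be
updated under the short-term lock mentioned above.  The point is only that
recency order and total size can be kept without walking the directory tree.

```python
from collections import OrderedDict


class CacheIndex:
    """Hypothetical in-memory index: path -> size, least recently used first."""

    def __init__(self):
        self._entries = OrderedDict()
        self.total_size = 0  # running total, so no tree walk is needed

    def record(self, path, size):
        """Add a new entry (or replace an existing one) as most recently used."""
        if path in self._entries:
            self.total_size -= self._entries.pop(path)
        self._entries[path] = size
        self.total_size += size

    def touch(self, path):
        """Mark an entry as most recently used after a cache hit."""
        if path in self._entries:
            self._entries.move_to_end(path)

    def evict_until(self, target_size):
        """Drop least recently used entries until total_size <= target_size.

        Returns the evicted paths; a real implementation would unlink them.
        """
        evicted = []
        while self.total_size > target_size and self._entries:
            path, size = self._entries.popitem(last=False)
            self.total_size -= size
            evicted.append(path)
        return evicted
```

Because the ordering covers every entry, eviction here is LRU across the
whole cache rather than within one of 16 subtrees.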

> is 0.8, the total cache size is expected to hover around 0.9*max_cache_size
> when it has filled up. But due to the pseudo-randomness of the hash

 Where does the hysteresis of (0.9 - 0.8)*max_size = 0.1*max_size come from?

> algorithm, the cache size can be closer to 0.8*max_cache_size or
> 1.0*max_cache_size.
>
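
 To make the numbers in Joel's description concrete, here is a small
Python sketch.  It is a toy model under stated assumptions (equal-sized
entries, a uniformly random choice of top level directory, one serial
writer), not ccache's actual code, and the function names are made up for
the illustration.  With max_size = 30 GB and limit_multiple = 0.95, each
top level directory would be cleaned when it exceeds 30/16 = 1.875 GB and
trimmed to roughly 1.78 GB.

```python
import random


def cleanup_thresholds(max_size, limit_multiple, top_dirs=16):
    """Return (trigger, target) sizes for one top level directory."""
    trigger = max_size / top_dirs                   # cleanup starts above this
    target = limit_multiple * max_size / top_dirs   # cleanup trims down to this
    return trigger, target


def simulate_fill(max_size, limit_multiple, entry_size=1.0,
                  stores=200_000, top_dirs=16, seed=0):
    """Toy model of a filled cache.

    Returns the mean total size over the second half of the run,
    expressed as a fraction of max_size.
    """
    rng = random.Random(seed)
    trigger, target = cleanup_thresholds(max_size, limit_multiple, top_dirs)
    sizes = [0.0] * top_dirs
    samples = []
    for step in range(stores):
        d = rng.randrange(top_dirs)   # stand-in for the hash choosing a directory
        sizes[d] += entry_size
        if sizes[d] > trigger:
            sizes[d] = target         # only this one directory is trimmed
        if step >= stores // 2:
            samples.append(sum(sizes))
    return sum(samples) / len(samples) / max_size
```

In this model each directory swings between limit_multiple and 1.0 times
its 1/16 share, so the long-run total sits near the midpoint:
simulate_fill(3000, 0.8) comes out close to 0.9.  That is one possible
reading of where the 0.9 figure comes from, not a statement from the
ccache sources.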
> The above should be true for any serial usage of ccache. However, ccache is
> of course very often called in parallel, and then there is a race condition
> since several ccache processes that have stored an object to the same top
> level directory may start the cleanup process simultaneously. Since
> performing cleanup in a large cache with a low limit_multiple can take a
> lot of time, more ccache processes may start to perform cleanup of the same
> directory. The race can lead to the final cache size being below
> limit_multiple*max_cache_size, perhaps very much so. This is a known
> problem. We have had some ideas to improve the admittedly naive cleanup
> logic, but nothing has been done yet.

 That problem, at least, seems relatively straightforward to fix.  First,
only one cleanup need be done in such situations, so a lock should be tested
and set by the first ccache process that decides a cleanup is necessary.  All
later comers should be delayed until that cleanup completes, but then those
others should proceed without also doing cleanups.  Their decisions in favor
of a cleanup are out of date once the cleanup run completes, so they should
just skip any cleanups themselves or at least retest the size of what they
need to store plus the current cache size against max_size to make a fresh
decision.
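
 A minimal sketch of that test-and-retest scheme, in Python, might
look like the following.  It uses threads and threading.Lock purely for
illustration; ccache runs as separate processes, so a real fix would need
an inter-process lock (a lock file, for instance).  The class
GuardedCache and its fields are hypothetical.

```python
import threading


class GuardedCache:
    """Hypothetical cache whose cleanup runs at most once per overflow."""

    def __init__(self, max_size, limit_multiple):
        self.max_size = max_size
        self.limit_multiple = limit_multiple
        self.size = 0
        self.cleanups = 0
        self._cleanup_lock = threading.Lock()

    def _needs_cleanup(self):
        return self.size > self.max_size

    def maybe_cleanup(self):
        """Return True if this caller actually performed the cleanup."""
        if not self._needs_cleanup():
            return False
        with self._cleanup_lock:      # latecomers wait here for the first cleaner
            # Retest: the decision made before waiting may be out of date.
            if not self._needs_cleanup():
                return False
            # Stand-in for deleting files down to the target size.
            self.size = round(self.limit_multiple * self.max_size)
            self.cleanups += 1
            return True
```

Every caller that arrives while a cleanup is running blocks on the lock,
then sees the reduced size and skips its own cleanup.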
>
> Maybe the above described problem is why you get a 24 GB cache size?

 See discussion below.
>
> Or maybe you ran "ccache -c"? Unlike what the manual indicates, "ccache -c"

 No, it was automatically triggered.

> will delete files until each top level directory holds at most
> limit_multiple*max_size/16...
>
> why is limit_multiple ignored?
>
>
> It isn't. Or don't you see a difference if you e.g. set it to 0.5?
>
 I haven't tried that.  The caches I have represent a lot of CPU time
and elapsed time, especially given that I have compression turned on, so
I

[ccache] why is limit_multiple ignored?

2017-12-18 Thread Scott Bennett via ccache
 I set "limit_multiple = 0.95" and "max_size = 30.0G" in ccache.conf,
but cleanups are triggered when space usage reaches 24 GB, which is 0.8 of
max_size, the default limit_multiple.  Why is this happening with ccache 3.3.4?




Re: [ccache] useful test package to use with ccache at the outset in Gentoo.....

2017-12-08 Thread Scott Bennett via ccache
Michael Fothergill via ccache <ccache@lists.samba.org> wrote:

> On 8 December 2017 at 01:35, Scott Bennett via ccache <
> ccache@lists.samba.org> wrote:
>
> > Michael Fothergill via ccache <ccache@lists.samba.org> wrote:
> >
> > > I have an amd64 kaveri box with 8GB RAM and run Gentoo stable on it.
> > >
> > > I have just installed ccache with 2GB memory allocated to it.
> >
> >  By that, I assume you have allocated some kind of memory-based device
> > for the cache.  Is that a correct understanding?
> >
>
> Thanks for your reply and comments.  I am assuming that by having the
> standard command
>
> CCACHE_SIZE="4G" (I have increased the allocation)

 As ccache is installed on my system, I cannot find an environment
variable of that name documented.  Are you sure it is being used?  What I
do find documented are CCACHE_MAXSIZE and the corresponding ccache.conf
parameter max_size.  If you're going to build large items like libreoffice
or a Linux world, you probably ought to have a much larger cache.  I also
found it a good idea to change limit_multiple from its default value of 0.8
to 0.95 to avoid half-hour-long cleanups in the middle of build runs.
 It is worth noting that, by my understanding of ccache the last time
I dug into it a bit, ccache actually maintains 16 distinct caches, but the
max_size and limit_multiple values apply to the total size and usage
fraction of the aggregate of all 16 caches.  When a cleanup occurs, ccache
chooses one of the 16 caches and begins deleting the least recently used
entries in it until the total space allocated has been reduced to the
limit_multiple fraction of the max_size.  If it runs out of things to
delete in that cache while the total allocated remains above that fraction,
ccache chooses another cache from which to begin deleting entries, and so on.
This procedure differs from the one described in the ccache man page and is
one reason I like to give more space to the cache(s) in order to prevent
recent entries from disappearing from a cache while other, far less recently
used entries remain in the other 15 caches.  By having a max_size large
enough to hold the last several iterations of frequently built items,
cleanups are more likely to satisfy the limit_multiple by deleting the oldest
few iterations of updates while sparing the more recently used entries 
in a cache.  These days the disk space is cheap enough to give it 10 GB to
30 GB without creating any problems for me, so I just do that.  Another trick
to keep the caches useful is to allocate separate ones for different purposes.
For example, I set up one for building the OS userland and kernel, another
for building libreoffice, and a third for building everything else.  Doing
this keeps the OS and libreoffice from evicting everything else or each other
prematurely. :-)
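
 For what it's worth, the separate-caches trick can be scripted.  Here
is a minimal Python sketch that picks a cache directory per purpose by
setting CCACHE_DIR in the environment handed to the build.  The directory
paths and the purpose names are hypothetical; only CCACHE_DIR itself is
ccache's.

```python
import os

# Hypothetical locations; adjust to wherever the caches actually live.
CACHE_DIRS = {
    "os": "/var/cache/ccache-os",
    "libreoffice": "/var/cache/ccache-libreoffice",
    "default": "/var/cache/ccache-misc",
}


def build_env(purpose, base_env=None):
    """Return a copy of the environment with CCACHE_DIR set for this purpose."""
    env = dict(os.environ if base_env is None else base_env)
    env["CCACHE_DIR"] = CACHE_DIRS.get(purpose, CACHE_DIRS["default"])
    return env
```

The result can be passed to a build, e.g.
subprocess.run(["make", "buildworld"], env=build_env("os")), so that each
kind of build fills and evicts only its own cache.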

> then memory from the hard drive is being used by default here - I was not
> trying to use e.g. RAM memory.
>
 Oh.  Okay.
 If your system is used heavily for compiling software, you may see some
performance gains from putting the cache area onto an SSD.  Using a system
memory-based cache is, of course, lightning fast, but the entire cache
evaporates when the device is deallocated (e.g., during a system shutdown or
failure).  I tried using software five- and six-way RAID-0 devices for the
file system containing my caches for a while, but decided their performance
was poor.  At present I'm using a software CONCAT made of two two-way software
RAID-1's, all on the same kind of hard drives as the earlier setups, and this
setup seems to do very nicely for now.  I just have to remember not to run
updates at the same time as scrubs on the six-way raidz2 that occupies the
bulk of the same drives. :-)  I only scrub that pool about every three to four
weeks, though, so it usually isn't a problem.
> >
> > > I have tried some repeat compilations to see if there would be any speed
> > > increase.
> > >
> > > So far I have not seen much change but I am not skilled enough to improve
> > > things yet.
> >
> >  Your statistics show that slightly more than 45% of your total
> > compiler invocations (hits/(hits+misses)) were avoided.  Did that not
> > make a dent in your timings?
> > >
> > > I tried compiling gcc, glibc and imagemagick but did not see much
> > > improvement.
> >
> >  If you run the full build process for gcc, I would not expect
> > to see much improvement because most of it involves the use of either
> > a) a temporarily built compiler in a temporary location or b) the
> > newly built compiler being used for testing, but not yet installed
> > into the production location on your system.
> >
>
> Would cachecc1 perform any better with gcc?

Re: [ccache] useful test package to use with ccache at the outset in Gentoo.....

2017-12-07 Thread Scott Bennett via ccache
Michael Fothergill via ccache <ccache@lists.samba.org> wrote:

> I have an amd64 kaveri box with 8GB RAM and run Gentoo stable on it.
>
> I have just installed ccache with 2GB memory allocated to it.

 By that, I assume you have allocated some kind of memory-based device
for the cache.  Is that a correct understanding?
>
> I have tried some repeat compilations to see if there would be any speed
> increase.
>
> So far I have not seen much change but I am not skilled enough to improve
> things yet.

 Your statistics show that slightly more than 45% of your total
compiler invocations (hits/(hits+misses)) were avoided.  Did that not
make a dent in your timings?
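
 The ratio meant here is simply hits/(hits+misses).  As a small Python
sketch, with made-up counts standing in for the real statistics (which
are not reproduced in this message):

```python
def hit_rate(hits, misses):
    """Fraction of compiler invocations that were served from the cache."""
    total = hits + misses
    if total == 0:
        return 0.0   # no compilations recorded yet
    return hits / total


# Hypothetical counts, chosen only to give a 45% hit rate.
example = hit_rate(450, 550)
```

The hit and miss counts themselves are reported by "ccache -s".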
>
> I tried compiling gcc, glibc and imagemagick but did not see much
> improvement.

 If you run the full build process for gcc, I would not expect
to see much improvement because most of it involves the use of either
a) a temporarily built compiler in a temporary location or b) the
newly built compiler being used for testing, but not yet installed
into the production location on your system.
 ImageMagick and GraphicsMagick both should provide useful timings
and ccache statistics.  glibc probably would, too, though it's not
nearly as big.  I don't know what sort of build procedures Gentoo uses,
but from the FreeBSD ports tree, here are some other good examples of
test cases:  math/octave, www/webkit-gtk2, www/webkit-gtk3,
www/webkit2-gtk3, devel/llvm40.  Be prepared to wait a long time for
the first compilation of each of the webkits.  They are big and slow
to compile and, in the past, have shown instabilities in their build
procedures when parallel make runs were used.  YMMV on another OS.
 One big savings for me was in running "make buildworld" and "make
buildkernel".  buildworld, on my last machine, was taking about six
hours elapsed time for a first run.  When running it later after
updating the source tree, the elapsed time was reduced by 2/3 to 3/4,
depending upon the number and sizes of source modules affected by the
updates.  Note that ccache and some other things need a slightly
different setup in order to build FreeBSD.  Your OS may also need some
special provision, so be sure to read the ccache installation
instructions for Gentoo carefully.
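
 To put numbers on the buildworld figures above, a short Python
sketch of the arithmetic (the function name is just for illustration):

```python
from fractions import Fraction


def remaining_hours(first_run_hours, reduction):
    """Elapsed time left after cutting a run by the given fraction."""
    return first_run_hours * (1 - reduction)


# A six-hour first run, reduced by 2/3 and by 3/4 respectively.
slow_case = remaining_hours(6, Fraction(2, 3))   # 2 hours
fast_case = remaining_hours(6, Fraction(3, 4))   # 3/2 hours
```

So a rebuild after a source update took roughly an hour and a half to
two hours instead of six.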

