RE: StoreJanitor (was: Re: Moving reduced version of CachingSource to core | Configuration issues)

Ard Schrijvers Tue, 03 Apr 2007 04:12:52 -0700

Hello,
> 
> Ard Schrijvers wrote:
> > I would be glad to share the code and my ideas, for example about this
> > whole StoreJanitor idea :-)  )
> 
> Just curious, what did you mean by "this whole StoreJanitor idea"?


Before I say things that are wrong, please keep in mind that the StoreJanitor
was invented long before I looked into the Cocoon code, so there have probably
been a lot of discussions and good ideas that I am not aware of. Still, here
are my ideas about the StoreJanitor (sorry for the long mail, but perhaps it
contains something useful):

1) How it works and its intention (I think :-) ): The StoreJanitor was
originally designed to monitor Cocoon's memory usage, and it does so by checking
some memory values every X (default 10) seconds. Apart from the fact that I
doubt users know how important it is to configure the StoreJanitor correctly,
I stick to the defaults and use a heapsize just a little lower than the JVM's
maxmemory.

Now, every 10 seconds, the StoreJanitor checks whether

    (getJVM().totalMemory() >= getMaxHeapSize()) && (getJVM().freeMemory() < getMinFreeMemory())

is true, and if so, the next store is chosen (relative to the previous one) and
entries are removed from that store. (I saw a post saying that in trunk a single
store is no longer chosen; instead an equal share is removed from all of them,
right? Perhaps you can also configure which stores are used; I don't know.)
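To make the check above concrete, here is a rough sketch of one janitor pass.
It is a simplified illustration, not Cocoon's actual StoreJanitor code: the
Store interface, the constructor parameters and the round-robin index are
assumptions made for this example, and the trunk behaviour of freeing a share
of every store is not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of a StoreJanitor-style check (hypothetical names,
// not Cocoon's actual implementation).
class JanitorSketch {

    // Hypothetical minimal store abstraction.
    interface Store {
        int memorySize();       // number of entries held in memory
        void free(int count);   // remove up to 'count' memory entries
    }

    private final List<Store> stores = new ArrayList<>();
    private final long maxHeapSize;    // e.g. a bit below the JVM's max memory
    private final long minFreeMemory;
    private final int freePercentage;  // share of memory entries to free, e.g. 10
    private int nextIndex = 0;         // round-robin position over the stores

    JanitorSketch(long maxHeapSize, long minFreeMemory, int freePercentage) {
        this.maxHeapSize = maxHeapSize;
        this.minFreeMemory = minFreeMemory;
        this.freePercentage = freePercentage;
    }

    void register(Store store) {
        stores.add(store);
    }

    // The low-memory condition quoted above.
    boolean isLowOnMemory(long totalMemory, long freeMemory) {
        return totalMemory >= maxHeapSize && freeMemory < minFreeMemory;
    }

    // One janitor pass with explicit memory figures (easy to test).
    void check(long totalMemory, long freeMemory) {
        if (stores.isEmpty() || !isLowOnMemory(totalMemory, freeMemory)) {
            return;
        }
        Store store = stores.get(nextIndex);
        nextIndex = (nextIndex + 1) % stores.size();  // next store on the next run
        store.free(store.memorySize() * freePercentage / 100);
    }

    // One janitor pass using the live JVM figures.
    void checkJvm() {
        Runtime rt = Runtime.getRuntime();
        check(rt.totalMemory(), rt.freeMemory());
    }
}
```

In a real setup the pass would be scheduled, for example with
ScheduledExecutorService.scheduleAtFixedRate(janitor::checkJvm, 10, 10,
TimeUnit.SECONDS). With 200 memory entries and the default 10%, one pass frees
only 20 entries.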

2) My observations: When running high-traffic sites and rendering them live
(with only mod_cache in between, holding pages for 5 to 10 minutes), like [1]
or [2], checking every X seconds whether the JVM is low on memory doesn't make
sense to me. At the moment of the check, the JVM might be perfectly sound and
just need some extra memory for a moment; in that case, the StoreJanitor removes
items from the cache when there is no need to. Conversely, when the JVM really
is in trouble but the StoreJanitor will not check for another 5 seconds, that
may be too long for a JVM on a high-traffic site that is low on memory. Problems
that result from this are:

- Since there is no way to remove cache entries through the eviction policy of
the cache implementation in use, entries are removed from memory starting at
entry 0, whatever that happens to be in the cache. It is quite likely that the
very next request adds the same cache entries again.

- Once the JVM gets low on memory and the StoreJanitor is needed, it is quite
likely that from that moment on it runs *every* 10 seconds, and keeps removing
cache entries you might not want removed, such as compiled stylesheets.
        1) Suppose that from one store (or, in trunk, from multiple stores) 10%
(the default) is removed. This 10% is taken from the number of memory cache
entries. I quite often have only 200 entries in memory for each store (I have
added *many* different stores to enable everything we wanted in a high-traffic
environment), and the rest is in the disk store. Now suppose the JVM, which has
512 MB of memory, is low on memory and removes 10% of 200 entries = 20 entries:
that helps me zero! These memory entries are my most important ones, so on the
next request they are either added again, or I get a hit from the disk cache,
which makes the cache put the entry back in memory. Even if I used 2000 memory
items, I am quite sure the 200 cleaned items would be back in memory before the
next StoreJanitor run.
        2) I am not sure whether in trunk you can configure the StoreJanitor to
leave one store alone, such as the DefaultTransientStore. This store typically
holds compiled stylesheets and i18n resource bundles. Since these are needed on
virtually every request, I would rather the StoreJanitor did not remove entries
from this store. I think it does, though, leaving my "critical app" in an even
worse state: on the next request, the hardly improved JVM has to recompile the
stylesheets and reload the i18n resource bundles.
        3) What if the JVM is low on memory, but not because of the stores? For
example, you have added some component with a problem you did not know about,
and that component is the real reason for your OOM. The StoreJanitor sees the
low memory and starts removing entries from your perfectly sound cache, leaving
your app in a much worse situation than it already was in. The leaking
component now has more memory to fill, and happily fills it, making the
StoreJanitor remove more and more entries from the cache until it ends up with
an empty cache. You could easily blame the wrong component for this behavior.
One such faulty component in use is the event registry for event caching, which
made our high-traffic sites with 512 MB crash every two days. I had better write
in another mail what I did to the event cache registry, why I have not posted
about it yet, and, if others are interested, how to include it in trunk. The
bottom line is that the growing registry caused a major OOM problem, and the
StoreJanitor removing cache entries was actually making it worse.
        4) By default, most people are probably using ehcache, and naturally
overflow-to-disk is true. On a high-traffic site, the number of cache keys can
grow enormously (I have seen mails from people complaining about disk caches
growing to multiple gigabytes), certainly when an inexperienced user puts
something like a session attribute (or a timestamp, among many other
possibilities) in a stylesheet parameter that ends up in the cache key (though
perhaps Cocoon on high-traffic sites should not be aimed at the average user; I
don't know). Now, and this is IMO one of the major weaknesses of ehcache (or I
missed it completely), I did not find any way to limit the number of disk store
entries. This implies that the disk store can grow indefinitely. Anyone who has
looked at the status page knows that cache keys of about 2 KB in memory are
quite common in Cocoon (the depth of your app's folder structure has an
influence). The keys of the disk store entries are kept in *memory*. So suppose
you run your app with 128 MB and overflow-to-disk=true: at about 50,000 keys in
the cache, the keys alone take roughly 50,000 x 2 KB = 100 MB, and your app runs
into problems. Then the StoreJanitor keeps removing entries from your memory
cache, which are refilled with disk store entries just a few moments later.
Now, if you really know how to configure your stores, you use timeToLiveSeconds
and timeToIdleSeconds to let your store clear unused cache entries. This is good
to do, unless you depend on something like the event registry currently in
Cocoon trunk. The problem is that the StoreJanitor removes cache entries by
calling free() on the corresponding store, which might for example be the
event-aware store. This event-aware store updates (cleans) its registry before
removing the cache entry from its delegate. But when entries are removed by the
cache's internal expiry through timeToLiveSeconds or timeToIdleSeconds, the
event registry is not cleaned, which leads to OOM in the long run.
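A sketch of one way to keep such a registry from outliving entries that the
store expires by itself: hold the cache keys only through weak references, so
the garbage collector can reclaim registry entries once the store no longer
references the key. Class and method names are hypothetical and this is not the
double-mapping registry in Cocoon. It assumes the store keeps a strong
reference to the very key instance that was registered for as long as the entry
is cached (an equal but different instance, e.g. one deserialized from disk,
would not keep the registry entry alive).

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.WeakHashMap;

// Hypothetical double-mapping event registry that holds cache keys weakly.
class WeakEventRegistry<K> {
    // event -> keys depending on it; the inner set holds keys weakly.
    // (An event whose keys were all collected keeps an empty set until it
    // fires; acceptable for a bounded set of event names.)
    private final Map<String, Set<K>> keysByEvent = new HashMap<>();
    // key -> events; an entry is expunged after its key has been collected
    private final Map<K, Set<String>> eventsByKey = new WeakHashMap<>();

    synchronized void register(String event, K key) {
        keysByEvent.computeIfAbsent(event,
                e -> Collections.newSetFromMap(new WeakHashMap<K, Boolean>()))
            .add(key);
        eventsByKey.computeIfAbsent(key, k -> new HashSet<>()).add(event);
    }

    // Returns the keys to invalidate for 'event' and forgets the mapping.
    synchronized Set<K> removeEvent(String event) {
        Set<K> weakKeys = keysByEvent.remove(event);
        if (weakKeys == null) {
            return Collections.emptySet();
        }
        Set<K> keys = new HashSet<>(weakKeys);  // strong copy for the caller
        for (K key : keys) {
            Set<String> events = eventsByKey.get(key);
            if (events != null) {
                events.remove(event);
                if (events.isEmpty()) {
                    eventsByKey.remove(key);
                }
            }
        }
        return keys;
    }

    // Number of keys still tracked (stale weak entries are expunged first).
    synchronized int trackedKeyCount() {
        return eventsByKey.size();
    }
}
```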

I have more to say about it, but probably nobody will read this far, so in
short: my conclusion is that the StoreJanitor never helped me out, but merely
impoverished my app when it ran.
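On the first problem listed above (entries freed starting from "entry 0"
rather than through the cache's eviction policy), a store that frees through an
LRU policy could look roughly like this. It is an illustration built on
java.util.LinkedHashMap in access-order mode, not an ehcache or Cocoon API, and
the names are made up. It only changes which entries go: with 200 memory
entries, freeing 20 of them still helps little.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical memory store that frees entries in least-recently-used order
// (illustration only, not an ehcache or Cocoon API).
class LruMemoryStore<K, V> {
    // accessOrder = true: iteration runs from least to most recently accessed
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);

    void put(K key, V value) {
        map.put(key, value);
    }

    // A hit moves the entry to the most-recently-used end.
    V get(K key) {
        return map.get(key);
    }

    // Removes up to 'count' entries, least recently used first;
    // returns how many were actually removed.
    int free(int count) {
        int removed = 0;
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (removed < count && it.hasNext()) {
            it.next();
            it.remove();
            removed++;
        }
        return removed;
    }

    // containsKey does not change the access order.
    boolean containsKey(K key) {
        return map.containsKey(key);
    }

    int size() {
        return map.size();
    }
}
```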

                                                --------o0o--------

The rules I try to follow to keep the StoreJanitor from running:

1) use readers in non-caching pipelines and set expires on them, to avoid
cache/memory pollution
2) use a separate store for repository binary sources, with only a disk store
part and no memory part (I added a cached-binary: protocol for this)
3) use a different store for repository sources than for the pipeline cache
4) replaced the abstract double-mapping event registry with one that uses weak
references, letting the JVM clean up my event registry
5) (4) gave me undesired behavior in combination with ehcache: weak references
were removed when items overflowed to disk (I could not reproduce this, but it
seems that my references to cache keys got lost). Testing with JCSCache solved
this problem, gave me faster response times, and as a bonus let me limit the
number of disk cache entries. A disadvantage of the weak references is that I
disabled persistent caches across JVM restarts, but I never wanted those anyway
(it could probably be implemented quite easily, but might lead to long startup
times).
6) JCSCache has a complex configuration IMO. Therefore, I added default
configurations to choose from, for example:




[1] http://www.minfin.nl
[2] http://www.minbuza.nl
