- a transient store: <transient-store> in cocoon.xconf, Store.TRANSIENT_STORE in Java code
- a persistent store: <persistent-store>, Store.PERSISTANT_STORE which equals Store.ROLE (more on this below)
The transient-store, as its name implies, should be transient, and used to cache objects that should not be serialized, either because they cannot be (the case of Xalan Transformers) or because it doesn't make sense (e.g. some app-related data that needs to be refreshed on startup).
The persistent store should be used for objects that are either long-lived or whose size justifies their storage on disk. This includes of course the CachingPipeline results.
Just going by the names, I would expect a persistent store to keep its contents and never evict entries, maybe even across application stop / start. A transient store OTOH could throw out entries at will and maybe should be clean after an application restart. Whether the store uses a disk, however, should not be important. Of course it would be highly desirable to only pass serializable objects (or object trees) to a persistent store, maybe even to the transient store.
Really, a transient store is a little like an oxymoron and should be named a cache while a persistent store is, well, a store.
Certainly it would be nice to express the importance of a cache entry so that the cache manager can consider it when selecting entries for eviction. But then I'm not quite sure whether LRU and entry size are a good enough heuristic.
A persistent store may use a two-step design, putting entries into memory before writing them, e.g. to disk. At this point we need to be clear how good the persistence guarantee really is. Since we're not building a database management system, best effort would certainly do.
Private caches all over the place
---------------------------------
I mentioned above components that pre-analyze files like jxtemplate, woody form definitions, flowscript, etc. Now if we look closer at these components, we see that each of them has its own private cache (often a static Map). This means that every loaded file is kept in memory forever, even if only used once in the system lifetime, and even if the corresponding file is actually deleted!
This is a big problem, indeed. Since many of those actually build their private cache from a source, I have started a SourceCache in the scratchpad block but got distracted :-( Still, maybe it could be useful when cleaning up the private caches.
The only implementation (SoftSourceCache) uses a ReferenceMap from JCCol that could at least be cleaned by the garbage collector when memory is low. Using a store would surely be better, though. But it's only a first cut :-)
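To make the idea concrete, here is a minimal sketch of a cache that the garbage collector can shrink and that also drops entries whose source has changed or disappeared. It is not the scratchpad SoftSourceCache: it uses the JDK's SoftReference rather than a ReferenceMap, and the class name, method names, and the last-modified check are all assumptions made for illustration.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical cache of objects built from a source, keyed by source URI. */
final class SoftValidatingCache<V> {

    /** A cached value plus the source timestamp it was built from. */
    private static final class Entry<V> {
        final V value;
        final long lastModified;

        Entry(V value, long lastModified) {
            this.value = value;
            this.lastModified = lastModified;
        }
    }

    // SoftReference lets the garbage collector reclaim entries when memory is low.
    private final Map<String, SoftReference<Entry<V>>> entries = new ConcurrentHashMap<>();

    void put(String uri, V value, long lastModified) {
        entries.put(uri, new SoftReference<>(new Entry<>(value, lastModified)));
    }

    /**
     * Returns the cached value, or null if it was never stored, was collected,
     * or was built from a different version of the source.
     */
    V get(String uri, long currentLastModified) {
        SoftReference<Entry<V>> ref = entries.get(uri);
        if (ref == null) {
            return null;
        }
        Entry<V> entry = ref.get();
        if (entry == null || entry.lastModified != currentLastModified) {
            entries.remove(uri, ref); // drop cleared or stale entries
            return null;
        }
        return entry.value;
    }

    /** Call when the source no longer exists (assumed hook, e.g. file deleted). */
    void invalidate(String uri) {
        entries.remove(uri);
    }

    int size() {
        return entries.size();
    }
}
```

A component would call get() with the source's current timestamp and rebuild on null, so a file that is used once and then deleted does not stay in memory forever.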
I propose to clearly distinguish the 3 roles and the associated semantics:
- Store.ROLE is the "general-purpose" store. A component that doesn't care if the cache is transient or persistent should use this one. Being general-purpose, it should be efficient but also swap old objects to persistent storage.
I'd say a store should give no guarantees at all, well, beyond "after store, the result of retrieve is either an object (deep) equal to the one stored, or null".
- Store.TRANSIENT_STORE should be used to keep objects that aren't serializable but should be kept in memory as far as possible. The flush strategy of this store should not be mixed with a limited-size MRU policy of a persistent store front-end.
Yep.
- Store.PERSISTANT_STORE should be, as its name implies, only persistent, with no memory front-end whatsoever.
There is no reason to forbid a memory-based stage for a persistent store. But it should be clear whether there is a guarantee in a transactional sense or not. Since this is an Avalon component issue, I'd say there needs to be a role for transactional and one for non-transactional stores. (I.e. the first guarantees that the data is persisted when the call returns, while the second just guarantees to persist the data upon normal application shutdown, and only if no error occurs.)
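The difference between the two guarantees could be sketched roughly as follows. Everything here is hypothetical: SimpleStore is not the Avalon/Excalibur Store interface, and a plain Map stands in for the real persistent medium.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal store contract, for illustration only. */
interface SimpleStore {
    void store(String key, String value);

    String get(String key);
}

/** "Transactional" flavour: the data is on the backing medium when store() returns. */
final class WriteThroughStore implements SimpleStore {
    private final Map<String, String> disk; // stand-in for real persistent storage

    WriteThroughStore(Map<String, String> disk) {
        this.disk = disk;
    }

    public void store(String key, String value) {
        disk.put(key, value);
    }

    public String get(String key) {
        return disk.get(key);
    }
}

/** Best-effort flavour: data reaches the backing medium only on a normal shutdown. */
final class WriteBehindStore implements SimpleStore {
    private final Map<String, String> disk; // stand-in for real persistent storage
    private final Map<String, String> pending = new HashMap<>();

    WriteBehindStore(Map<String, String> disk) {
        this.disk = disk;
    }

    public void store(String key, String value) {
        pending.put(key, value); // kept in memory only for now
    }

    public String get(String key) {
        String value = pending.get(key);
        return value != null ? value : disk.get(key);
    }

    /** Called on orderly shutdown; anything still pending after a crash is lost. */
    void shutdown() {
        disk.putAll(pending);
        pending.clear();
    }
}
```

With two separate roles, a component that cannot afford to lose data would look up the first flavour, while caches would be content with the second.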
Redefine Cocoon stores
----------------------
With the above definitions, here is how the 3 stores should be configured in Cocoon:
<transient-store> should be a MRUMemoryStore with use-persistent-store=false (the default) and no maxobjects limit. It will be flushed as needed by the store janitor.
<store> should be a MRUMemoryStore with use-persistent-store=true, and a fixed maxobjects (can be tuned according to the physical memory). It therefore becomes a two-stage cache with a limited number of objects in memory.
<persistent-store> can be the current Jisp-based implementation.
Note that it's very unlikely that some component other than <store> will directly use <persistent-store> (direct read/write to disk without a memory front-end isn't efficient). So we may want to write a new class that combines MRU+Jisp, so that <persistent-store> can be removed from the xconf file.
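A combined class of that kind might look roughly like the sketch below: a bounded in-memory front-end that swaps the least recently used entry out to a back-end instead of dropping it. This is not MRUMemoryStore or the Jisp store; the class name is invented, and a plain Map stands in for the disk store.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-stage store: bounded memory front-end over a persistent back-end. */
final class TwoStageStore<K, V> {
    private final Map<K, V> backend; // stand-in for a disk store
    private final LinkedHashMap<K, V> memory;

    TwoStageStore(final int maxObjects, final Map<K, V> backend) {
        this.backend = backend;
        // accessOrder=true keeps the most recently used entries at the tail.
        this.memory = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() > maxObjects) {
                    // Swap the least recently used entry out instead of dropping it.
                    backend.put(eldest.getKey(), eldest.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    void store(K key, V value) {
        memory.put(key, value);
    }

    V get(K key) {
        V value = memory.get(key);
        if (value == null) {
            value = backend.get(key);
            if (value != null) {
                memory.put(key, value); // promote back into memory
            }
        }
        return value;
    }

    int memorySize() {
        return memory.size();
    }
}
```

Here maxObjects plays the part of the maxobjects setting: memory use stays bounded while older entries remain retrievable from the second stage.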
Review store usage in the code
------------------------------
Once we have a clean cache setup, we can review the code and use the stores according to their respective semantics.
The pipeline cache, and all other storage of Serializable objects should go into Store.ROLE.
Stylesheets, flowscripts, jxtemplates, woody form defs, etc. should no longer have a private cache, but should use Store.TRANSIENT_STORE.
Please consider the SourceCache idea.
Otherwise I'm +1 on this.
Chris.
