Overall agreement, just a few points to keep in mind below...
Sylvain Wallez wrote:
<snip/>
> Cocoon currently has two stores:
> - a transient store: <transient-store> in cocoon.xconf,
>   Store.TRANSIENT_STORE in Java code
> - a persistent store: <persistent-store>, Store.PERSISTANT_STORE which
>   equals Store.ROLE (more on this below)
I have always assumed that the labels "Transient" and "Persistent" were attempts to generalize the concepts "in memory" and "on disk" and the fact that the terms seem to imply a deeper contract was an unfortunate side effect. Still, your proposal makes sense and I agree we should pursue it.
That's how I understand them also, but the fact that the persistent role equals the general store role makes the distinction rather useless...
.....
> Pipeline caching
> ----------------
> The CachingPipeline uses a Cache component to load/save cached responses.
> The only implementation of Cache, CacheImpl, uses a store which is...
> Store.TRANSIENT_STORE!!!
Actually, there are several implementations of Cache all in simultaneous use now. Carsten made the Cache each pipeline uses configurable a few months ago and uses a(some?) overloaded Cache implementation in the portal.
I couldn't find anything but the "type" attribute on <map:pipeline>. This attribute chooses the pipeline implementation, but not the store. Is there something else that I missed?
Also, the eventcache (really event aware cache) is currently implemented as an overloaded CacheImpl which adds some additional processing to support event-based invalidation of cached objects.
Damn, missed that one. But it relies on the store looked up in its superclass, and so will also benefit from the changes.
I haven't thought through how that would impact your proposal if at all.
I don't think there will be any impact, since what I'm suggesting changes the Store role used by CacheImpl, but not the actual behaviour defined by this role.
> Transient-cache's "maxobjects"
> ------------------------------
> The transient cache has a "maxobjects" of 100, meaning that at most 100
> non-serializable objects will be kept in memory. This is obviously too
> low, furthermore considering that pipeline content also goes in this
> cache, and that Cocoon pre-analyses lots of things (stylesheets,
> jxtemplates, XSP logicsheets, woody form definitions) that would benefit
> from being kept longer in memory.
>
> And what's the point of having a store-janitor that is supposed to flush
> the stores when memory is low if there is such a low hard limit?
Of course the 100 maxobjects is configurable and is necessary even in your proposal isn't it? Perhaps we now change the default, but there must be some configurable limit, no?
A hard limit makes sense IMO only for the memory front-end of a two-stage cache. For the in-memory cache, we should instead let the store-janitor do its job based not on the number of stored objects, but on the actual JVM memory consumption.
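To make the "memory consumption rather than object count" idea concrete, here is a minimal sketch. It is not Cocoon's actual StoreJanitor: the class names `UnboundedMemoryStore` and `MemoryJanitor`, the threshold parameter, and the injectable free-memory probe are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical in-memory store with no hard "maxobjects" limit.
class UnboundedMemoryStore {
    // Insertion-ordered, so the oldest entries are freed first.
    private final Map<String, Object> entries = new LinkedHashMap<>();

    void store(String key, Object value) { entries.put(key, value); }
    Object get(String key) { return entries.get(key); }
    int size() { return entries.size(); }

    // Removes up to 'count' of the oldest entries; returns how many were removed.
    int free(int count) {
        int removed = 0;
        Iterator<String> it = entries.keySet().iterator();
        while (it.hasNext() && removed < count) {
            it.next();
            it.remove();
            removed++;
        }
        return removed;
    }
}

// Hypothetical janitor: frees entries only when free heap drops below a threshold.
class MemoryJanitor {
    private final List<UnboundedMemoryStore> stores = new ArrayList<>();
    private final long minFreeBytes;
    private final LongSupplier freeMemory; // injectable so the policy can be tested

    MemoryJanitor(long minFreeBytes, LongSupplier freeMemory) {
        this.minFreeBytes = minFreeBytes;
        this.freeMemory = freeMemory;
    }

    // A probe based on the JVM's own accounting: heap still obtainable
    // = max heap - (allocated heap - free part of allocated heap).
    static long jvmFreeMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    }

    void register(UnboundedMemoryStore store) { stores.add(store); }

    // One janitor pass; returns the number of entries freed across all stores.
    int check(int batchSize) {
        int freed = 0;
        if (freeMemory.getAsLong() < minFreeBytes) {
            for (UnboundedMemoryStore store : stores) {
                freed += store.free(batchSize);
            }
        }
        return freed;
    }
}
```

A real janitor would be wired as `new MemoryJanitor(threshold, MemoryJanitor::jvmFreeMemory)` and run `check` periodically; the store itself never refuses an object because of a count.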
And if this is true, separating the Stores could lead to a harder-to-manage memory configuration because there are multiple collections. I'm also thinking ahead to Stefano's adaptive cache. As complicated as that is, wouldn't separating the caches make it more complex to coordinate resources across them?
What if we use a pluggable FlushStrategy (like Validity) which would allow different types of objects (transient, transient->persistent) to all go in the same Store?
VolatileFlushStrategy := never go to persistent store
PersistentFlushStrategy := go to persistent store when janitor requests
ReluctantFlushStrategy := only go to persistent store on container shutdown
TrickyAdaptiveFlushStrategy := cost weighted decision about whether to go persistent or not, and if so how early.
This is just off the top of my head - haven't thought it through carefully but do you all see my point of the potential downside of splitting the actual Store?
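For readers who want to picture the single-store idea, here is a rough sketch of the first three strategies listed above. Nothing here exists in Cocoon: `FlushEvent`, `FlushStrategy`, `FlushStrategies` and `SingleStore` are hypothetical names, the "persistent store" is a plain map standing in for disk, and the adaptive strategy is left out because its cost model is not defined in this thread.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// The two moments at which the store asks a strategy for a decision.
enum FlushEvent { JANITOR_REQUEST, CONTAINER_SHUTDOWN }

// Hypothetical pluggable strategy, attached to each stored object.
interface FlushStrategy {
    // true if the object should be written to the persistent store on this event
    boolean shouldPersist(FlushEvent event);
}

final class FlushStrategies {
    private FlushStrategies() {}

    // Never goes to the persistent store.
    static final FlushStrategy VOLATILE = event -> false;
    // Goes to the persistent store when the janitor requests it
    // (assumption: also on shutdown, which the proposal does not spell out).
    static final FlushStrategy PERSISTENT = event -> true;
    // Only goes to the persistent store on container shutdown.
    static final FlushStrategy RELUCTANT = event -> event == FlushEvent.CONTAINER_SHUTDOWN;
}

// One store holding all kinds of objects, each with its own strategy.
class SingleStore {
    private static final class Stored {
        final Object value;
        final FlushStrategy strategy;
        Stored(Object value, FlushStrategy strategy) {
            this.value = value;
            this.strategy = strategy;
        }
    }

    private final Map<String, Stored> memory = new HashMap<>();
    private final Map<String, Object> persistent = new HashMap<>(); // stand-in for disk

    void store(String key, Object value, FlushStrategy strategy) {
        memory.put(key, new Stored(value, strategy));
    }

    // Moves every entry whose strategy agrees to the persistent map.
    // Assumption: other entries stay in memory, except on shutdown.
    void flush(FlushEvent event) {
        Iterator<Map.Entry<String, Stored>> it = memory.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Stored> e = it.next();
            if (e.getValue().strategy.shouldPersist(event)) {
                persistent.put(e.getKey(), e.getValue().value);
                it.remove();
            }
        }
        if (event == FlushEvent.CONTAINER_SHUTDOWN) {
            memory.clear();
        }
    }

    boolean inMemory(String key) { return memory.containsKey(key); }
    boolean isPersisted(String key) { return persistent.containsKey(key); }
}
```

The sketch deliberately leaves eviction of volatile objects under memory pressure undefined; that is one of the questions a single shared store would have to answer.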
I see your point. But if I understand it correctly, your proposal requires components to give the store a hint about the flush strategy that should be applied to the stored object. But how will components choose the correct strategy, and how will the store handle a myriad of different strategy implementations? Furthermore, I think the flush strategy is a concern of the Store, and that components should just give a hint about the storage-related properties of the objects they store.
We can consider that choosing between the various store roles is a way to indicate this.
> Private caches all over the place
> ---------------------------------
> I mentioned above components that pre-analyze files like jxtemplate,
> woody form definitions, flowscript, etc. Now if we look closer at these
> components, we see that each of them has its own private cache (often a
> static Map). This means that every loaded file is kept in memory
> forever, even if only used once in the system lifetime, and even if the
> corresponding file is actually deleted!
Don't know those situations specifically, but this sounds like it can/should be fixed whether the Store changes or not?
Yes and no: most of these objects aren't serializable, and having a "persistent" transient store doesn't encourage component writers to use it...
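To illustrate the problem with a bare static Map, here is a small sketch of a cache that at least notices when the source file changes or disappears. It is not taken from any Cocoon component: `ParsedFileCache` and its parser callback are hypothetical, and the validity check is a plain last-modified comparison.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical replacement for a component's private static Map.
class ParsedFileCache<T> {
    private static final class Entry<V> {
        final V parsed;
        final long lastModified;
        Entry(V parsed, long lastModified) {
            this.parsed = parsed;
            this.lastModified = lastModified;
        }
    }

    private final Map<String, Entry<T>> entries = new HashMap<>();
    private final Function<File, T> parser;

    ParsedFileCache(Function<File, T> parser) {
        this.parser = parser;
    }

    // Returns the parsed form, re-parsing when the file has changed.
    // Returns null and forgets the entry when the file no longer exists.
    synchronized T get(File file) {
        String key = file.getAbsolutePath();
        if (!file.exists()) {
            entries.remove(key);
            return null;
        }
        long stamp = file.lastModified();
        Entry<T> cached = entries.get(key);
        if (cached == null || cached.lastModified != stamp) {
            cached = new Entry<>(parser.apply(file), stamp);
            entries.put(key, cached);
        }
        return cached.parsed;
    }

    synchronized int size() { return entries.size(); }
}
```

This only handles the deleted-file case on the next lookup. Entries for files used once and never requested again would still stay in memory, which is why putting such objects in a shared store that the janitor can flush looks like the better fix.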
....
--- oOo ---
>
> Conclusion
> ----------
> The proposed changes want to clarify the respective roles of the various
> stores, and make them behave as they should according to their names
> (e.g. transient is really transient). This should allow us to better
> understand what's going on in the system, and optimize memory usage for
> a better scalability.
>
> So, what do you think?
Hope I haven't clouded the discussion...
Not at all! There's still room for improvement in the flush strategy, but we have a problem to solve today. If later evolutions require a single store with storage hints, we'll just have to change the implementations of the various store roles we have today into proxies to the global cache. Each store would then delegate storage to this global cache with a particular hint for the flush strategy.
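A rough sketch of that migration path, to show that components would not need to change. All names are hypothetical (`SimpleStore` stands in for the real Store interface, and `GlobalCache`, `StorageHint` and `HintedStoreProxy` do not exist in Cocoon):

```java
import java.util.HashMap;
import java.util.Map;

// Hint passed from a store role to the global cache.
enum StorageHint { TRANSIENT, PERSISTENT }

// Simplified stand-in for the Store interface that components look up by role.
interface SimpleStore {
    void store(String key, Object value);
    Object get(String key);
}

// Hypothetical single cache that records a hint alongside each object.
class GlobalCache {
    private final Map<String, Object> values = new HashMap<>();
    private final Map<String, StorageHint> hints = new HashMap<>();

    void store(String key, Object value, StorageHint hint) {
        values.put(key, value);
        hints.put(key, hint);
    }

    Object get(String key) { return values.get(key); }
    StorageHint hintFor(String key) { return hints.get(key); }
}

// What each existing store role would become: a thin proxy with a fixed hint.
class HintedStoreProxy implements SimpleStore {
    private final GlobalCache cache;
    private final StorageHint hint;

    HintedStoreProxy(GlobalCache cache, StorageHint hint) {
        this.cache = cache;
        this.hint = hint;
    }

    @Override
    public void store(String key, Object value) { cache.store(key, value, hint); }

    @Override
    public Object get(String key) { return cache.get(key); }
}
```

Components keep looking up the transient or persistent role and calling `store`/`get` as today; only the wiring behind the roles changes. Note that in this sketch both roles share one key namespace, which a real implementation would have to decide about.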
Sylvain
--
Sylvain Wallez, Anyware Technologies
http://www.apache.org/~sylvain
http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance - http://www.orixo.com
