Perhaps the subject should be: Caching problems with Cocoon.
You want exactly the same thing I have been looking for for a few days. A
very basic example below shows the problem:
-----------------------------------
<map:match pattern="getcontent">
<map:generate src="test.xml"/>
<map:serialize/>
</map:match>
<map:match pattern="index">
<map:aggregate element="root">
<map:part element="part1" src="cocoon:/getcontent"/>
<map:part element="part2" src="cocoon:/getcontent"/>
...............
..............
<map:part element="part50" src="cocoon:/getcontent"/>
</map:aggregate>
<map:serialize/>
</map:match>
------------------------------------
OR
------------------------------------
<map:match pattern="index">
<map:aggregate element="root">
<map:part element="part1" src="test.xml"/>
<map:part element="part2" src="test.xml"/>
...............
..............
<map:part element="part50" src="test.xml"/>
</map:aggregate>
</map:match>
-----------------------------------
test.xml = <foo/>
If you try these two setups with expires, caching, or ecaching pipelines, then
on the first run both examples will be equally fast. BUT once the result is
cached, the second example returns much more quickly (depending on the number
of parts; with 50 parts, count on a factor of 10!).
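(By "setups" I mean the matchers wrapped in a typed pipeline, e.g. for the
caching case:)
-----------------------------------
<map:pipeline type="caching">
  <map:match pattern="getcontent">
    <map:generate src="test.xml"/>
    <map:serialize/>
  </map:match>
  <map:match pattern="index">
    ...
  </map:match>
</map:pipeline>
-----------------------------------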
Why is this? It is because the aggregator in the first example has to find its
child pipelines' cache keys, therefore instantiating

<map:match pattern="getcontent">
  <map:generate src="test.xml"/>
  <map:serialize/>
</map:match>

50 times, just to get the cache keys that have to be checked for validity! I
think the most time-consuming part is instantiating the pipeline. I did try to
generate the test.xml with my own generator, which returned a NOPValidity,
resulting in some time gain (about 20% in this specific example), but not
enough. So I concluded that the lookup of the cache key itself does not take
very long; it is Cocoon figuring out which cache key to look up that takes
long.
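For reference, such a generator looks roughly like this (a minimal sketch; the
class name is made up, and it assumes Cocoon 2.1's CacheableProcessingComponent
interface and the excalibur NOPValidity):
-----------------------------------
import java.io.Serializable;

import org.apache.cocoon.caching.CacheableProcessingComponent;
import org.apache.cocoon.generation.AbstractGenerator;
import org.apache.excalibur.source.SourceValidity;
import org.apache.excalibur.source.impl.validity.NOPValidity;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/**
 * Emits <foo/> and declares itself always valid via NOPValidity,
 * so the cache never has to re-check anything for this generator.
 */
public class FooGenerator extends AbstractGenerator
        implements CacheableProcessingComponent {

    public Serializable getKey() {
        // One fixed key: the output never varies.
        return "foo";
    }

    public SourceValidity getValidity() {
        // NOPValidity is always valid: no filesystem check at all.
        return new NOPValidity();
    }

    public void generate() throws SAXException {
        contentHandler.startDocument();
        contentHandler.startElement("", "foo", "foo", new AttributesImpl());
        contentHandler.endElement("", "foo", "foo");
        contentHandler.endDocument();
    }
}
-----------------------------------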
We build complex sites using dynamic location maps, which are the result of
several .xml files and transformations. The locationmaps are used, for
example, 5 times per request. Now, just caching the dynamic locationmap won't
do, since looking up all the depending child pipelines and working out their
keys is far too expensive!! You can gain a lot of time by making sure that not
too many pipelines are called for critical things like a dynamic location map.
Then of course, we can just write the output to the filesystem, do one
generate, and it is cached (with one cache key, so it is fast!).
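I.e. something like this, where aggregated.xml is the pre-generated file
(hypothetical name):
-----------------------------------
<map:match pattern="index">
  <!-- aggregated.xml was written to disk beforehand, so this is
       one generator, one cache key, no child pipeline lookups -->
  <map:generate src="aggregated.xml"/>
  <map:serialize/>
</map:match>
-----------------------------------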
What I would really like is much smarter caching, involving smart
invalidation: for example, the DASL transformer in a child pipeline gets
invalidated by a JMS event, which invalidates its entire pipeline, which in
turn invalidates the parent pipeline. Then an aggregator, for example, knows
all by itself whether it is valid or not. The current caching is instead based
on a very expensive cache-key lookup.
Well, of course, things get very complex for map:acts, map:selects, sessions
stored in cache keys, etc., but if we were to focus only on event-based
invalidation of the cache, with cache keys based solely on request parameters,
the pipeline, and cocoon parameters, it should be possible.
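To illustrate what I mean (a sketch, assuming the EventValidity/NamedEvent
classes from the eventcache block; the class structure and event name are made
up):
-----------------------------------
import java.io.Serializable;

import org.apache.cocoon.caching.CacheableProcessingComponent;
import org.apache.cocoon.caching.validity.EventValidity;
import org.apache.cocoon.caching.validity.NamedEvent;
import org.apache.excalibur.source.SourceValidity;

/**
 * Sketch: a cacheable component whose cached result is treated as
 * valid until an external event (e.g. fired from a JMS listener)
 * evicts it -- no per-request validity check against the source.
 */
public abstract class EventCachedComponent
        implements CacheableProcessingComponent {

    public Serializable getKey() {
        return "dasl-content"; // made-up key
    }

    public SourceValidity getValidity() {
        // Stays valid until the event cache sees a
        // NamedEvent("dasl-content-changed") and evicts the entry.
        return new EventValidity(new NamedEvent("dasl-content-changed"));
    }
}
-----------------------------------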
For example, <map:pipeline type="graphcache">, where graphcache implies that
the pipeline keeps track of its dependencies. That also means that changing a
file on the filesystem is not an event, so the cache won't be invalidated by
it. Is that a problem? Well, not once a site is deployed: after deployment,
the file generator does not have to check the validity of a file, because I
know it is valid.
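To make the graphcache idea concrete, here is a toy sketch of such a
dependency-tracking cache (purely hypothetical, this is not existing Cocoon
code):
-----------------------------------
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy "graphcache": every entry records which entries depend on it,
 * so invalidating a child cascades to its parents. A cache hit means
 * "valid" -- no per-request walk over child pipeline keys.
 */
public class GraphCache {
    private final Map<String, Object> cache = new HashMap<>();
    private final Map<String, Set<String>> dependents = new HashMap<>();

    public void put(String key, Object value, String... dependsOn) {
        cache.put(key, value);
        for (String child : dependsOn) {
            dependents.computeIfAbsent(child, k -> new HashSet<>()).add(key);
        }
    }

    /** Presence in the cache means the entry is valid. */
    public Object get(String key) {
        return cache.get(key);
    }

    /** Called from an event (e.g. JMS): evict and cascade upwards. */
    public void invalidate(String key) {
        cache.remove(key);
        Set<String> parents = dependents.remove(key);
        if (parents != null) {
            for (String parent : parents) {
                invalidate(parent);
            }
        }
    }
}
-----------------------------------
With this, the "index" entry would be put with dependsOn = {"getcontent"}: one
JMS-triggered invalidate("getcontent") evicts "index" too, and serving a
request is a single get.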
I know it possibly won't fit into Cocoon easily, but when building more
complex sites and wanting to exploit Cocoon's caching effectively, there has
to be some smarter cache invalidation.
What I found weirdest of all is that even for an expiring pipeline, where you
can actually set the expiry time and the exact cache key it should use (so the
entry can be found unambiguously when the pipeline is called again), Cocoon
still checks all the child pipeline keys (which, again, can be very expensive).
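I.e. (assuming the cache-expires/cache-key parameter names of the expires
pipeline; the key value is made up):
-----------------------------------
<map:pipeline type="expires">
  <!-- whole response cached for an hour under one fixed key, yet
       the child pipeline keys still get checked on every request -->
  <map:parameter name="cache-expires" value="3600"/>
  <map:parameter name="cache-key" value="index-page"/>
  <map:match pattern="index">
    ...
  </map:match>
</map:pipeline>
-----------------------------------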
These were just my two cents on Cocoon caching...

Regards,
Ard
> I'm still a little unclear about how SourceValidities and non-caching
> pipelines work. The way I believe it works is that non-caching
> pipelines are always considered invalid and their content is
> recreated. I believe then that if a parent pipeline aggregates (by any
> of the aggregation techniques) one or more non-caching child
> pipelines, the parent would always have to regenerate its content.
>
> What I would like is somewhat different.
>
> When aggregating content, both the caching pipeline and the parent
> pipeline will have the content, albeit in different forms. What I
> would like is for the parent pipeline to know that the content it has
> is still valid, so that it can return the aggregated content; but if
> the aggregated content needs to be reconstructed, then the child
> pipeline will have to recreate it.
>
> Do I have a misunderstanding of how this works?
>
> Ralph