Perhaps the subject should be: Caching problems with Cocoon.
You want exactly the same thing I have been looking for for a few days. A
very basic example below shows the problem:
-----------------------------------
<map:match pattern="getcontent">
<map:generate src="test.xml"/>
<map:serialize/>
</map:match>
<map:match pattern="index">
<map:aggregate element="root">
<map:part element="part1" src="cocoon:/getcontent"/>
<map:part element="part2" src="cocoon:/getcontent"/>
...............
..............
<map:part element="part50" src="cocoon:/getcontent"/>
</map:aggregate>
<map:serialize/>
</map:match>
------------------------------------
OR
------------------------------------
<map:match pattern="index">
<map:aggregate element="root">
<map:part element="part1" src="test.xml"/>
<map:part element="part2" src="test.xml"/>
...............
..............
<map:part element="part50" src="test.xml"/>
</map:aggregate>
</map:match>
-----------------------------------
test.xml = <foo/>
If you try these two setups with expires, caching, or ecaching pipelines, then
on the first run both examples will be equally fast. BUT once the result is
cached, the second example returns much more quickly (depending on the number
of parts; with 50 parts, count on a factor of 10!).
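(By "setups" I mean the matchers wrapped in a typed pipeline, e.g. for the
caching case:)
-----------------------------------
<map:pipeline type="caching">
  <map:match pattern="getcontent">
    <map:generate src="test.xml"/>
    <map:serialize/>
  </map:match>
  <map:match pattern="index">
    ...
  </map:match>
</map:pipeline>
-----------------------------------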
Why is this? It is because the aggregator in the first example has to find its
child pipelines' cache keys, therefore instantiating

<map:match pattern="getcontent">
  <map:generate src="test.xml"/>
  <map:serialize/>
</map:match>

50 times, just to get the cache keys that have to be checked for validity! I
think the most time-consuming part is instantiating the pipeline. I did try to
generate the test.xml with my own generator, which returned a NOPValidity,
resulting in some time gain (about 20% in this specific example), but not
enough. So I concluded that the lookup of the cache key itself does not take
very long; it is Cocoon figuring out which cache key to look up that takes
long.
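For reference, such a generator looks roughly like this (a minimal sketch; the
class name is made up, and it assumes Cocoon 2.1's CacheableProcessingComponent
interface and the excalibur NOPValidity):
-----------------------------------
import java.io.Serializable;

import org.apache.cocoon.caching.CacheableProcessingComponent;
import org.apache.cocoon.generation.AbstractGenerator;
import org.apache.excalibur.source.SourceValidity;
import org.apache.excalibur.source.impl.validity.NOPValidity;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

/**
 * Emits <foo/> and declares itself always valid via NOPValidity,
 * so the cache never has to re-check anything for this generator.
 */
public class FooGenerator extends AbstractGenerator
        implements CacheableProcessingComponent {

    public Serializable getKey() {
        // One fixed key: the output never varies.
        return "foo";
    }

    public SourceValidity getValidity() {
        // NOPValidity is always valid: no filesystem check at all.
        return new NOPValidity();
    }

    public void generate() throws SAXException {
        contentHandler.startDocument();
        contentHandler.startElement("", "foo", "foo", new AttributesImpl());
        contentHandler.endElement("", "foo", "foo");
        contentHandler.endDocument();
    }
}
-----------------------------------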
We build complex sites using dynamic location maps, which are the result of
several .xml files and transformations. The locationmaps are used, for
example, 5 times per request. Now, just caching the dynamic locationmap won't
do, since looking up all the depending child pipelines and working out their
keys is far too expensive!! You can gain a lot of time by making sure that not
too many pipelines are called for critical things like a dynamic location map.
Then of course, we can just write the output to the filesystem, do one
generate, and it is cached (with one cache key, so it is fast!).
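I.e. something like this, where aggregated.xml is the pre-generated file
(hypothetical name):
-----------------------------------
<map:match pattern="index">
  <!-- aggregated.xml was written to disk beforehand, so this is
       one generator, one cache key, no child pipeline lookups -->
  <map:generate src="aggregated.xml"/>
  <map:serialize/>
</map:match>
-----------------------------------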
What I would really like is much smarter caching, involving smart
invalidation: for example, the DASL transformer in a child pipeline gets
invalidated by a JMS event, which invalidates its entire pipeline, which in
turn invalidates the parent pipeline. Then an aggregator, for example, knows
all by itself whether it is valid or not. The current caching is instead based
on a very expensive cache-key lookup.
Well, of course, things get very complex for map:acts, map:selects, sessions
stored in cache keys, etc., but if we were to focus only on event-based
invalidation of the cache, with cache keys based solely on request parameters,
the pipeline, and cocoon parameters, it should be possible.
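To illustrate what I mean (a sketch, assuming the EventValidity/NamedEvent
classes from the eventcache block; the class structure and event name are made
up):
-----------------------------------
import java.io.Serializable;

import org.apache.cocoon.caching.CacheableProcessingComponent;
import org.apache.cocoon.caching.validity.EventValidity;
import org.apache.cocoon.caching.validity.NamedEvent;
import org.apache.excalibur.source.SourceValidity;

/**
 * Sketch: a cacheable component whose cached result is treated as
 * valid until an external event (e.g. fired from a JMS listener)
 * evicts it -- no per-request validity check against the source.
 */
public abstract class EventCachedComponent
        implements CacheableProcessingComponent {

    public Serializable getKey() {
        return "dasl-content"; // made-up key
    }

    public SourceValidity getValidity() {
        // Stays valid until the event cache sees a
        // NamedEvent("dasl-content-changed") and evicts the entry.
        return new EventValidity(new NamedEvent("dasl-content-changed"));
    }
}
-----------------------------------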
For example, <map:pipeline type="graphcache">, where graphcache implies that
the pipeline keeps track of its dependencies. That also means that changing a
file on the filesystem is not an event, so the cache won't be invalidated by
it. Is that a problem? Well, not once a site is deployed: after deployment,
the file generator does not have to check the validity of a file, because I
know it is valid.
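To make the graphcache idea concrete, here is a toy sketch of such a
dependency-tracking cache (purely hypothetical, this is not existing Cocoon
code):
-----------------------------------
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Toy "graphcache": every entry records which entries depend on it,
 * so invalidating a child cascades to its parents. A cache hit means
 * "valid" -- no per-request walk over child pipeline keys.
 */
public class GraphCache {
    private final Map<String, Object> cache = new HashMap<>();
    private final Map<String, Set<String>> dependents = new HashMap<>();

    public void put(String key, Object value, String... dependsOn) {
        cache.put(key, value);
        for (String child : dependsOn) {
            dependents.computeIfAbsent(child, k -> new HashSet<>()).add(key);
        }
    }

    /** Presence in the cache means the entry is valid. */
    public Object get(String key) {
        return cache.get(key);
    }

    /** Called from an event (e.g. JMS): evict and cascade upwards. */
    public void invalidate(String key) {
        cache.remove(key);
        Set<String> parents = dependents.remove(key);
        if (parents != null) {
            for (String parent : parents) {
                invalidate(parent);
            }
        }
    }
}
-----------------------------------
With this, the "index" entry would be put with dependsOn = {"getcontent"}: one
JMS-triggered invalidate("getcontent") evicts "index" too, and serving a
request is a single get.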
I know it possibly won't fit into Cocoon easily, but when building more
complex sites and wanting to exploit Cocoon's caching effectively, there has
to be some smarter cache invalidation.
What I found weirdest of all is that even for an expiring pipeline, where you
can actually set the expiry time and the exact cache key it should use (so the
entry can be found unambiguously when the pipeline is called again), Cocoon
still checks all the child pipeline keys (which, again, can be very expensive).
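I.e. (assuming the cache-expires/cache-key parameter names of the expires
pipeline; the key value is made up):
-----------------------------------
<map:pipeline type="expires">
  <!-- whole response cached for an hour under one fixed key, yet
       the child pipeline keys still get checked on every request -->
  <map:parameter name="cache-expires" value="3600"/>
  <map:parameter name="cache-key" value="index-page"/>
  <map:match pattern="index">
    ...
  </map:match>
</map:pipeline>
-----------------------------------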
These were just my two cents on Cocoon caching...

Regards,
Ard
> I'm still a little unclear about how SourceValidities and non-caching
> pipelines work. The way I believe it works is that non-caching
> pipelines are always considered invalid and their content is
> recreated. I believe then that if a parent pipeline aggregates (by any
> of the aggregation techniques) one or more non-caching child
> pipelines, the parent would always have to regenerate its content.
>
> What I would like is somewhat different.
>
> When aggregating content, both the caching pipeline and the parent
> pipeline will have the content, albeit in different forms. What I
> would like is for the parent pipeline to know that the content it has
> is still valid, so that it can return the aggregated content; but if
> the aggregated content needs to be reconstructed, then the child
> pipeline will have to recreate it.
>
> Do I have a misunderstanding of how this works?
>
> Ralph