Carsten Ziegeler wrote:
Geert Josten wrote:
The caching algorithm is so smart (or complicated?) that it caches a
pipeline based on the components and their configuration, but
independently of the URI. So, if you have the same pipeline twice with a
different serializer and use the internal protocol, it's basically the
same pipeline. If you now add the serializer information, you have two
different pipelines with two different cache results.
How can a pipeline have different serializers when the only difference is
internal or external protocol?
This is a simplified example which doesn't make sense:
<match pattern="A">
  <generate src="a.xml"/>
  <serialize type="xml"/>
</match>
<match pattern="B">
  <generate src="a.xml"/>
  <serialize type="html"/>
</match>
In this case, the difference isn't because of the use of internal and
external requests: calling "http://A" and "cocoon://A" build the same
pipeline!
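
To make the point concrete, here is a rough sketch in Python (not Cocoon
code). The tuple layout, the function names and the key format are all made
up for illustration, and the "file" generator type is an assumption. It only
shows that A and B collide when the serializer is left out of the key, and
differ when it is included.

```python
# Hypothetical model: a pipeline is a list of (role, type, source) tuples.
# Nothing here mirrors Cocoon's real key classes or key syntax.

def restricted_key(pipeline):
    """Key built from every component except the serializer."""
    return "|".join(":".join(c) for c in pipeline if c[0] != "serialize")

def full_key(pipeline):
    """Key built from every component, serializer included."""
    return "|".join(":".join(c) for c in pipeline)

pipeline_a = [("generate", "file", "a.xml"), ("serialize", "xml", "")]
pipeline_b = [("generate", "file", "a.xml"), ("serialize", "html", "")]

# Without the serializer, both pipelines map to the same cache entry.
print(restricted_key(pipeline_a) == restricted_key(pipeline_b))  # True
# With the serializer, they get distinct entries.
print(full_key(pipeline_a) == full_key(pipeline_b))              # False
```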
A more complex example is when you use non-cacheable components:
<match pattern="A">
  <generate src="a.xml"/>
  <transform type="NOT CACHEABLE"/>
  <serialize type="xml"/>
</match>
<match pattern="B">
  <generate src="a.xml"/>
  <transform type="NOT CACHEABLE"/>
  <transform src="xmlTohtml.xsl"/>
  <serialize type="html"/>
</match>
The cached part of the two pipelines is the same.
BTW, in this case adding the serializer to the cocoon source would be wrong.
The partially cached content would have a key of type
"PK-G-file-file:/path/a.xml|", which rightly doesn't include the
serializer as it's not in the cacheable part.
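
A rough sketch of how the key of a partially cacheable pipeline stops at
the last cacheable component. This is Python for illustration only, not
Cocoon code: the tuple layout, the "cacheable" flag, the "xslt" type name
and the key format are assumptions, and the output does not reproduce the
real "PK-G-..." key syntax.

```python
# Hypothetical model: (role, type, source, cacheable) per component.

def cacheable_prefix_key(pipeline):
    """Build a key from the leading run of cacheable components only."""
    parts = []
    for role, ctype, src, cacheable in pipeline:
        if not cacheable:
            break  # everything after the first non-cacheable component is dropped
        parts.append(f"{role}:{ctype}:{src}")
    return "|".join(parts)

pipeline_a = [
    ("generate", "file", "a.xml", True),
    ("transform", "NOT CACHEABLE", "", False),
    ("serialize", "xml", "", True),
]
pipeline_b = [
    ("generate", "file", "a.xml", True),
    ("transform", "NOT CACHEABLE", "", False),
    ("transform", "xslt", "xmlTohtml.xsl", True),
    ("serialize", "html", "", True),
]

key_a = cacheable_prefix_key(pipeline_a)
key_b = cacheable_prefix_key(pipeline_b)
print(key_a)           # generate:file:a.xml
print(key_a == key_b)  # True: the serializer never reaches the key
```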
The reason for not including the serializer in the cache key is that
when cocoon sources are SAXed, the serializer is ignored, so the
restricted key (without the serializer) can be used to cache the SAX
stream. But when we cache the byte stream, we must include the
serializer in the key!
Now the problem is that key and validity are computed before the
pipeline is actually executed, i.e. at a time where we don't know if the
cocoon source will be used with toSAX() or getInputStream().
I think we should therefore include the serializer in the key and
validity (if the pipeline is fully cacheable). Sure, it will include
extra useless information for toSAX() calls, but will include everything
that is necessary for a correct behaviour for getInputStream() calls.
Note that if a pipeline is only used in internal calls, its serializer
is very likely to be "xml", which is cacheable and always valid, so it
doesn't affect the cacheability of the pipeline.
Let's sum up all this, which gets complicated ;-)
- I'm only considering fully cacheable pipelines (for partially
cacheable ones, the key stops at the last cacheable component)
- "restricted cache key" means key without the serializer
- "full cache key" means key with the serializer
- SitemapSource should use the full key (its hash actually) and full validity
- processing a pipeline to an XML consumer should cache the SAX stream
using the restricted key and restricted validity
- processing a pipeline to an outputStream should cache the byte stream
using the full key and full validity
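
The scheme in the list above could be sketched roughly as follows. This is
Python for illustration, not Cocoon code: the class, its two dictionaries
and the fake SAX events and bytes are all hypothetical, only fully cacheable
pipelines are modelled, and validity handling is left out entirely.

```python
# Hypothetical model: a pipeline is a list of (role, type, source) tuples.

def restricted_key(pipeline):
    """Key without the serializer."""
    return "|".join(":".join(c) for c in pipeline if c[0] != "serialize")

def full_key(pipeline):
    """Key with the serializer."""
    return "|".join(":".join(c) for c in pipeline)

class TwoKeyCache:
    def __init__(self):
        self.sax_cache = {}   # restricted key -> recorded SAX events
        self.byte_cache = {}  # full key -> serialized bytes

    def to_sax(self, pipeline):
        """Processing to an XML consumer: the serializer is ignored."""
        key = restricted_key(pipeline)
        if key not in self.sax_cache:
            self.sax_cache[key] = ["startDocument", "endDocument"]  # stand-in events
        return self.sax_cache[key]

    def to_stream(self, pipeline):
        """Processing to an output stream: the serializer shapes the bytes."""
        key = full_key(pipeline)
        if key not in self.byte_cache:
            serializer_type = pipeline[-1][1]
            self.byte_cache[key] = f"<{serializer_type} output>".encode()  # stand-in bytes
        return self.byte_cache[key]

pipeline_a = [("generate", "file", "a.xml"), ("serialize", "xml", "")]
pipeline_b = [("generate", "file", "a.xml"), ("serialize", "html", "")]

cache = TwoKeyCache()
cache.to_sax(pipeline_a)
cache.to_sax(pipeline_b)
cache.to_stream(pipeline_a)
cache.to_stream(pipeline_b)
print(len(cache.sax_cache), len(cache.byte_cache))  # 1 2
```

A and B share one SAX entry but get separate byte entries, which is the
behaviour the two kinds of key are meant to give.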
Does it make sense?
Sylvain
--
Sylvain Wallez Anyware Technologies
http://apache.org/~sylvain http://anyware-tech.com
Apache Software Foundation Member Research & Technology Director