Carsten Ziegeler wrote:

Geert Josten wrote:
The caching algorithm is so smart (or complicated?) that it caches a
pipeline based on the components and their configuration, but
independently of the URI. So, if you have the same pipeline twice with a
different serializer and use the internal protocol, it's basically the
same pipeline. If you now add the serializer information, you have two
different pipelines with two different cache results.
How can a pipeline have different serializers when the only difference is 
internal or external protocol?
This is a simplified example which doesn't make sense:

<match pattern="A">
 <generate src="a.xml"/>
 <serialize type="xml"/>
</match>

<match pattern="B">
 <generate src="a.xml"/>
 <serialize type="html"/>
</match>

In this case, the difference isn't caused by the use of internal versus external requests: calling "http://A" and "cocoon://A" both build the same pipeline!

A more complex example is when you use non-cacheable components:

<match pattern="A">
 <generate src="a.xml"/>
 <transform type="NOT CACHEABLE"/>
 <serialize type="xml"/>
</match>

<match pattern="B">
 <generate src="a.xml"/>
 <transform type="NOT CACHEABLE"/>
 <transform src="xmlTohtml.xsl"/>
 <serialize type="html"/>
</match>

The cached part of the two pipelines is the same.

BTW, in this case adding the serializer to the cocoon source would be wrong.

The partially cached content would have a key of type "PK-G-file-file:/path/a.xml|", which rightly doesn't include the serializer as it's not in the cacheable part.
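
To illustrate how such a key stops at the first non-cacheable component, here is a minimal Java sketch. The Component class, the cacheKey method and the exact key layout are assumptions modelled loosely on the key quoted above; they are not Cocoon's actual classes.

```java
// Hypothetical model of one pipeline component; not a Cocoon class.
final class Component {
    final String role;       // "G" = generator, "T" = transformer, "S" = serializer
    final String type;       // e.g. "file", "xslt", "xml"
    final String src;        // source URI, may be empty
    final boolean cacheable;

    Component(String role, String type, String src, boolean cacheable) {
        this.role = role;
        this.type = type;
        this.src = src;
        this.cacheable = cacheable;
    }
}

final class PartialKeySketch {
    // Builds a key from the leading run of cacheable components only.
    // The "PK-role-type-src|" layout is an assumption for illustration.
    static String cacheKey(java.util.List<Component> pipeline) {
        StringBuilder key = new StringBuilder("PK");
        for (Component c : pipeline) {
            if (!c.cacheable) {
                break; // everything from the first non-cacheable component on is left out
            }
            key.append('-').append(c.role)
               .append('-').append(c.type)
               .append('-').append(c.src)
               .append('|');
        }
        return key.toString();
    }
}
```

Under these assumptions, pipelines A and B above yield the same key, because both keys end right after the generator.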

The reason for not including the serializer in the cache key is that when cocoon sources are SAXed, the serializer is ignored. The restricted key (without the serializer) can therefore be used to cache the SAX stream. But when we cache the byte stream, we must include the serializer in the key!

Now the problem is that the key and validity are computed before the pipeline is actually executed, i.e. at a time when we don't know whether the cocoon source will be used with toSAX() or getInputStream().

I think we should therefore include the serializer in the key and validity (if the pipeline is fully cacheable). Sure, it will include extra useless information for toSAX() calls, but will include everything that is necessary for a correct behaviour for getInputStream() calls.

Note that if a pipeline is only used in internal calls, its serializer is very likely to be "xml", which is cacheable and always valid, and therefore doesn't affect the cacheability of the pipeline.

Let's sum up all this, which gets complicated ;-)
- I'm only considering fully cacheable pipelines (for partially cacheable ones, the key stops at the last cacheable component)
- "restricted cache key" means key without the serializer
- "full cache key" means key with the serializer

- SitemapSource should use the full key (its hash actually) and full validity
- processing a pipeline to an XML consumer should cache the SAX stream using the restricted key and restricted validity
- processing a pipeline to an outputStream should cache the byte stream using the full key and full validity
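
The key selection summed up above could be sketched roughly as follows in Java. All names here (KeySelectionSketch, Output, fullKey, keyFor) and the "-S-type|" suffix are hypothetical and only serve to illustrate the proposal; none of them are existing Cocoon APIs.

```java
// Hypothetical sketch of the proposal; none of these names are Cocoon APIs.
final class KeySelectionSketch {
    // How the pipeline result is consumed.
    enum Output { SAX_STREAM, BYTE_STREAM }

    // Full key = restricted key plus the serializer (suffix layout is an assumption).
    static String fullKey(String restrictedKey, String serializerType) {
        return restrictedKey + "-S-" + serializerType + "|";
    }

    // Picks the key under which the result would be cached:
    // the SAX stream ignores the serializer, the byte stream depends on it.
    static String keyFor(Output output, String restrictedKey, String serializerType) {
        return output == Output.SAX_STREAM
                ? restrictedKey
                : fullKey(restrictedKey, serializerType);
    }
}
```

With this split, two fully cacheable pipelines that differ only in their serializer would share one cached SAX stream but keep separate cached byte streams.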

Does it make sense?

Sylvain

--
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director
