Re: Adding serializer info to SitemapSource

Sylvain Wallez Wed, 08 Jun 2005 02:28:33 -0700

Carsten Ziegeler wrote:

Geert Josten wrote:

The caching algorithm is that smart (or complicated?) that it caches a
pipeline based on the components and their configuration, but
independent of the uri. So, if you have the same pipeline twice with a
different serializer and use the internal protocol it's basically the
same pipeline. If you now add the serializer information, you have two
different pipelines with two different cache results.

How can a pipeline have different serializers when the only difference is 
internal or external protocol?

This is a simplified example which doesn't make sense:


<match pattern="A">
 <generate src="a.xml"/>
 <serialize type="xml"/>
</match>

<match pattern="B">
 <generate src="a.xml"/>
 <serialize type="html"/>
</match>

In this case, the difference isn't because of the use of internal and external requests: calling "http://A" and "cocoon://A" build the same pipeline!

A more complex sample is if you use non cacheable components:

<match pattern="A">
 <generate src="a.xml"/>
 <transform type="NOT CACHEABLE"/>
 <serialize type="xml"/>
</match>

<match pattern="B">
 <generate src="a.xml"/>
 <transform type="NOT CACHEABLE"/>
 <transform src="xmlTohtml.xsl"/>
 <serialize type="html"/>
</match>

The cached part of the two pipelines is the same.

BTW, in this case adding the serializer to the cocoon source would be wrong.

The partially cached content would have a key of type "PK-G-file-file:/path/a.xml|", which rightly doesn't include the serializer as it's not in the cacheable part.
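
To illustrate how such a key stops at the last cacheable component, here is a minimal sketch. It is written in Python rather than Cocoon's Java, and the `Component` type, the `cacheable_prefix_key` helper and the way the parts are joined are assumptions made for the example, not Cocoon's actual key classes or format.

```python
from typing import List, NamedTuple


class Component(NamedTuple):
    role: str        # "G" = generator, "T" = transformer, "S" = serializer
    kind: str        # component type label (hypothetical, e.g. "file", "xslt")
    src: str         # source URI or other configuration
    cacheable: bool


def cacheable_prefix_key(pipeline: List[Component]) -> str:
    """Build a key from the leading cacheable components only."""
    parts = []
    for component in pipeline:
        if not component.cacheable:
            break  # everything from here on is outside the cached part
        parts.append(f"{component.role}-{component.kind}-{component.src}")
    if not parts:
        return ""  # nothing cacheable at the start of the pipeline
    return "PK-" + "|".join(parts) + "|"


# The two pipelines of the second example (match patterns "A" and "B").
PIPELINE_A = [
    Component("G", "file", "file:/path/a.xml", True),
    Component("T", "not-cacheable", "", False),
    Component("S", "xml", "", True),
]
PIPELINE_B = [
    Component("G", "file", "file:/path/a.xml", True),
    Component("T", "not-cacheable", "", False),
    Component("T", "xslt", "xmlTohtml.xsl", True),
    Component("S", "html", "", True),
]

print(cacheable_prefix_key(PIPELINE_A))
print(cacheable_prefix_key(PIPELINE_B))
```

Both pipelines yield the same key, because in both the non-cacheable transformer cuts the key off right after the generator, so neither serializer is ever reached.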

The reason for not including the serializer in the cache key is when cocoon sources are SAXed, in which case the serializer is ignored. The restricted key (without the serializer) can be used to cache the SAX stream. But when we cache the byte stream, we must include the serializer in the key!

Now the problem is that key and validity are computed before the pipeline is actually executed, i.e. at a time where we don't know if the cocoon source will be used with toSAX() or getInputStream().

I think we should therefore include the serializer in the key and validity (if the pipeline is fully cacheable). Sure, it will include extra useless information for toSAX() calls, but will include everything that is necessary for a correct behaviour for getInputStream() calls.

Note that if a pipeline is only used in internal calls, its serializer is very likely to be "xml" which is cacheable and is always valid, which therefore doesn't affect the cacheability of the pipeline.


Let's sum up all this, which gets complicated ;-)

- I'm only considering fully cacheable pipelines (for partially cacheable ones, the key stops at the last cacheable component)

- "restricted cache key" means key without the serializer
- "full cache key" means key with the serializer

- SitemapSource should expose the full key (its hash actually) and full validity

- processing a pipeline to an XML consumer should cache the SAX stream using the restricted key and restricted validity

- processing a pipeline to an outputStream should cache the byte stream using the full key and full validity
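
The scheme summed up above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not Cocoon's Java API: `restricted_key`, `full_key`, `to_sax`, `get_input_stream` and the two dictionary caches are hypothetical names, and validity objects and key hashing are left out for brevity.

```python
from typing import Callable, Dict, List

# Two separate caches (hypothetical): SAX events and serialized bytes.
sax_cache: Dict[str, List[str]] = {}
byte_cache: Dict[str, bytes] = {}


def restricted_key(steps: List[str]) -> str:
    """Key without the serializer: generator and transformers only."""
    return "|".join(steps)


def full_key(steps: List[str], serializer: str) -> str:
    """Key with the serializer appended."""
    return restricted_key(steps) + "|S-" + serializer


def to_sax(steps: List[str], produce: Callable[[], List[str]]) -> List[str]:
    """SAX consumers ignore the serializer, so the restricted key is enough."""
    key = restricted_key(steps)
    if key not in sax_cache:
        sax_cache[key] = produce()
    return sax_cache[key]


def get_input_stream(steps: List[str], serializer: str,
                     produce: Callable[[], bytes]) -> bytes:
    """Byte output depends on the serializer, so the full key is required."""
    key = full_key(steps, serializer)
    if key not in byte_cache:
        byte_cache[key] = produce()
    return byte_cache[key]


steps = ["G-file-file:/path/a.xml"]

# Both SAX requests share one cache entry; the second producer never runs.
to_sax(steps, lambda: ["<doc/>"])
to_sax(steps, lambda: ["never produced"])

# The same pipeline serialized as xml and as html gets two byte entries.
get_input_stream(steps, "xml", lambda: b"<doc/>")
get_input_stream(steps, "html", lambda: b"<html/>")

print(len(sax_cache), len(byte_cache))
```

With only the restricted key, the "xml" and "html" byte streams would collide in the byte cache, which is the incorrect behaviour the full key is meant to prevent.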


Does it make sense?

Sylvain

--
Sylvain Wallez                        Anyware Technologies
http://apache.org/~sylvain            http://anyware-tech.com
Apache Software Foundation Member     Research & Technology Director
