Gianugo Rabellino wrote:

This RT builds on the one I posted more than a year ago, available at http://marc.theaimsgroup.com/?t=101074439900001&r=1&w=2.
As you know, we now have a basic HTTP header control that mimics, at the pipeline level, the mod_expires functionality of the Apache HTTPD server. This was a good start, but I feel it's now time to refine and improve it. Work is needed on two fronts:

Proxy handling
==============

The approach to full proxy compliance should, once again :-), be taken in microsteps. I've been reading the HTTP/1.1 spec and the proxy-related RFCs, and boy, implementing a fully proxy-compliant system is not easy at all. It can be done, but it requires serious thinking and a major rework of the request handling phase.

Full proxy compliance depends on the ability to deal with conditional requests, which means handling a bunch of request headers that are all interdependent in some way and tricky, to say the least. I'm not saying that we shouldn't do it sooner or later, but I'd rather plan this activity carefully, possibly together with someone (Chuck?) from the httpd group working on the proxy part, to ensure that things work smoothly.
At ApacheCon I sat with Chuck, who pointed me to another guy who's doing the work nowadays, but I forgot who he was. I can ask him again, though.

So, the first microstep is an easy one, just as a start. The companion to the Expires header is the "Cache-Control" header (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9): this header allows finer-grained control over the response, telling proxies what to do with the results.

While Expires uses an HTTP date, built in Cocoon by adding the value of the pipeline@expires attribute to the current system time, Cache-Control is somewhat smarter, since it gives caches a hint on what is cacheable, how it should be cached (revalidated or not) and for how long, in seconds. To make a long story short, my proposal is to add a Cache-Control header to any response coming from a pipeline with the "expires" attribute set, using the following template:

Cache-Control: max-age={expires value in seconds}, public

The "public" keyword instructs the proxy to store a resource in its cache even if it would not normally be considered cacheable. This can be somewhat dangerous, since the proxy will serve "protected" resources from its cache without the origin server ever getting a chance to authenticate the request, but in the end I think it's safe to assume that if a pipeline is marked with an "expires" attribute, then the user is perfectly aware that such a resource can, and will, be cached.
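A minimal sketch of the two header values the patch produces, assuming (as the patch does) that the pipeline's expires value is a relative duration in milliseconds. CacheHeaders and its method names are hypothetical, used only for illustration; the Expires value is formatted as an RFC 1123 date in GMT, the format HTTP/1.1 requires senders to generate.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/**
 * Hypothetical helper (not a Cocoon class): compute Expires and Cache-Control
 * values from a relative "expires" duration in milliseconds.
 */
public final class CacheHeaders {

    // RFC 1123 date with a two-digit day, always in GMT, English names.
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.US)
                    .withZone(ZoneOffset.UTC);

    private CacheHeaders() {}

    /** Expires: absolute HTTP date = current time + relative expires. */
    public static String expiresHeader(long nowMillis, long expiresMillis) {
        return HTTP_DATE.format(Instant.ofEpochMilli(nowMillis + expiresMillis));
    }

    /** Cache-Control: max-age is expressed in seconds, not milliseconds. */
    public static String cacheControlHeader(long expiresMillis) {
        return "max-age=" + (expiresMillis / 1000) + ", public";
    }
}
```

For a five-minute lifetime (300000 ms), cacheControlHeader returns "max-age=300, public", while the Expires value depends on the current time passed in, which is why the patch moves the System.currentTimeMillis() addition to the point where the header is set.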
Question (and an important one):

Suppose you have a resource like

/images/logo

that you hit with two different user agents, and that a pipeline renders differently depending on the user agent: how can a proxy handle this correctly? Do we have a way to specify that a specific request has to be matched not only against a URI but also against the user agent that requested it?

I'm perfectly aware that we could have the resource /images/logo redirect to /images/logo.png or /images/logo.gif depending on the user agent, and so route around the proxy problem, but that's a hack and it costs another round-trip to the client for the redirect.

Pier and I came up with this question and we think it might be an HTTP architectural fault, but before asking Roy, what do you people think?

The patch is a no-brainer:

Index: src/java/org/apache/cocoon/components/pipeline/AbstractProcessingPipeline.java
===================================================================
RCS file: /home/cvs/xml-cocoon2/src/java/org/apache/cocoon/components/pipeline/AbstractProcessingPipeline.java,v
retrieving revision 1.33
diff -r1.33 AbstractProcessingPipeline.java
468a469
>
472c473,474
< res.setDateHeader("Expires", expires);
---
> res.setDateHeader("Expires", System.currentTimeMillis() + expires);
> res.setHeader("Cache-Control", "max-age=" + expires/1000 + ", public");
474c476
< new Long(expires));
---
> new Long(expires + System.currentTimeMillis()));
760c762
< return System.currentTimeMillis() + expires;
---
> return expires;

The only problem I see is that this header is not set under Tomcat (*argh*, Jetty works just fine!)
Why am I not surprised?

So I have to investigate what's going wrong, but otherwise I'm ready to commit it if you agree with the idea (I'm reluctant to commit it right away since it touches the pipeline core, where I have almost never worked). Now for the second (and more interesting) point: Cocoon integration.

Cocoon integration
==================

The above approach works perfectly for communicating with the outside world, be it a reverse proxy or just a browser cache. Sometimes, however, you might want to use this concept internally: imagine an aggregation of different Cocoon pipelines, where some resources must have their validity checked strictly, while others are pretty heavy to generate and uncacheable (because the components used are not cacheable themselves), yet you have full control over their expiration time. In this case, being able to use the expires attribute internally would be pretty useful, e.g.:

<pipeline internal-only="true">
<parameter name="expires" value="now plus 5 minutes"/>
Isn't something like

 <parameter name="TTL" value="5 minutes"/>

simpler to understand? We don't expect people to write

 <parameter name="expires" value="tomorrow plus 25 hours"/>

do we?

<match pattern="my-heavy-resource">
<generate src="xmldb:xindice:///db/not/changing/frequently"/>
<serialize/>
</match>
</pipeline>

<pipeline internal-only="true">
<match pattern="my-dynamic-resource">
<generate src="/content/that/might/change"/>
<serialize/>
</match>
</pipeline>

<pipeline>
<match pattern="mybeautifulportal.html">
<aggregate element="portal">
<part src="cocoon://my-heavy-resource" element="news"/>
<part src="cocoon://my-dynamic-resource" element="data"/>
</aggregate>
<transform src="myportal2html.xsl"/>
<serialize type="html"/>
</match>
</pipeline>
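Whichever spelling is chosen, the mod_expires-style "now plus 5 minutes" or the TTL-style "5 minutes" proposed inline, the value must end up as a relative duration in milliseconds. A minimal sketch, assuming only fixed-length units (seconds to weeks; months and years are left out because their length varies, although mod_expires itself accepts them); ExpiresParser is a hypothetical name, not an existing Cocoon class.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical sketch (not a Cocoon API): turn "now plus 5 minutes",
 * "access plus 1 hour 30 minutes" or plain "5 minutes" into milliseconds.
 */
public final class ExpiresParser {

    // Only fixed-length units; months and years are deliberately unsupported.
    private static final Map<String, Long> UNIT_MILLIS = Map.of(
            "second", 1_000L,
            "minute", 60_000L,
            "hour", 3_600_000L,
            "day", 86_400_000L,
            "week", 604_800_000L);

    private ExpiresParser() {}

    public static long parseMillis(String spec) {
        String[] tokens = spec.trim().toLowerCase(Locale.ROOT).split("\\s+");
        int start = 0;
        // Optional mod_expires-style prefix: "now plus" or "access plus".
        if (tokens.length >= 2
                && (tokens[0].equals("now") || tokens[0].equals("access"))
                && tokens[1].equals("plus")) {
            start = 2;
        }
        int remaining = tokens.length - start;
        if (remaining == 0 || remaining % 2 != 0) {
            throw new IllegalArgumentException("Expected '<number> <unit>' pairs: " + spec);
        }
        long total = 0;
        for (int i = start; i < tokens.length; i += 2) {
            // Throws NumberFormatException (an IllegalArgumentException) for non-numbers.
            long amount = Long.parseLong(tokens[i]);
            String unit = tokens[i + 1];
            // Accept both singular and plural unit names ("minute", "minutes").
            if (unit.endsWith("s")) {
                unit = unit.substring(0, unit.length() - 1);
            }
            Long factor = UNIT_MILLIS.get(unit);
            if (factor == null) {
                throw new IllegalArgumentException("Unsupported unit: " + tokens[i + 1]);
            }
            total += amount * factor;
        }
        return total;
    }
}
```

Accepting both forms in one parser would let the existing attribute syntax and a shorter TTL-style parameter coexist; a value like "tomorrow plus 25 hours" is simply rejected.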


If we agree that this is useful, let's look at the actual implementation. First, let's get back to the general principle: if a user sets an "expires" attribute on a pipeline, what she wants to say is "I know better than the Cocoon cache how long this resource should be considered fresh". This is by all means a configuration imposed by the user, which the caching system should obey blindly. My opinion, then, wrt the caching pipeline, is that if an expires was set, all the pipeline engine should do is check whether the given resource has already been generated and whether its expiration time has not yet passed. If so, the resource should be considered fresh regardless of any Validity objects or Cacheable components.

This, AFAIU, would boost the performance even for internal pipelines and aggregation, and would let us use internal pipelines in a smarter and faster way. Not only that: if we are to use the expires feature even internally, Cocoon's performance will get a boost even without using a reverse proxy in front of the application server, since all the (potentially heavy) algorithms to check the resource's validity would be skipped.
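The freshness rule above boils down to a single decision. A minimal sketch, assuming the caller has already found a cached copy and knows its absolute expiry time (null when the pipeline has no expires); ExpiresFreshness and the BooleanSupplier standing in for the Validity-based check are hypothetical, not Cocoon APIs.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch, not a Cocoon API: decide whether a cached pipeline
 * result can be served. Call it only after a cached copy has been found.
 */
public final class ExpiresFreshness {

    private ExpiresFreshness() {}

    /**
     * @param expiresAt     absolute expiry time in ms, or null when the
     *                      pipeline has no "expires" setting
     * @param now           current time in ms
     * @param validityCheck stand-in for the (possibly expensive)
     *                      Validity/Cacheable checks
     */
    public static boolean isFresh(Long expiresAt, long now, BooleanSupplier validityCheck) {
        if (expiresAt != null) {
            // The user-imposed lifetime wins: Validity objects are never consulted.
            return now < expiresAt;
        }
        // No expires configured: fall back to the usual validity algorithms.
        return validityCheck.getAsBoolean();
    }
}
```

The supplier is never invoked while an expiry is configured, so the validity algorithms are skipped entirely. Here an expired entry is simply treated as stale and regenerated; whether it should instead fall back to the Validity checks is a design choice this sketch does not settle.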

As for the implementation, I wish I knew the Cocoon caching internals better, but from a quick read it seems to me that we need:

- some logic in CachedResponse to store and get expires (easy);

- appropriate logic at the proper points to obtain the expires object from the environment and set up a CachedResponse accordingly (is it enough to change CachingProcessingPipeline#cacheResults?);

- more logic in the validatePipeline() method in AbstractCachingProcessingPipeline.java to take into account the expires object configured, if present.

- in all cases, all the algorithms that check if a cached entry is still valid, i.e. every place where a cache entry is built, validated or invalidated, should take into account the expires configuration.
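A minimal sketch of the storage side, assuming a simple in-memory map rather than Cocoon's real Store and CachedResponse classes: each entry records the absolute time at which it expires, computed at store time from the relative expires value (the same convention the patch above uses), and lookups drop entries whose lifetime is over. ExpiresAwareCache is a hypothetical name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not Cocoon's Store or CachedResponse API: an in-memory
 * cache whose entries carry the absolute time at which they expire.
 */
public final class ExpiresAwareCache {

    private static final class Entry {
        final byte[] body;
        final long expiresAt; // absolute time in ms

        Entry(byte[] body, long expiresAt) {
            this.body = body;
            this.expiresAt = expiresAt;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();

    /**
     * Build an entry: relativeExpires is the pipeline's expires value in ms,
     * turned into an absolute time when the response is stored.
     */
    public void store(String key, byte[] body, long relativeExpires, long now) {
        entries.put(key, new Entry(body.clone(), now + relativeExpires));
    }

    /** Validate an entry: return its body while fresh, or null if missing or expired. */
    public byte[] lookup(String key, long now) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (now >= entry.expiresAt) {
            // Invalidate: remove only if no newer entry was stored meanwhile.
            entries.remove(key, entry);
            return null;
        }
        return entry.body.clone();
    }
}
```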

I have started to play with this too, but I'm wondering whether I'm following the right path or missing something. Also, it might be worth considering a different CachingPipeline implementation (ExpiresEnabledCachingPipeline? Yuck ;-)), at least as a first step.

Comments and questions?
Sounds like a good idea, but I think Carsten is the one who knows the caching internals best.

One thing you didn't describe is the ability for Cocoon to reply to proxy requests with the 'body hasn't changed' status code (304 Not Modified) just by using the pipeline caching logic, without having to regenerate the whole thing. Is this another microstep, or do you have reasons against it?
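The 304 reply suggested above can be sketched as follows, assuming the cache can report when the cached pipeline result was last generated. ConditionalGet and its parameters are hypothetical names; the sketch handles only If-Modified-Since with RFC 1123 dates (HTTP/1.1 also allows RFC 850 and asctime dates, plus ETag-based If-None-Match), so it is a starting point rather than a compliant implementation.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical sketch: choose 200 or 304 from If-Modified-Since and the cache. */
public final class ConditionalGet {

    public static final int SC_OK = 200;
    public static final int SC_NOT_MODIFIED = 304;

    private ConditionalGet() {}

    /**
     * @param ifModifiedSince    raw If-Modified-Since header value, or null
     * @param lastModifiedMillis when the cached pipeline result was generated
     */
    public static int status(String ifModifiedSince, long lastModifiedMillis) {
        if (ifModifiedSince == null) {
            return SC_OK;
        }
        long sinceMillis;
        try {
            sinceMillis = ZonedDateTime
                    .parse(ifModifiedSince, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant()
                    .toEpochMilli();
        } catch (DateTimeParseException e) {
            // An invalid date means the header is ignored: a normal GET.
            return SC_OK;
        }
        // HTTP dates have one-second resolution, so compare whole seconds.
        return (lastModifiedMillis / 1000 <= sinceMillis / 1000) ? SC_NOT_MODIFIED : SC_OK;
    }
}
```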

--
Stefano Mazzocchi <[EMAIL PROTECTED]>
Pluralitas non est ponenda sine necessitate [William of Ockham]
--------------------------------------------------------------------