[RT] Cache/proxy-friendly HTTP headers

Gianugo Rabellino Fri, 11 Jan 2002 02:14:34 -0800


Let's face it: we are slow. Not painfully slow, not even very slow: 
actually we are pretty fast for a server-side Java application, and 
we are such great programmers that we are as fast as the environment 
hosting us allows :-)


Yet I'm afraid we are still slammed by other technologies: static 
pages, Apache modules, PHP and so on. We have to change this somehow, 
and I think there is a solution, at least for what matters most to 
users: perceived performance.

In the past I had great results putting reverse proxies in front of 
servlets. This helps because it optimizes bandwidth and network 
communication, which results in a performance boost. But to get the 
most out of it I had to tweak the response headers by hand.

Cocoon has limited support for this: the only place where I could find 
something was the HttpHeaderAction. With it I can do (almost) whatever 
I want, but it would also mean that every pipeline entry has to start 
with an action: difficult to maintain, to say the least.

What I'm thinking about is a sort of mod_expires clone: mod_expires 
(http://httpd.apache.org/docs/mod/mod_expires.html) has a simple syntax 
that allows flexible handling of caching headers. Basically, it sets 
the Expires header by adding an offset either to the access time (the 
time of the request, so each client gets its own expiry: a one-to-one 
approach) or to the modification time of the file (every client gets 
the same expiry: a one-to-many approach). This is an example taken from 
the docs:

ExpiresByType text/html "access plus 1 month 15 days 2 hours"
ExpiresByType image/gif "modification plus 5 hours 3 minutes"
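To make the mod_expires semantics concrete, here is a minimal Java sketch of how a value such as "access plus 1 month 15 days 2 hours" turns into an expiry instant. It is an illustration only, not Apache's implementation (mod_expires is C code inside httpd). The class name `ModExpiresStyle` is hypothetical, and counting a month as 30 days and a year as 365 days is a simplifying assumption of this sketch.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Locale;

/**
 * Hypothetical sketch: "<base> [plus] {<num> <unit>}*" -> expiry instant.
 * Assumes month = 30 days and year = 365 days.
 */
final class ModExpiresStyle {

    /** Adds the offset described by 'spec' to the access or modification time. */
    static Instant expires(String spec, Instant accessTime, Instant modificationTime) {
        String[] words = spec.trim().toLowerCase(Locale.ROOT).split("\\s+");
        Instant base;
        switch (words[0]) {
            case "access":
            case "now": // "now" is a synonym for "access"
                base = accessTime;
                break;
            case "modification":
                base = modificationTime;
                break;
            default:
                throw new IllegalArgumentException("Unknown base: " + words[0]);
        }
        // The "plus" keyword is optional; what follows must be <number> <unit> pairs.
        int i = words.length > 1 && words[1].equals("plus") ? 2 : 1;
        if ((words.length - i) % 2 != 0) {
            throw new IllegalArgumentException("Dangling number or unit in: " + spec);
        }
        Duration offset = Duration.ZERO;
        for (; i < words.length; i += 2) {
            long n = Long.parseLong(words[i]);
            offset = offset.plus(unit(words[i + 1]).multipliedBy(n));
        }
        return base.plus(offset);
    }

    /** Maps a singular or plural unit name to a fixed-length duration. */
    private static Duration unit(String word) {
        String w = word.endsWith("s") ? word.substring(0, word.length() - 1) : word;
        switch (w) {
            case "year":   return Duration.ofDays(365);
            case "month":  return Duration.ofDays(30);
            case "week":   return Duration.ofDays(7);
            case "day":    return Duration.ofDays(1);
            case "hour":   return Duration.ofHours(1);
            case "minute": return Duration.ofMinutes(1);
            case "second": return Duration.ofSeconds(1);
            default: throw new IllegalArgumentException("Unknown unit: " + word);
        }
    }
}
```

With an access time of 11 Jan 2002 10:14:34 GMT, the text/html rule above yields 25 Feb 2002 12:14:34 GMT under these assumptions, while the image/gif rule depends only on the file's modification time.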

This is incredibly powerful in real-life scenarios, and I'd really 
love to see this feature in Cocoon. But what are the best semantics for 
it? The first thing that comes to my mind is that the directive should 
be an attribute, not an element. Also, since it might be hard to 
compute a modification time for the resources that make up a pipeline, 
I wouldn't bother with it and would base the whole system on access 
time or on absolute time, using a plain syntax like "2h5m" or 
"25feb2002" to express the expiry, together with a few keywords such 
as "never" or "always".
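A minimal Java sketch of how such attribute values could be interpreted, under several assumptions not fixed by the proposal: the class name `ExpiresSpec` is hypothetical; relative values are sequences of number/unit pairs where y, w, d, h, m, s mean years (365 days), weeks, days, hours, minutes and seconds (so "m" is minutes, not months); an absolute date expires at the start of that day in GMT; "always" means the response is stale as soon as it is served; and "never" maps to one year from now, since RFC 2616 (section 14.21) says servers should not send Expires dates more than about one year in the future.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the proposed expires="..." attribute values. */
final class ExpiresSpec {
    // The whole value must be <number><unit> pairs, e.g. "2h5m" or "1y".
    private static final Pattern RELATIVE = Pattern.compile("(\\d+[ywdhms])+");
    private static final Pattern PART = Pattern.compile("(\\d+)([ywdhms])");
    // Absolute dates such as "25feb2002"; month names match case-insensitively.
    private static final DateTimeFormatter ABSOLUTE = new DateTimeFormatterBuilder()
            .parseCaseInsensitive()
            .appendPattern("dMMMuuuu")
            .toFormatter(Locale.ENGLISH);

    /** Returns the instant at which a response served at 'accessTime' expires. */
    static Instant expiresAt(String spec, Instant accessTime) {
        String s = spec.trim().toLowerCase(Locale.ROOT);
        if (s.equals("always")) {
            return accessTime; // stale as soon as it is served
        }
        if (s.equals("never")) {
            return accessTime.plus(Duration.ofDays(365)); // RFC 2616: about one year max
        }
        if (RELATIVE.matcher(s).matches()) {
            Duration offset = Duration.ZERO;
            Matcher m = PART.matcher(s);
            while (m.find()) {
                long n = Long.parseLong(m.group(1));
                offset = offset.plus(unit(m.group(2).charAt(0)).multipliedBy(n));
            }
            return accessTime.plus(offset);
        }
        try {
            // Assumption: an absolute date expires at 00:00 GMT of that day.
            return LocalDate.parse(s, ABSOLUTE).atStartOfDay(ZoneOffset.UTC).toInstant();
        } catch (DateTimeParseException e) {
            throw new IllegalArgumentException("Unsupported expires value: " + spec, e);
        }
    }

    private static Duration unit(char c) {
        switch (c) {
            case 'y': return Duration.ofDays(365);
            case 'w': return Duration.ofDays(7);
            case 'd': return Duration.ofDays(1);
            case 'h': return Duration.ofHours(1);
            case 'm': return Duration.ofMinutes(1); // minutes, not months
            case 's': return Duration.ofSeconds(1);
            default: throw new IllegalArgumentException("Unknown unit: " + c);
        }
    }
}
```

Keeping "m" for minutes matches the "2h5m" example; an absolute date covers the month-long cases, which avoids the minutes/months ambiguity altogether.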

Now let's think about where such an attribute should go. I'll start by 
stating that these directives can be thought of as living at the same 
level as "handle-errors", so they might appear in the pipeline 
declaration.

Let's assume that we have four kinds of resources:

1. static files that basically don't change:
<match pattern="static/**">
   <read src="static-files/{1}"/>
</match>

2. dynamically built resources that last for a while and change at a fixed date:
<match pattern="catalog/**/*.html">
   <generate src="catalog/{1}/{2}.xsp"/>
   <transform src="stylesheets/catalog.xsl" />
   <serialize/>
</match>

3. dynamic resources that change often:
<match pattern="press-releases/**/*.html">
   <generate src="xmldb:xindice///PR/{1}/{2}.xml"/>
   <transform src="stylesheets/articles.xsl" />
   <serialize/>
</match>
                        
4. dynamic resources that change on every request:
<match pattern="myportal/**.html">
   <aggregate>
     ...
   </aggregate>
   <transform src="stylesheets/articles.xsl" />
   <serialize/>
</match>


Now let's try to assemble them with two possible syntaxes:

1. different pipelines:
<!-- This one expires one year from now -->
<pipeline expires="1y">
   <match pattern="static/**">
     <read src="static-files/{1}"/>
   </match>
</pipeline>

<!-- This one expires at the end of the month. -->
<!-- Will need to be changed afterwards -->
<pipeline expires="31jan2002">
   <match pattern="catalog/**/*.html">
     <generate src="catalog/{1}/{2}.xsp"/>
     <transform src="stylesheets/catalog.xsl" />
     <serialize/>
   </match>
</pipeline>

<!-- This one expires 6hr after the first access -->
<pipeline expires="6h">
   <match pattern="press-releases/**/*.html">
     <generate src="xmldb:xindice///PR/{1}/{2}.xml"/>
     <transform src="stylesheets/articles.xsl" />
     <serialize/>
   </match>
</pipeline>

<!-- Finally, this one is already expired when it is served -->
<pipeline expires="always">
   <match pattern="myportal/**.html">
     <aggregate>
       ...
     </aggregate>
     <transform src="stylesheets/articles.xsl" />
     <serialize/>
   </match>
</pipeline>

2. more granular: defined at the "pipeline" level but overridable per match:
<pipeline expires="6h">
   <match pattern="static/**" expires="1y">
     <read src="static-files/{1}"/>
   </match>

   <match pattern="catalog/**/*.html" expires="31jan2002">
     <generate src="catalog/{1}/{2}.xsp"/>
     <transform src="stylesheets/catalog.xsl" />
     <serialize/>
   </match>

   <match pattern="press-releases/**/*.html">
     <generate src="xmldb:xindice///PR/{1}/{2}.xml"/>
     <transform src="stylesheets/articles.xsl" />
     <serialize/>
   </match>
                        
   <match pattern="myportal/**.html" expires="always">
     <aggregate>
     ...
     </aggregate>
     <transform src="stylesheets/articles.xsl" />
     <serialize/>
   </match>
</pipeline>
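The override rule of syntax #2 is simple enough to state as code. In this minimal Java sketch (class and method names are hypothetical), a match keeps its own expires value when it has one and inherits the pipeline's value otherwise; a null result would mean "set no caching headers at all".

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of syntax #2: match-level expires overrides the pipeline default. */
final class ExpiresInheritance {

    /** Returns the expires value that applies to one match (null = no attribute anywhere). */
    static String effectiveExpires(String pipelineExpires, String matchExpires) {
        return matchExpires != null ? matchExpires : pipelineExpires;
    }

    /** Resolves every match pattern of a pipeline; a null map value means no expires attribute. */
    static Map<String, String> resolveAll(String pipelineExpires, Map<String, String> matches) {
        Map<String, String> resolved = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : matches.entrySet()) {
            resolved.put(e.getKey(), effectiveExpires(pipelineExpires, e.getValue()));
        }
        return resolved;
    }
}
```

Applied to the pipeline above with a default of "6h", only the press-releases match falls back to the pipeline value; the other three keep their own.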

I would say that syntax #1 is more consistent with the current setup, 
but feedback is really appreciated.

Implementation should be pretty trivial: it would just be a matter of 
parsing the configuration and setting a couple of headers. Yet this 
would give us a tremendous performance boost, especially for 
self-contained webapps where we could put our resources and read them 
without worrying about performance: a reverse proxy would do all the 
dirty work for us.
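As a rough idea of what "setting a couple of headers" could amount to, here is a minimal Java sketch that computes header values from the time a response is served and the time it should expire. The class name is hypothetical, and emitting Cache-Control max-age next to Expires is an assumption on top of the proposal (max-age is the HTTP/1.1 counterpart that proxies also honour). The Expires value uses the RFC 1123 date form with a two-digit day; the code only builds the values and leaves it open how they reach the servlet response.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: caching header values for one response. */
final class CachingHeaders {
    // HTTP-date, e.g. "Fri, 11 Jan 2002 16:14:34 GMT" (always GMT, two-digit day).
    private static final DateTimeFormatter HTTP_DATE =
            DateTimeFormatter.ofPattern("EEE, dd MMM yyyy HH:mm:ss 'GMT'", Locale.ENGLISH)
                    .withZone(ZoneOffset.UTC);

    static Map<String, String> headers(Instant servedAt, Instant expiresAt) {
        // An expiry in the past is clamped to max-age=0 (already stale).
        long maxAge = Math.max(0, Duration.between(servedAt, expiresAt).getSeconds());
        Map<String, String> h = new LinkedHashMap<>();
        h.put("Expires", HTTP_DATE.format(expiresAt));
        h.put("Cache-Control", "max-age=" + maxAge);
        return h;
    }
}
```

For a response served at 11 Jan 2002 10:14:34 GMT under a "6h" rule, this gives an Expires of Fri, 11 Jan 2002 16:14:34 GMT and max-age=21600.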

I'm eagerly awaiting your feedback.

Ciao,

-- 
Gianugo


