Re: Caching Support Request Filter

Ian Boston Wed, 28 Apr 2010 16:52:35 -0700

On 29 Apr 2010, at 01:23, Felix Meschberger wrote:

> Hi,
> 
> Wow !
> 
> I didn't expect to have this discussion get in this direction, but excellent !
> 
> To illustrate what I originally had in mind, I have committed my
> prototype in [1].
> 
> Please note, that this *only* is about setting the Last-Modified and
> Cache-Control headers.
> 
> Now, taken a step further: do we really want to build a cache into
> Sling ? Shouldn't we rather rely on some existing caching proxy for
> this, like Squid or mod_cache/mod_proxy ?
> 
> As for what to cache (if we cache): I think we should not cache
> requests with Queries, such requests are by definition not cacheable.
> Having a multi-dimensional cache taking requesting users into account
> is also an interesting thing.


It *should* be possible to specify which responses can be cached, via a
request attribute set by whatever creates the response, with caching off by
default.
The cache key should be based on a subset of request headers (including
cookies, since they carry the per-user entropy).
The key needs to be multi-level so that every cached instance of a
response can be invalidated for all users in one operation.

Only if those criteria are met should the headers and byte array representing 
the response be cached (I have been using ehcache, which can be configured
to put the cache down to disk).

It then becomes the responsibility of the thing creating the response to 
invalidate the cache for items that it "knows" about.
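
To make the multi-level key concrete, here is a minimal sketch. It uses plain
maps rather than ehcache, and every name in it (TwoLevelResponseCache,
variantKey, the cacheable flag) is invented for illustration, not an existing
Sling or ehcache API. The first level is the resource path, the second is a
key built from the chosen request headers, so removing the first-level entry
drops the response for all users at once.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical two-level cache: resource path -> variant key -> response. */
final class TwoLevelResponseCache {

    /** Cached headers plus the byte array representing the response body. */
    static final class CachedResponse {
        final Map<String, String> headers;
        final byte[] body;

        CachedResponse(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    private final Map<String, Map<String, CachedResponse>> entries =
            new ConcurrentHashMap<>();

    /** Builds the second-level key from a subset of request headers. */
    static String variantKey(Map<String, String> requestHeaders,
                             List<String> keyHeaders) {
        StringBuilder key = new StringBuilder();
        for (String name : keyHeaders) {
            key.append(name).append('=')
               .append(requestHeaders.getOrDefault(name, ""))
               .append('\n');
        }
        return key.toString();
    }

    /** Stores a response only if its producer opted in (off by default). */
    void put(String path, String variantKey, CachedResponse response,
             boolean cacheable) {
        if (!cacheable) {
            return;
        }
        entries.computeIfAbsent(path, p -> new ConcurrentHashMap<>())
               .put(variantKey, response);
    }

    /** Returns the cached response for this path and variant, or null. */
    CachedResponse get(String path, String variantKey) {
        Map<String, CachedResponse> variants = entries.get(path);
        return variants == null ? null : variants.get(variantKey);
    }

    /** Drops every user's variant of the path in one operation. */
    void invalidate(String path) {
        entries.remove(path);
    }
}
```

The thing creating the response would call invalidate(path) for the paths it
"knows" about; nothing in this sketch discovers dependencies on its own.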

 


> 
> My fear is, that we run into a performance drain just to manage the cache ....

Agreed, imho, caching should be selective and at the discretion of the 
application rather than a blanket operation. Certainly in Sakai Nakamura we 
have many Sling Servlets where the invalidation is complex and not something 
that could be automated.

> 
> Regards
> Felix
> 
> [1] http://svn.apache.org/repos/asf/sling/whiteboard/fmeschbe/cachecontrol
> 
> 
> On Wed, Apr 28, 2010 at 4:02 PM, Eric Norman <[email protected]> wrote:
>> Hi all,
>> 
>> In general, I like the idea of a server side cache.  However, I agree with
>> Vidar that a cache without resource tracking has limited usefulness in a
>> real system.
>> 
>> In the past I had implemented something similar.
>> 
>> The key parts I remember were:
>> 
>>   - I used a (slightly) modified version of the OSCache library for
>>   managing the cache: http://www.opensymphony.com/oscache/
>>   - Cache only for GET requests
>>   - The cacheKey had to contain (at a minimum) the following information:
>>      1. Is the current user logged in? (anonymous vs. real user)
>>      2. What groups is the current user a member of (in case ACLs affect
>>      what is rendered).  Also, the ACEs for all the resources used to
>>      render the response would need to use group principals instead of
>>      individual userids to make the cache value reusable by more users.
>>      3. The current theme, language, or other options from the user
>>      preferences that may affect how the page is rendered.
>>      4. A version of the requested query string that has been sorted (in
>>      case the params come in a different order).
>>      5. Filter out "jsessionid" if it is present on the url
>>   - When rendering the page keep track of all the resources used to render
>>   the page.  Using the OSCache APIs, the resources were tracked by adding the
>>   resource path as a 'group' on the cache entry.
>>   - Special handling is needed for cache invalidation during ACL changes in
>>   case changing the ACL causes the content of the page to change.
>>   - Sometimes tracking resources used is not sufficient as you may have a
>>   page that is listing the children of a container.  Adding a new child to
>>   the container would also need to invalidate the cache entry.  To handle
>>   this, pages that do such things would need to add a container 'group' to
>>   the cache entry (cacheEntry.addGroup(container:[resourcePath])).
>>   - Use a (Synchronous) JCR Observer to listen for changes to resources.
>>    If a change is detected, invalidate any cache entries that reference the
>>   changed resource (or entries that track the parent container). In OSCache
>>   this is done by flushing the group (the resource path) to invalidate any
>>   entries that reference the group path
>>   - During the rendering of the page there should be some way for the
>>   script to indicate that it should not be cached.
>>   - Sometimes caching the whole page is not possible if the page contains
>>   user specific text (for example, username in the page header) but it may be
>>   possible to cache fragments of the page instead.
>> 
>> 
>> Anyways, that's my 2 cents.
>> 
>> Regards,
>> Eric
>> 
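
A rough sketch of the kind of cache key Eric lists, assuming the user state
(logged in or not, groups, theme, language) has already been resolved by the
caller. The CacheKeys class and its methods are made up for illustration and
are not OSCache or Sling API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical helper that normalizes request state into a cache key. */
final class CacheKeys {

    /** Removes a ";jsessionid=..." path parameter from the URL, if present. */
    static String stripSessionId(String path) {
        return path.replaceAll("(?i);jsessionid=[^;?#/]*", "");
    }

    /** Sorts the raw "a=1&b=2" pairs so parameter order does not matter. */
    static String sortedQuery(String query) {
        if (query == null || query.isEmpty()) {
            return "";
        }
        List<String> pairs = new ArrayList<>(Arrays.asList(query.split("&")));
        Collections.sort(pairs);
        return String.join("&", pairs);
    }

    /** Joins the normalized parts; groups are sorted to be order-independent. */
    static String build(String path, String query, boolean loggedIn,
                        Collection<String> groups, String theme,
                        String language) {
        return String.join("|",
                stripSessionId(path),
                sortedQuery(query),
                loggedIn ? "user" : "anonymous",
                String.join(",", new TreeSet<String>(groups)),
                theme,
                language);
    }
}
```

With this, "/page.html;jsessionid=ABC123?b=2&a=1" and "/page.html?a=1&b=2"
map to the same key for two users who share the same groups, theme and
language.
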
>> On Wed, Apr 28, 2010 at 4:35 AM, Vidar Ramdal <[email protected]> wrote:
>> 
>>> On Wed, Apr 28, 2010 at 1:13 PM, Felix Meschberger
>>> <[email protected]> wrote:
>>>> Hi all,
>>>> 
>>>> I have been resonating with a colleague about a request level Filter
>>>> for Sling to support caching.
>>>> 
>>>> The idea (and partly implemented by a prototype) is to have the
>>>> request filter setup default caching behaviour of the response (if the
>>>> response is cacheable at all, that is the request method must be GET and
>>>> there are no request parameters):
>>>> 
>>>> * The Cache-Control header is preset with values from configuration
>>>> matching the request URI (or resource path)
>>>> * The Last-Modified header is preset with the jcr:lastModified
>>>> property of the request's resource
>>>> * Eager responding with 304/NOT MODIFIED if the If-Modified-Since
>>>> header is set and a last modification time of the resource can be
>>>> resolved.
>>> 
>>> The question is how useful such a filter would be if only the
>>> last-modified date of the requested resource is used.
>>> 
>>> In our application at least, there is a large number of resources
>>> involved when serving a request. Most CMSs list out menus, for
>>> example, where the menu items are other resources. If one of those
>>> resources has changed, or if a new menu item has been created,
>>> the menu will be out of date even if the requested resource itself
>>> is unmodified.
>>> 
>>> To solve this, we could introduce a resource tracker, which tracks
>>> which resources are being invoked on a request. The latest
>>> last-modified date of these resources will then be matched with the
>>> request's If-Modified-Since header.
>>> 
>>> --
>>> Vidar S. Ramdal <[email protected]> - http://www.idium.no
>>> Sommerrogata 13-15, N-0255 Oslo, Norway
>>> + 47 22 00 84 00 / +47 21 531941, ext 2070
>>> 
>>
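
For reference, the resource tracker Vidar proposes could look roughly like
the sketch below. ResourceTracker and its methods are hypothetical names,
timestamps are epoch milliseconds, and -1 is treated as "no If-Modified-Since
header", which is what HttpServletRequest.getDateHeader returns when the
header is absent. It assumes the set of tracked resources is already known
when the check is made, for example from a previous rendering of the page.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker of the resources touched while rendering a page. */
final class ResourceTracker {

    private final Map<String, Long> lastModifiedByPath =
            new ConcurrentHashMap<>();

    /** Records a resource and its last-modified time (epoch millis). */
    void track(String path, long lastModifiedMillis) {
        lastModifiedByPath.merge(path, lastModifiedMillis, Long::max);
    }

    /** Latest modification time over all tracked resources, or -1 if none. */
    long latestLastModified() {
        long latest = -1L;
        for (long modified : lastModifiedByPath.values()) {
            latest = Math.max(latest, modified);
        }
        return latest;
    }

    /** True if no tracked resource changed after the client's copy. */
    boolean notModifiedSince(long ifModifiedSinceMillis) {
        long latest = latestLastModified();
        if (latest < 0 || ifModifiedSinceMillis < 0) {
            return false; // nothing tracked, or no conditional header sent
        }
        // HTTP dates have one-second resolution, so compare whole seconds.
        return latest / 1000 <= ifModifiedSinceMillis / 1000;
    }
}
```

A filter would answer 304 only when notModifiedSince(...) returns true, and
would otherwise render the page and send latestLastModified() as the
Last-Modified header.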
