Re: [c3] Conditional GET

Steven Dolg Thu, 10 Dec 2009 02:41:27 -0800

Sylvain Wallez wrote:

Reinhard Pötz wrote:
<snip/>
But let me broaden the picture: Based on our work from about two weeks
ago, I created another aspect which implements the support for
conditional GET requests and also takes care that a pipeline isn't
executed unless it is really necessary. I was also able to fix all
failing test cases. I created an issue that contains a patch:
https://issues.apache.org/jira/browse/COCOON3-47

Additionally there is also another feature that I would like to add: The
current patch only takes care of 'If-Modified-Since' requests. I also
want to support 'If-None-Match' requests that are based on the 'ETag'
response header. (see http://en.wikipedia.org/wiki/HTTP_ETag).

Using ETag has the advantage that we could support conditional GET
requests also in the case where we can't use a timestamp based approach
 (e.g. when using o.a.c.pipeline.caching.ParameterCacheKey) or to
provide conditional GET support in REST controllers.

As an ETag value we could use the hash code of a pipeline's cache key.

I don't fully get the context of this conversation, but this last sentence raised a question for me: how can we validate a cache entry with its _key_? Looking at the code, I see that CacheKey holds both the identifier information (the actual key) and the validity information.
There is a naming issue here which leads to some confusion between key and key-and-validity, and we can see it in the code: ExpiresCacheKey doesn't include the validity information in hashCode() and equals() whereas ParameterCacheKey does. What is the right contract?


I'm not sure I understand what you mean.

The implementations of hashCode() in ExpiresCacheKey and ParameterCacheKey are as similar in both the code and actual behaviour as they can be. Neither of them performs any operations necessary to check their validity in the hashCode() or equals() methods.

Your confusion might arise from the fact that ParameterCacheKey cannot become invalid: the same parameter value always means the same parameter value, so there is no way it can become invalid (as opposed to a cached file's contents, which can become invalid when the file is changed, even if it is still the same file). So the isValid() method basically performs the equals check, since this is a required condition for being valid (valid = equal & not expired; since expired = false here: valid = equal).

The ExpiresCacheKey performs an additional check in its isValid() method, namely checking the expiresTimestamp. This is not done in either the hashCode() or equals() method. So here valid := equals & not expired.

This principle holds true for each and every CacheKey currently implemented (unless there is a faulty implementation).
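
To make the "valid = equal & not expired" contract concrete, here is a minimal sketch. All type and method names below are hypothetical, simplified stand-ins and not the actual Cocoon 3 classes; in particular, calling isValid() on the stored key with the requested key as argument is an assumption made for this illustration.

```java
// Hypothetical, simplified stand-ins for illustration only;
// these are NOT the actual Cocoon 3 classes or signatures.
interface SketchKey {
    // Assumption: called on the stored key, passing the requested key.
    boolean isValid(SketchKey requested);
}

final class ParameterSketchKey implements SketchKey {
    private final String value;

    ParameterSketchKey(String value) {
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ParameterSketchKey
                && value.equals(((ParameterSketchKey) o).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    // Cannot expire, so: valid = equal.
    @Override
    public boolean isValid(SketchKey requested) {
        return equals(requested);
    }
}

final class ExpiresSketchKey implements SketchKey {
    private final String id;
    private final long expiresAtMillis; // validity data, not identity

    ExpiresSketchKey(String id, long expiresAtMillis) {
        this.id = id;
        this.expiresAtMillis = expiresAtMillis;
    }

    // Identity uses the id only; the timestamp is deliberately left out.
    @Override
    public boolean equals(Object o) {
        return o instanceof ExpiresSketchKey
                && id.equals(((ExpiresSketchKey) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    // valid = equal & not expired.
    @Override
    public boolean isValid(SketchKey requested) {
        return equals(requested)
                && System.currentTimeMillis() < expiresAtMillis;
    }
}
```

Two ExpiresSketchKey instances with the same id are equal and share a hash code regardless of their timestamps, so a newer entry overwrites the older one in a map-like cache.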

And this is also the answer to your question:

CacheKey contains information to check its validity, but this information is not used for identifying CacheKeys (in other words, in the equals() and hashCode() methods). This means frequently invalidated CacheKeys will not fill the cache but instead overwrite each other.

As a side note, both classes include the class' hashcode in the instance's hash code, which means hash codes will be different at every JVM restart, or across JVM instances in a cluster, and is likely to break persistent and distributed caches.


That is a good hint.
We will want to look into that and amend things if necessary.
Thanks
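
The side note about class hash codes can be illustrated with a small sketch. java.lang.Class does not override Object.hashCode(), so its value is identity-based and not guaranteed to be the same across JVM runs, whereas String.hashCode() is specified by a fixed formula. The helper names below are hypothetical, and mixing in the class name is only one possible remedy, not necessarily the one Cocoon should adopt.

```java
// Hypothetical helper, for illustration only.
final class StableHashSketch {

    private StableHashSketch() {
    }

    // Identity-based component: may differ between JVM runs and instances.
    static int unstableHash(Object key, String id) {
        return 31 * key.getClass().hashCode() + id.hashCode();
    }

    // Name-based component: String.hashCode() is fully specified,
    // so the result is reproducible across JVM runs and instances.
    static int stableHash(Object key, String id) {
        return 31 * key.getClass().getName().hashCode() + id.hashCode();
    }
}
```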

That being said, I'm wondering if this aggregation of key and validity won't cause other kinds of problems with distributed cache implementations. For example, Java memcached clients serialize the cache key and use this result as the memcache key. If the key includes validity information, the memcache key will change every time the underlying data changes (e.g. a file's timestamp).
At first sight, this can sound good as it means we will have a cache miss when the validity has changed, and will even avoid having to compare the validity of cached content. But this can have a disastrous impact on the cache efficiency in situations where you have some often requested content that changes frequently: the cache will quickly fill up with obsolete versions of this content under different key values, which will lead older content to be evicted, reducing the overall cache efficiency. A key that's only an identifier, on the other hand, will lead the entry to be _replaced_ rather than a new one being added.
So in the end, my feeling is that key and validity information really should be separated.
Now going back to the ETag discussion, using the pipeline's cache key won't work IMHO because some keys' hashCode() implementations use only the identifier part of the key and not the validity. Confusion, I told you ;-)
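
The concern about keys that embed validity information can be sketched with a plain HashMap standing in for the cache provider. The class name, the "page.xml" resource and the key format are all made up for this illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo; a HashMap stands in for the real cache provider.
final class KeyGrowthSketch {

    private KeyGrowthSketch() {
    }

    // Stores one frequently changing resource 'updates' times and
    // returns how many entries the cache holds afterwards.
    static int entriesAfterUpdates(boolean validityInKey, int updates) {
        Map<String, String> cache = new HashMap<>();
        for (int version = 1; version <= updates; version++) {
            // With validity in the key, every change yields a new key.
            String key = validityInKey ? "page.xml@" + version : "page.xml";
            cache.put(key, "content v" + version);
        }
        return cache.size();
    }
}
```

With validityInKey set to true the map ends up with one entry per update; with an identifier-only key it always holds a single, replaced entry.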

We (intend to) use a layer for integrating caches since we don't want to compile directly against the API of one specific provider and then have to stick with that provider till the end of time (Avalon, anyone?)

This additional layer is used to perform validity checks when necessary and/or desired, and to skip them when not. The intention here is to reuse the abstraction layer and not have this kind of (critical) logic scattered in the individual cache provider adaptors.

So it is possible to check if a CacheKey is pointing to the same resource *and* if that cached data is still valid - even though the underlying cache provider has no means of performing the second check (validity).

For details you might want to look at org.apache.cocoon.pipeline.caching.AbstractCache
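
As a rough sketch of that layering (not the actual AbstractCache code; every name below is hypothetical): the provider adaptor only stores and retrieves by key identity, while the shared base class performs the validity check.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; see AbstractCache for the real code.
interface LayerKey {
    // Assumption: called on the stored key, passing the requested key.
    boolean isValid(LayerKey requested);
}

final class VersionedKey implements LayerKey {
    private final String id;
    private final long lastModified; // validity data, not identity

    VersionedKey(String id, long lastModified) {
        this.id = id;
        this.lastModified = lastModified;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof VersionedKey && id.equals(((VersionedKey) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    // Valid if it is the same resource and the stored copy is not older.
    @Override
    public boolean isValid(LayerKey requested) {
        return equals(requested)
                && lastModified >= ((VersionedKey) requested).lastModified;
    }
}

// Keeps the stored key next to the value so validity can be checked later.
final class LayerEntry {
    final LayerKey key;
    final Object value;

    LayerEntry(LayerKey key, Object value) {
        this.key = key;
        this.value = value;
    }
}

abstract class ValidatingCacheSketch {
    // Provider adaptors implement only these two identity-based operations.
    protected abstract LayerEntry retrieve(LayerKey key);

    protected abstract void store(LayerKey key, LayerEntry entry);

    // The validity check lives here, once, for all providers.
    public final Object get(LayerKey key) {
        LayerEntry entry = retrieve(key);
        if (entry == null || !entry.key.isValid(key)) {
            return null;
        }
        return entry.value;
    }

    public final void put(LayerKey key, Object value) {
        store(key, new LayerEntry(key, value));
    }
}

final class MapBackedCacheSketch extends ValidatingCacheSketch {
    private final Map<LayerKey, LayerEntry> map = new HashMap<>();

    @Override
    protected LayerEntry retrieve(LayerKey key) {
        return map.get(key);
    }

    @Override
    protected void store(LayerKey key, LayerEntry entry) {
        map.put(key, entry);
    }

    int size() {
        return map.size();
    }
}
```

A lookup with a newer lastModified finds the stored entry by identity but is rejected by the layer, and the following put() replaces that entry instead of adding a second one.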


And BTW, what is the "jmxGroupName" property on CacheKey used for?


The jmxGroupName is used for making them accessible via JMX, no?


Sylvain



Steven
