Re: [c3] Conditional GET

Sylvain Wallez Thu, 10 Dec 2009 06:10:06 -0800

Steven Dolg wrote:

Sylvain Wallez wrote:
Reinhard Pötz wrote:
<snip/>
But let me broaden the picture: Based on our work from about two weeks
ago, I created another aspect which implements the support for
conditional GET requests and also takes care that a pipeline isn't
executed unless it is really necessary. I was also able to fix all
failing test cases. I created an issue that contains a patch:
https://issues.apache.org/jira/browse/COCOON3-47
Additionally there is also another feature that I would like to add: The
current patch only takes care of 'If-Modified-Since' requests. I also
want to support 'If-None-Match' requests that are based on the 'ETag'
response header. (see http://en.wikipedia.org/wiki/HTTP_ETag).

Using ETag has the advantage that we could support conditional GET
requests also in the case where we can't use a timestamp based approach
 (e.g. when using o.a.c.pipeline.caching.ParameterCacheKey) or to
provide conditional GET support in REST controllers.

As an ETag value we could use the hash code of a pipeline's cache key.
I don't fully get the context of this conversation, but this last sentence raised a question for me: how can we validate a cache entry with its _key_? Looking at the code, I see that CacheKey holds both the identifier information (the actual key) and the validity information.
There is a naming issue here which leads to some confusion between key and key-and-validity, and we can see it in the code: ExpiresCacheKey doesn't include the validity information in hashCode() and equals() whereas ParameterCacheKey does. What is the right contract?
I'm not sure I understand what you mean.
The implementations of hashCode() in ExpiresCacheKey and ParameterCacheKey are as similar in both the code and actual behaviour as they can be. Neither of them performs any operations necessary to check their validity in the hashCode() or equals() methods.

Hmm... ok, so parameter values are part of the key. When reading the code, and because of this mixing of validity and key (and lack of docs), I thought the parameter keys were defining the identity and their values were defining the validity.

Your confusion might arise from the fact that ParameterCacheKey cannot become invalid, because the same parameter value always means the same parameter value; there is no way this can become invalid (as opposed to a cached file's contents, which can become invalid when the file is changed, even if it is still the same file). So the isValid() method basically performs the equals check, since this is a required condition for being valid (valid = equal & not expired; since expired = false here: valid = equal).

Ok. So with the definition of parameter keys and values being part of the identity, ExpiresCacheKey and ParameterCacheKey effectively behave consistently.

Now you'll understand that this becomes really confusing, and if there's not a very well defined contract for equals() and hashCode() (and even with that) there's a big opportunity for people to implement their CacheKey wrongly.

The ExpiresCacheKey performs an additional check in its isValid() method, namely checking the expiresTimestamp. This is not done in either the hashCode() or equals() method. So here valid := equals & not expired.
This principle holds true for each and every CacheKey currently implemented (unless there is a faulty implementation).
And this is also the answer to your question:
CacheKey contains information to check its validity, but this information is not used for identifying CacheKeys (iow, in the equals() and hashCode() methods). Which means frequently invalidated CacheKeys will not fill the cache but instead overwrite each other.
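
A minimal sketch of the contract described above, assuming a hypothetical key class (SketchExpiresKey is made up for illustration and is not Cocoon's ExpiresCacheKey): identity goes into equals() and hashCode(), the expiry timestamp is consulted only by isValid(), so two keys for the same resource occupy a single slot in a Map-based cache.

```java
import java.util.HashMap;
import java.util.Map;

final class IdentityContractDemo {
    // Two keys with the same identity but different validity share one map slot.
    static int entriesAfterTwoPuts() {
        Map<SketchExpiresKey, String> cache = new HashMap<>();
        cache.put(new SketchExpiresKey("page", 1000L), "old content");
        cache.put(new SketchExpiresKey("page", 2000L), "new content");
        return cache.size();
    }

    public static void main(String[] args) {
        System.out.println("entries: " + entriesAfterTwoPuts());
    }
}

// Hypothetical key class for illustration; not Cocoon's ExpiresCacheKey.
final class SketchExpiresKey {
    private final String id;             // identity: used by equals() and hashCode()
    private final long expiresTimestamp; // validity: used by isValid() only

    SketchExpiresKey(String id, long expiresTimestamp) {
        this.id = id;
        this.expiresTimestamp = expiresTimestamp;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SketchExpiresKey
                && ((SketchExpiresKey) other).id.equals(this.id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }

    // valid := equals & not expired, evaluated on the cached key.
    boolean isValid(SketchExpiresKey requested, long now) {
        return equals(requested) && now < expiresTimestamp;
    }
}
```

With this shape, a second put() for the same resource replaces the cached value rather than adding a new entry.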

This is only true for cache implementations that rely on hashCode() and equals(), i.e. that keep an index in a Map or a Set (ehcache does this for its DiskStore).

But if you use a non-Java cache or persistent store, you have no other solution than serializing the key and its validity information. This is for example the case with memcache, which requires the key to be a String. And this is where the problem arises if you don't want or can't keep an in-memory index, e.g. because of size or distribution.

As a side note, both classes include the class' hashcode in the instance's hash code, which means hash codes will be different at every JVM restart, or across JVM instances in a cluster, and is likely to break persistent and distributed caches.
That is a good hint.
We will want to look into that and amend things if necessary.

Well, I would call it a bug that needs fixing, because it's basically equivalent to clearing persistent caches at every JVM restart or class reloading.
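
To make the problem concrete, here is a small sketch (the helper names are hypothetical, and this is not the actual Cocoon code): java.lang.Class does not override Object.hashCode(), so its value is an identity hash that is not guaranteed to be the same from one JVM run to the next, whereas String.hashCode() is specified by the Java API and therefore reproducible. Hashing the class name instead of the Class object is one possible fix.

```java
final class StableHashDemo {
    // Unstable: Class inherits Object.hashCode(), an identity hash that may
    // differ between JVM runs and between nodes of a cluster.
    static int unstableHash(Object key, String id) {
        return 31 * key.getClass().hashCode() + id.hashCode();
    }

    // Stable: String.hashCode() is fully specified, so hashing the class
    // name yields the same value in every JVM.
    static int stableHash(Object key, String id) {
        return 31 * key.getClass().getName().hashCode() + id.hashCode();
    }

    public static void main(String[] args) {
        Object key = new Object();
        System.out.println("unstable: " + unstableHash(key, "page"));
        System.out.println("stable:   " + stableHash(key, "page"));
    }
}
```

Running the program several times should show the second value staying constant, while the first one is free to change.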

That being said, I'm wondering if this aggregation of key and validity won't cause other kinds of problems with distributed cache implementations. For example, Java memcached clients serialize the cache key and use this result as the memcache key. If the key includes validity information, the memcache key will change every time the underlying data changes (e.g. a file's timestamp).
At first sight, this can sound good as it means we will have a cache miss when the validity has changed, and will even avoid having to compare the validity of cached content. But this can have a disastrous impact on the cache efficiency in situations where you have some often requested content that changes frequently: the cache will quickly fill up with obsolete versions of this content under different key values, which will lead older content to be evicted, reducing the overall cache efficiency. Whereas a key that's only an identifier will lead the entry to be _replaced_ instead of a new one being added.
So in the end, my feeling is that key and validity information really should be separated.
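
A rough illustration of this effect, using a plain HashMap as a stand-in for a String-keyed store such as memcached (the key format and the resource name are made up for the example):

```java
import java.util.HashMap;
import java.util.Map;

final class StoreKeyDemo {
    // Store key built from identity AND validity: every new version of the
    // content lands under a new key, and obsolete versions stay in the store.
    static int entriesWithValidityInKey(long[] timestamps) {
        Map<String, String> store = new HashMap<>();
        for (long ts : timestamps) {
            store.put("file:/index.xml@" + ts, "content@" + ts);
        }
        return store.size();
    }

    // Store key built from identity only: each new version replaces the
    // previous one.
    static int entriesWithIdentityKey(long[] timestamps) {
        Map<String, String> store = new HashMap<>();
        for (long ts : timestamps) {
            store.put("file:/index.xml", "content@" + ts);
        }
        return store.size();
    }

    public static void main(String[] args) {
        long[] versions = {100L, 200L, 300L};
        System.out.println("validity in key: " + entriesWithValidityInKey(versions));
        System.out.println("identity only:   " + entriesWithIdentityKey(versions));
    }
}
```

For three versions of the same resource, the first variant holds three entries and the second holds one.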
Now going back to the ETag discussion, using the pipeline's cache key won't work IMHO because some keys' hashCode() implementations use only the identifier part of the key and not the validity. Confusion, I told you ;-)
We (intend to) use a layer for integrating caches since we don't want to compile directly against the API of one specific provider and then have to stick with that provider till the end of time (Avalon, anyone?)
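
A sketch of why an identity-only hash makes a poor ETag (all method names are hypothetical, and the If-None-Match handling is reduced to a single strong validator, ignoring lists and weak tags): if the tag does not change when the content changes, the server keeps answering 304 Not Modified for stale content.

```java
final class EtagDemo {
    // ETag derived from the identifier only, as an identity-only hashCode() would give.
    static String identityEtag(String id) {
        return "\"" + Integer.toHexString(id.hashCode()) + "\"";
    }

    // ETag derived from identifier and validity (here: a last-modified timestamp).
    static String versionedEtag(String id, long lastModified) {
        return "\"" + Integer.toHexString(id.hashCode())
                + "-" + Long.toHexString(lastModified) + "\"";
    }

    // Simplified If-None-Match check: true means "answer 304 Not Modified".
    static boolean notModified(String ifNoneMatch, String currentEtag) {
        return currentEtag.equals(ifNoneMatch);
    }

    public static void main(String[] args) {
        // The client cached the response when the resource had timestamp 100;
        // the resource has since changed (timestamp 200).
        String clientTag = identityEtag("page");
        System.out.println("identity-only: 304? " + notModified(clientTag, identityEtag("page")));
        String clientVersioned = versionedEtag("page", 100L);
        System.out.println("versioned:     304? "
                + notModified(clientVersioned, versionedEtag("page", 200L)));
    }
}
```

The first check wrongly reports "not modified" after the change; the second one forces a fresh response.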

Avalon certainly had its problems, but never mandated a particular cache API beyond the one that we, Cocoon devs, defined in Excalibur (org.apache.excalibur.store.Store). But even the Excalibur Store wasn't a requirement, since it was just the store abstraction used by a particular implementation of org.apache.cocoon.caching.Cache.

This additional layer is used to perform validity checks when necessary and/or desired and not check the validity if not. The intention here is to reuse the abstraction layer and not have this kind of (critical) logic scattered in the individual cache provider adaptors.

Agree. A cache storage should not have to do much more than get(key),put(key, value) and maybe delete(key) and clear().

So it is possible to check if a CacheKey is pointing to the same resource *and* if that cached data is still valid - even though the underlying cache provider has no means of performing the second check (validity).
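
As a sketch of how such a layering could look (every type here is hypothetical and far simpler than the real Cocoon 3 classes): the store only offers get/put/delete/clear, the validity information travels with the stored entry, and the layer on top decides whether a hit is still usable.

```java
import java.util.HashMap;
import java.util.Map;

final class ValidatingCacheDemo {
    public static void main(String[] args) {
        ValidatingCache cache = new ValidatingCache(new MapStore());
        cache.put("page", "<html/>", 1000L);
        System.out.println(cache.get("page", 500L));  // still valid
        System.out.println(cache.get("page", 1500L)); // expired, treated as a miss
    }
}

// Hypothetical minimal store contract; a provider adaptor needs nothing more.
interface SketchStore {
    SketchEntry get(String key);
    void put(String key, SketchEntry value);
    void delete(String key);
    void clear();
}

// Cached value plus the validity information that the store itself ignores.
final class SketchEntry {
    final String value;
    final long expiresTimestamp;

    SketchEntry(String value, long expiresTimestamp) {
        this.value = value;
        this.expiresTimestamp = expiresTimestamp;
    }
}

// Trivial in-memory adaptor; a memcached adaptor would implement the same interface.
final class MapStore implements SketchStore {
    private final Map<String, SketchEntry> map = new HashMap<>();
    public SketchEntry get(String key) { return map.get(key); }
    public void put(String key, SketchEntry value) { map.put(key, value); }
    public void delete(String key) { map.remove(key); }
    public void clear() { map.clear(); }
}

// The abstraction layer: validity checks live here, not in the adaptors.
final class ValidatingCache {
    private final SketchStore store;

    ValidatingCache(SketchStore store) {
        this.store = store;
    }

    void put(String key, String value, long expiresTimestamp) {
        store.put(key, new SketchEntry(value, expiresTimestamp));
    }

    // Returns null on a miss or when the stored entry has expired.
    String get(String key, long now) {
        SketchEntry entry = store.get(key);
        if (entry == null) {
            return null;
        }
        if (now >= entry.expiresTimestamp) {
            store.delete(key); // drop the stale entry
            return null;
        }
        return entry.value;
    }
}
```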

I totally agree. Now this doesn't solve the issue: to implement put(key, value) on an arbitrary non-Java store, I wouldn't trust the CacheKey's hashCode() method to produce a uniform distribution that would avoid conflicts (same hashcode for different CacheKeys). The solution would be to serialize the CacheKey and either use this as the store key if it's not too long, or use a strong hash (e.g. MD5 or FNV) of this serialized representation otherwise.
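
A minimal sketch of that approach, assuming the CacheKey has already been serialized to a String by some means not shown here, and assuming a 250-byte key limit (the limit commonly cited for memcached; adjust for the actual store):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class StoreKeys {
    // Assumption: maximum key length accepted by the store, in bytes.
    static final int MAX_KEY_BYTES = 250;

    // Use the serialized key as-is when short enough, else its MD5 digest.
    static String storeKey(String serializedKey) {
        byte[] bytes = serializedKey.getBytes(StandardCharsets.UTF_8);
        return bytes.length <= MAX_KEY_BYTES ? serializedKey : md5Hex(bytes);
    }

    // Hex-encoded MD5 digest (32 characters).
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the digests every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(storeKey("pipeline:/index.xml"));
        String longKey = new String(new char[300]).replace('\0', 'a');
        System.out.println(storeKey(longKey));
    }
}
```

A real adaptor would also have to deal with characters the store rejects in keys (memcached, for instance, does not accept whitespace), which this sketch leaves out.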

Mixing key and validity in a single CacheKey object means having several store keys (which are different from _cache_ keys in this case) for CacheKeys that are equal(), leading to the problems I outlined.

And BTW, what is the "jmxGroupName" property on CacheKey used for?


The jmxGroupName is used for making them accessible via JMX, no?

Well, I guessed this was somehow related to JMX ;-) Did my homework and found its use in cocoon-monitoring's CacheEntrysMonitorInitializer. Nice stuff, but I'm wondering if exposing the full key set of a big cache to JMX actually scales.

And there are some cache implementations (again, memcached) that don't expose their key set. But since this is mostly used for monitoring AFAIU, returning an empty set in these cases should be acceptable.


Sylvain

--
Sylvain Wallez - http://bluxte.net
