Hi Andrus,

First of all, thank you for the prompt support and your suggestions.

As you correctly guessed, I was talking about Cayenne 1.2.1.

I was thinking about the following automatic (but driven by custom configuration) "invalidation algorithm":

Configure somewhere a logical association DataObject ---> QueryData[]. For example, the Paintings DataObject would be associated with the queries:

  Select * from Paintings where year = 1300
  Select * from Artists, Paintings where Paintings.year > 1200
  Select * from Artists, Paintings where Paintings.year > 1200 order by Paintings.name

The "QueryData" object should contain, separately, information about the "expression" (i.e. the "where" part) and about the ordering. For example: "Select * from Artists, Paintings where Paintings.year > 1200 order by Paintings.name" ---> { Paintings.year > 1200 | order by Paintings.name }
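
A minimal sketch of such a holder in Java (the QueryData name, its fields and the cache key are my assumptions; only Expression comes from Cayenne 1.2):

  import java.util.List;
  import org.objectstyle.cayenne.exp.Expression;

  // Hypothetical holder that splits a cached query into its "where" part
  // and its ordering part, plus the key under which the result list is cached.
  public class QueryData {

      private final String cacheKey;      // key of the cached result list
      private final Expression qualifier; // e.g. Paintings.year > 1200
      private final List orderingKeys;    // property names used in the order by

      public QueryData(String cacheKey, Expression qualifier, List orderingKeys) {
          this.cacheKey = cacheKey;
          this.qualifier = qualifier;
          this.orderingKeys = orderingKeys;
      }

      public String getCacheKey() { return cacheKey; }
      public Expression getQualifier() { return qualifier; }
      public List getOrderingKeys() { return orderingKeys; }

      public boolean hasOrdering() {
          return orderingKeys != null && !orderingKeys.isEmpty();
      }
  }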

If I modify (or create or delete) a DataObject, I have to check the pre- and post-modification versions of that single DataObject against the associated QueryData's. We could do this by exploiting Expression's in-memory object filtering capabilities (or optionally using third-party utilities, commons-beanutils perhaps?):

Expression filter = Expression.fromString("Paintings.year > 1200");
// filterObjects() does not modify its argument; it returns the matching objects
List matching = filter.filterObjects(objects);

As a result we would have two sets of queries: those matching before the modification and those matching after it. We certainly have to invalidate all the query results that are not in the intersection of the two sets, since the object has either entered or left those results.

For the queries in the intersection:

a) If they have NO ordering (order by clause, paging limitation, etc.), they are still valid.

b) If they have ordering: if the ordering is on one of the modified DataObject fields, we have to invalidate the query result; otherwise it is still valid (see the sketch below).
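
To make the whole check concrete, here is a minimal sketch assuming the hypothetical QueryData holder above; the method name, the before/after snapshots and the modifiedFields set are my assumptions, while filterObjects() is the real 1.2 API:

  import java.util.Collections;
  import java.util.HashSet;
  import java.util.Iterator;
  import java.util.List;
  import java.util.Set;
  import org.objectstyle.cayenne.DataObject;

  // Returns the cache keys whose cached result lists became stale.
  // 'before' and 'after' are the pre/post-modification versions of the
  // object (null on insert or delete respectively); 'modifiedFields'
  // holds the names of the changed attributes.
  public Set staleCacheKeys(DataObject before, DataObject after,
                            List queryDataList, Set modifiedFields) {
      Set stale = new HashSet();
      for (Iterator it = queryDataList.iterator(); it.hasNext();) {
          QueryData qd = (QueryData) it.next();

          boolean matchedBefore = before != null && !qd.getQualifier()
                  .filterObjects(Collections.singletonList(before)).isEmpty();
          boolean matchedAfter = after != null && !qd.getQualifier()
                  .filterObjects(Collections.singletonList(after)).isEmpty();

          if (matchedBefore != matchedAfter) {
              // the object entered or left the result set: always stale
              stale.add(qd.getCacheKey());
          }
          else if (matchedAfter && qd.hasOrdering()) {
              // case b): still in the result set, stale only if a sort field changed
              for (Iterator keys = qd.getOrderingKeys().iterator(); keys.hasNext();) {
                  if (modifiedFields.contains(keys.next())) {
                      stale.add(qd.getCacheKey());
                      break;
                  }
              }
          }
          // case a): membership unchanged and no ordering affected -> still valid
      }
      return stale;
  }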

Of course this solution can have a high computational cost, proportional to the number of queries it has to check. But, for example, in the project I am collaborating on, the DB is the system under pressure / the bottleneck, while the middleware has much less load. In such a situation, "moving" load from the DB to the middleware is a benefit for the application as a whole.

For "basic" queries (I made some tests) I think the algorithm should work. Of course, more systematic test cases would be needed to fully validate the algorithm and/or find its limitations. Anyway, I wanted to share it with you hoping it can be useful, or at least some "inspiration" for a proper/more correct solution.


Francesco.




Andrus Adamchik wrote:
Hi Francesco,


On Sep 25, 2006, at 10:56 AM, Francesco Fuzio wrote:
Thank you for the answers: I'm definitely looking forward to trying the cool 3.0 features you mentioned.

As for 2.1 (since it is important for us to keep data updated without relying on expiration timing), I was thinking about this approach (for a clustered environment):

That would be version 1.2.*, right?

1) Enable Cayenne Replicated Shared Object Cache
2) Disable the Cayenne query (i.e. list) cache
3) Use a caching framework supporting an automatic distributed refresh/invalidation policy (e.g. OSCache or Ehcache) to save query results as lists of ObjectIds.
4) In case of a query "cache hit", use the cached ObjectIds to retrieve the associated DataObjects via the DataContext [public Persistent localObject(ObjectId id, Persistent prototype), see http://incubator.apache.org/cayenne/1_2/api/cayenne/org/objectstyle/cayenne/Persistent.html]
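
A minimal sketch of step 4 (the method name and the shape of the cached list are my assumptions; localObject() is the 1.2 DataContext method referenced above):

  import java.util.ArrayList;
  import java.util.Iterator;
  import java.util.List;
  import org.objectstyle.cayenne.ObjectId;
  import org.objectstyle.cayenne.access.DataContext;

  // On a query cache hit, rebuild the result list from the cached ObjectIds.
  // 'cachedIds' would come from OSCache/Ehcache under some query cache key.
  public List resolveFromCache(DataContext context, List cachedIds) {
      List results = new ArrayList(cachedIds.size());
      for (Iterator it = cachedIds.iterator(); it.hasNext();) {
          ObjectId oid = (ObjectId) it.next();
          // with a null prototype, localObject() returns the object registered
          // in this context, resolving it from the shared cache or the DB lazily
          results.add(context.localObject(oid, null));
      }
      return results;
  }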

What do you think, is this approach reasonable? Will it work?

This should work (you'll just use your own cache as a front end to the DataContext query API), and should provide a clean path to the future 3.0 migration. You'll need to consider a few things though:

A. Query cache key generation. In 1.2 this is based on the Query name, which is pretty dumb and barely usable; in 3.0 SelectQuery and SQLTemplate are smart enough to build the cache key based on their state. You may copy some of that code.
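
For instance, a hand-rolled key for a SelectQuery could be derived from the query's state along these lines (a simplification, not the actual 3.0 code; it ignores prefetches, fetch limits, etc.):

  import java.util.Iterator;
  import org.objectstyle.cayenne.query.Ordering;
  import org.objectstyle.cayenne.query.SelectQuery;

  // Naive cache key built from the query's state instead of its name.
  public String cacheKey(SelectQuery query) {
      StringBuffer key = new StringBuffer(32);
      key.append(query.getRoot());                      // entity
      if (query.getQualifier() != null) {
          key.append('/').append(query.getQualifier()); // "where" part
      }
      for (Iterator it = query.getOrderings().iterator(); it.hasNext();) {
          Ordering o = (Ordering) it.next();            // "order by" part
          key.append('/').append(o.getSortSpec())
             .append(o.isAscending() ? " asc" : " desc");
      }
      return key.toString();
  }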


B. Invalidation Strategies. That's a tricky one....

I couldn't come up with a well-performing generic solution (I tried, see CAY-577). Consider that the events that may cause automatic invalidation are object deletion, insertion and updating (an update can affect the ordering and also whether an object still matches the query condition). So *every* commit can potentially invalidate any number of cached lists for a given entity.

The trick is to create an efficient algorithm to invalidate just the right cache entries and avoid invalidating the entire entity cache. Manually scanning and rearranging all lists on every commit is of course very inefficient.

So in 3.0 we added a "cache group" notion so that users can categorize queries based on some criteria and then invalidate a whole category of cache entries. (The cache group notion is supported by OSCache, by the way.) Here is an example.... Consider a "BlogPost" entity. All queries that fetch a date range of BlogPosts can be arbitrarily divided into "old_posts" and "new_posts" categories. So once a user creates/updates/deletes a BlogPost, the code can check the date of this post and invalidate either "old_posts" or "new_posts".
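
With OSCache, for example, the invalidation side of this could look roughly like the following (the 30-day cutoff and the method are just illustration; GeneralCacheAdministrator and flushGroup() are the real OSCache API):

  import java.util.Date;
  import com.opensymphony.oscache.general.GeneralCacheAdministrator;

  // After a BlogPost commit, flush only the affected cache group; query
  // results would have been stored via putInCache(key, list, groups).
  public void onBlogPostChange(GeneralCacheAdministrator cache, Date postDate) {
      long cutoff = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000;
      String group = postDate.getTime() < cutoff ? "old_posts" : "new_posts";
      cache.flushGroup(group);
  }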

This is just one solution that we came up with. Not automatic, but fairly simple and efficient. You can come up with your own strategies. If you can think of a better generic algorithm for invalidation, please share.

Andrus


