[ 
https://issues.apache.org/jira/browse/CASSANDRA-5357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13733151#comment-13733151
 ] 

Vijay commented on CASSANDRA-5357:
----------------------------------

Hi Jonathan, the idea in the current implementation is as follows:

The QueryCache<QueryFilter, CF> is implemented on top of SerializedCache. The 
map's key is a RowCacheKey<RowKey, CFID> (same as the earlier RowCache), and 
the map's value is a composite value, QueryCacheValue<[Query, ....], 
ColumnFamily>.

For every new query that enters the system, we generate the RowCacheKey from 
the QueryFilter and look up the QueryCacheValue to check whether the IFilter 
exists. If it does, we return the CF; otherwise we get the QueryCacheValue 
(creating a new one if no QCV exists), add the IFilter to the QCV, and merge 
the results with the existing ColumnFamily (also in the QCV), which is in turn 
serialized.
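
To make the flow above concrete, here is a rough sketch of the lookup path. 
It is not the code in the patch: QueryCacheSketch, CachedRow, read() and 
names() are hypothetical names, a filter is modelled as a plain set of column 
names, the ColumnFamily as a sorted map, and the serialization step is left 
out entirely.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical, simplified model of the lookup path; not the patch's classes. */
final class QueryCacheSketch {
    /** Stand-in for QueryCacheValue: filters seen so far plus the merged columns. */
    static final class CachedRow {
        final Set<Set<String>> filters = new HashSet<>();
        final Map<String, String> columns = new TreeMap<>(); // stand-in for ColumnFamily
    }

    // One entry per row key, as in the existing row cache.
    private final Map<String, CachedRow> cache = new HashMap<>();
    // Assumed backing store: rowKey -> (column name -> value).
    private final Map<String, Map<String, String>> storage;
    int storageReads = 0;

    QueryCacheSketch(Map<String, Map<String, String>> storage) {
        this.storage = storage;
    }

    /** Convenience helper for building a names filter. */
    static Set<String> names(String... columnNames) {
        return new HashSet<>(Arrays.asList(columnNames));
    }

    Map<String, String> read(String rowKey, Set<String> filter) {
        // Get the cached value for the row, creating it if it does not exist.
        CachedRow row = cache.computeIfAbsent(rowKey, k -> new CachedRow());
        if (!row.filters.contains(filter)) {
            // Unknown filter: read from storage and merge into the shared columns,
            // so overlapping queries do not store the same column twice.
            storageReads++;
            Map<String, String> stored =
                storage.getOrDefault(rowKey, Collections.<String, String>emptyMap());
            row.columns.putAll(select(stored, filter));
            row.filters.add(new HashSet<>(filter));
        }
        // Known filter (or just merged): answer from the cached columns.
        return select(row.columns, filter);
    }

    int cachedColumnCount(String rowKey) {
        CachedRow row = cache.get(rowKey);
        return row == null ? 0 : row.columns.size();
    }

    private static Map<String, String> select(Map<String, String> columns, Set<String> filter) {
        Map<String, String> out = new TreeMap<>();
        for (String name : filter) {
            String value = columns.get(name);
            if (value != null) {
                out.put(name, value);
            }
        }
        return out;
    }
}
```

With a stored row holding columns a, b and c, reading names("a", "b") twice 
hits storage once, and a later read of names("b", "c") hits storage a second 
time but leaves only three columns cached, because b is shared.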

Advantages: 
1) Queries can overlap. There can be any number of queries, but the data will 
not be duplicated across them.
2) When we want to invalidate, we just invalidate the RowKey and all the 
cached QueryCacheValue data goes away (this avoids another Map for bookkeeping 
and is hence a little more memory efficient).
3) There is a property the user can enable to cache the whole row no matter 
what the query is (the current patch adds the overhead of deserializing the 
identity filter, but that can be fixed).
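
As an illustration of point 2, the sketch below keys everything by row, so 
invalidating a row is a single removal and no query-to-row index is needed. 
The class and method names are made up for the example, and the merged 
columns are omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of row-keyed invalidation; not the patch's classes. */
final class RowKeyedInvalidationSketch {
    // rowKey -> filters cached for that row (merged columns omitted for brevity).
    private final Map<String, Set<String>> cache = new HashMap<>();

    void remember(String rowKey, String filter) {
        cache.computeIfAbsent(rowKey, k -> new HashSet<>()).add(filter);
    }

    boolean isCached(String rowKey, String filter) {
        Set<String> filters = cache.get(rowKey);
        return filters != null && filters.contains(filter);
    }

    /** A write to the row drops every cached query for it in one removal. */
    void invalidate(String rowKey) {
        cache.remove(rowKey);
    }
}
```

If the cache were keyed per query instead, invalidate() would first have to 
find every query touching the row, which is the extra bookkeeping Map this 
design avoids.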

Of course there are disadvantages: 
1) The LRU algorithm is no longer really accurate. When a single query is hot, 
we have no way of evicting the other queries on the same row, since they all 
share the same hit count (which is no worse than what we have currently).
2) With multiple types of queries on the same row (which is something of an 
edge case), we might pull the whole row's data into memory. This can be 
mitigated by loading it incrementally or by holding an index in the filter, 
neither of which exists in the current patch.
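
The first disadvantage can be illustrated with a small access-ordered 
LinkedHashMap. This is only a toy model with made-up names 
(RowGranularLruSketch, touch, isCached), and the real cache's eviction policy 
may differ in detail. The point is that recency is tracked per row, so a cold 
query rides along with a hot one on the same row.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical toy model of row-granular LRU; not the patch's classes. */
final class RowGranularLruSketch {
    private final LinkedHashMap<String, Set<String>> rows;

    RowGranularLruSketch(final int capacity) {
        // accessOrder = true: iteration order runs from least to most recently used row.
        this.rows = new LinkedHashMap<String, Set<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Set<String>> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Records a hit (or first use) of a query on a row. */
    void touch(String rowKey, String filter) {
        Set<String> filters = rows.get(rowKey); // get() refreshes the row's recency
        if (filters == null) {
            filters = new HashSet<>();
            rows.put(rowKey, filters); // may evict the least recently used row
        }
        filters.add(filter);
    }

    /** Inspects the cache without changing the access order. */
    boolean isCached(String rowKey, String filter) {
        for (Map.Entry<String, Set<String>> entry : rows.entrySet()) {
            if (entry.getKey().equals(rowKey)) {
                return entry.getValue().contains(filter);
            }
        }
        return false;
    }
}
```

With a capacity of two rows: cache a hot and a cold query on r1, one query on 
r2, keep hitting only the hot query, then add r3. The row evicted is r2, and 
the cold query on r1 stays cached even though it was never hit again.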

There could be more that I overlooked...
                
> Query cache
> -----------
>
>                 Key: CASSANDRA-5357
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5357
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jonathan Ellis
>            Assignee: Vijay
>
> I think that most people expect the row cache to act like a query cache, 
> because that's a reasonable model.  Caching the entire partition is, in 
> retrospect, not really reasonable, so it's not surprising that it catches 
> people off guard, especially given the confusion we've inflicted on ourselves 
> as to what a "row" constitutes.
> I propose replacing it with a true query cache.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
