[ https://issues.apache.org/jira/browse/CASSANDRA-1625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921897#action_12921897 ]

Peter Schuller commented on CASSANDRA-1625:
-------------------------------------------

Interesting, that does sound far less invasive than what I proposed, although 
some measure needs to be taken to ensure stale data (with respect to concurrent 
writes) is never written to that CF.

In general, btw, an important observation is that while both my original 
proposal and yours imply that reads can result in writes, the more effective 
the cache, the fewer writes are required (assuming one does not preserve 
relative recency in the LRU, one only needs to write on cache eviction, to 
remove the old entry and add the new one). Presumably, if the cache is 
important enough that one wants it persistent on disk, the hit ratio is likely 
to be high, which means fewer writes per read.

In other words, the write overhead of the cache decreases the more efficient 
the cache is.
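
To put rough numbers on that (a back-of-the-envelope sketch, not anything 
measured): if persistence happens only on eviction, each miss costs at most 
two writes to the persisted cache (removing the evicted entry and adding the 
new one), so writes per read scale as roughly 2 * (1 - hit ratio):

    // Illustrative only: persisted-cache writes per read, assuming writes
    // happen solely on eviction (no write on mere LRU reordering).
    public class CacheWriteOverhead
    {
        static double writesPerRead(double hitRatio)
        {
            return 2 * (1 - hitRatio); // one removal + one insertion per miss
        }

        public static void main(String[] args)
        {
            for (double h : new double[] { 0.5, 0.9, 0.99 })
                System.out.printf("hit ratio %.2f -> %.2f writes/read%n",
                                  h, writesPerRead(h));
        }
    }

At a 99% hit ratio that is 0.02 cache writes per read, which should be 
negligible.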

With respect to 1608 - my interpretation of that one is probably a bit weaker, 
in the sense that I read it as better prioritization of when compaction 
happens, as well as actively triggering compaction based solely on read 
statistics. While I can see how this would, slowly over time, tend to cluster 
frequently accessed data into fewer sstables, I don't quite see how one would 
expect it to work so well as to make pre-population not seek bound. If, on the 
other hand, 1608 implies keeping row-level statistics such that compactions 
could be triggered to separate hot data from cold data, then I can see it 
making this obsolete.

I guess such a discussion is better kept in 1608.




> make the row cache continuously durable
> ---------------------------------------
>
>                 Key: CASSANDRA-1625
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1625
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Peter Schuller
>            Priority: Minor
>
> I was looking into how the row cache works today and realized that only row 
> keys are saved and later used to pre-populate the cache on start-up.
> On the premise that row caches are typically used for small rows of which 
> there may be many, this is highly likely to be seek bound on large data sets 
> during pre-population.
> The pre-population could be made faster by increasing I/O queue depth (by 
> concurrency or by libaio as in 1576), but especially on large data sets the 
> performance would be nowhere near what could be achieved if a reasonably 
> sized file containing the actual rows were to be read in a sequential fashion 
> on start.
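> To illustrate with rough, purely hypothetical numbers: pre-populating 10 
> million small cached rows at ~8ms per random read is roughly 22 hours of 
> seeking, while sequentially reading a 5 GB dump of the same rows at 
> 100 MB/s takes under a minute.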
> On the one hand, Cassandra's design means this should be much easier to do 
> efficiently than in some other systems, but on the other hand it is still 
> not entirely trivial.
> The key problem with maintaining a continuously durable cache is that one 
> must never read stale data on start-up. Stale could mean either data that was 
> later deleted, or an old version of data that was updated.
> In the case of Cassandra, this means that any cache restored on start-up must 
> be up-to-date with whatever position in the commit log that commit log 
> recovery will start at. (Because the row cache is for an entire row, we can't 
> couple updating of an on-disk row cache with memtable flushes.)
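> As a sketch of that invariant (hypothetical names, not existing code): a 
> persisted cache is safe to load only if every write it has not seen will be 
> replayed from the commit log anyway, i.e. if the position the dump was 
> persisted at is at or past the recovery start position:
>
>     // Hypothetical start-up check: load the persisted row cache only if it
>     // cannot be stale with respect to commit log recovery.
>     void maybeLoadRowCache(CacheDump dump, ReplayPosition recoveryStart)
>     {
>         if (dump.persistedAt().compareTo(recoveryStart) >= 0)
>             rowCache.loadFrom(dump);   // replay will apply anything newer
>         else
>             dump.discard();            // potentially stale: start cold
>     }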
> I can see two main approaches:
> (a) Periodically dump the entire row cache, deferring commit log eviction in 
> synchronization with said dumping.
> (b) Keep a change log of sorts, similar to the commit log but filtered to 
> only contain data written to the commit log that affects keys that were in 
> the row cache at the time (sketched below). Eviction of commit log segments, 
> or updating positional markers that affect where commit log recovery starts, 
> would imply fsync()ing this change log. An incremental traversal, or 
> alternatively a periodic full dump, would have to be used to ensure that old 
> row change log segments can be evicted without loss of cache warmness.
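> To sketch (b) concretely (hypothetical API, just to illustrate the shape of 
> it): on the write path, any mutation touching a currently cached key is also 
> appended to a dedicated cache change log, which must be fsync()ed before the 
> commit log segments it shadows may be evicted:
>
>     // Hypothetical write-path hook for approach (b): mirror mutations that
>     // touch cached rows into a separate, replayable cache change log.
>     void append(RowMutation mutation)
>     {
>         commitLog.add(mutation);
>         if (rowCache.containsKey(mutation.key()))
>             cacheChangeLog.add(mutation); // fsync()ed before commit log eviction
>     }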
> I like (b), but it is also the introduction of significant complexity (and 
> potential write path overhead) for the purpose of the row cache. In the worst 
> case where hotly read data is also hotly written, the overhead could be 
> particularly significant.
> I am not convinced that this is a good idea for Cassandra, but I have a 
> use-case where a similar cache might have to be written in the application to 
> achieve the desired effect (pre-population being too slow for a sufficiently 
> large row cache). But there are reasons why, in an ideal world, having such a 
> continuously durable cache in Cassandra would be much better than something 
> at the application level. The primary reason is that it does not interact 
> poorly with consistency in the cluster, since the cache is node-local and 
> appropriate measures would be taken to make it consistent locally on each 
> node. I.e., it would be entirely transparent to the application.
> Thoughts? Like/dislike/too complex/not worth it?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
