make the row cache continuously durable
---------------------------------------

                 Key: CASSANDRA-1625
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1625
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Peter Schuller
            Priority: Minor


I was looking into how the row cache works today and realized that only row 
keys are saved; the cache is then pre-populated from those keys on start-up.

On the premise that row caches are typically used for small rows of which there 
may be many, pre-population is highly likely to be seek-bound on large data 
sets.

The pre-population could be made faster by increasing I/O queue depth (by 
concurrency or by libaio as in 1576), but especially on large data sets the 
performance would be nowhere near what could be achieved if a reasonably sized 
file containing the actual rows were to be read in a sequential fashion on 
start.
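
As a rough illustration of the "file containing the actual rows" idea, here is 
a minimal sketch of a dump that is written and read strictly front to back. 
The record layout (length-prefixed key, then length-prefixed row) and the 
names write_rows and read_rows are assumptions made for this example; they are 
not an existing Cassandra format or API.

```python
import io
import struct


def write_rows(out, rows):
    """Append each (key, row) pair as two length-prefixed byte strings."""
    for key, row in rows.items():
        out.write(struct.pack(">I", len(key)))
        out.write(key)
        out.write(struct.pack(">I", len(row)))
        out.write(row)


def read_rows(inp):
    """Rebuild the cache in one front-to-back pass, with no per-row seeks."""
    cache = {}
    while True:
        header = inp.read(4)
        if len(header) < 4:
            break  # end of file
        (key_len,) = struct.unpack(">I", header)
        key = inp.read(key_len)
        (row_len,) = struct.unpack(">I", inp.read(4))
        cache[key] = inp.read(row_len)
    return cache


# Usage: an in-memory buffer stands in for the on-disk dump file.
buf = io.BytesIO()
write_rows(buf, {b"user:1": b"small row", b"user:2": b"another row"})
buf.seek(0)
restored = read_rows(buf)
```

The point of the layout is only that start-up cost becomes one sequential scan 
of the file rather than one seek per cached key.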

On the one hand, Cassandra's design means that this should be much easier to 
do efficiently than in some other cases, but on the other hand it is still not 
entirely trivial.

The key problem with maintaining a continuously durable cache is that one must 
never read stale data on start-up. Stale could mean either data that was later 
deleted, or an old version of data that was updated.

In the case of Cassandra, this means that any cache restored on start-up must 
be up to date as of the commit log position at which commit log recovery will 
start. (Because the row cache is for an entire row, we can't couple updating 
of an on-disk row cache with memtable flushes.)
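
The start-up condition can be sketched as a simple comparison. This assumes 
that a saved cache records the commit log position it was consistent with, and 
that commit log replay also applies replayed mutations to the restored cache 
(or invalidates the affected rows). LogPosition, snapshot_is_usable and 
restore_cache are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class LogPosition(NamedTuple):
    """Hypothetical commit log position: segment id, then offset within it."""
    segment: int
    offset: int


def snapshot_is_usable(snapshot_pos, replay_start):
    """A saved cache is safe only if it reflects every write made before the
    point where replay begins; writes after that point are replayed anyway."""
    return snapshot_pos >= replay_start


def restore_cache(snapshot_rows, snapshot_pos, replay_start):
    """Return the saved rows, or an empty (cold) cache if they may be stale."""
    if snapshot_is_usable(snapshot_pos, replay_start):
        return dict(snapshot_rows)
    return {}
```

If the saved cache is older than the replay start, the writes in between exist 
only in sstables, so the sketch falls back to a cold cache rather than risk 
serving stale rows.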

I can see two main approaches:

(a) Periodically dump the entire row cache, deferring commit log eviction in 
synchronization with said dumping.

(b) Keep a change log of sorts, similar to the commit log but filtered to only 
contain data written to the commit log that affects keys that were in the row 
cache at the time. Evicting commit logs, or updating positional markers that 
affect where commit log recovery starts, would imply fsync()ing this change 
log. An incremental traversal, or alternatively a periodic full dump, would 
have to be used to ensure that old row change log segments can be evicted 
without loss of cache warmness.
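
The two approaches above can be sketched roughly as follows. Both classes, 
their method names, and the use of plain integers as commit log segment ids are 
assumptions for illustration only; in-memory lists stand in for files, and 
sync() stands in for a real flush plus fsync().

```python
class PeriodicDump:
    """Approach (a): commit log segments wait for the next full cache dump."""

    def __init__(self):
        self.dumped_through = -1  # newest segment covered by a completed dump

    def dump_completed(self, segment):
        self.dumped_through = max(self.dumped_through, segment)

    def may_evict(self, segment, flushed_through):
        # A segment must be both flushed to sstables and covered by a dump.
        return segment <= flushed_through and segment <= self.dumped_through


class FilteredChangeLog:
    """Approach (b): log only writes that touch keys currently in the cache."""

    def __init__(self, cached_keys):
        self.cached_keys = set(cached_keys)
        self.pending = []  # appended, not yet synced
        self.durable = []  # stand-in for the fsync'ed on-disk change log

    def on_write(self, key, mutation):
        if key in self.cached_keys:
            self.pending.append((key, mutation))

    def sync(self):
        # Stand-in for flushing and fsync()ing the change log file.
        self.durable.extend(self.pending)
        self.pending.clear()

    def advance_recovery_point(self, advance):
        # Make the change log durable first, then move the recovery point.
        self.sync()
        advance()
```

In (a) the cost is delayed commit log eviction; in (b) the cost is an extra 
append on the write path for every write to a cached key, plus a sync whenever 
the recovery point moves.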

I like (b), but it is also the introduction of significant complexity (and 
potential write path overhead) for the purpose of the row cache. In the worst 
case where hotly read data is also hotly written, the overhead could be 
particularly significant.

I am not convinced that this is a good idea for Cassandra, but I have a 
use-case where a similar cache might have to be written in the application to 
achieve the desired effect (pre-population being too slow for a sufficiently 
large row cache). But there are reasons why, in an ideal world, having such a 
continuously durable cache in Cassandra would be much better than something at 
the application level. The primary reason is that it does not interact poorly 
with consistency in the cluster, since the cache is node-local and appropriate 
measures would be taken to make it consistent locally on each node. I.e., it 
would be entirely transparent to the application.

Thoughts? Like/dislike/too complex/not worth it?


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
