Dan Kinder created CASSANDRA-14229:

             Summary: Separate data drive for smaller SSTable files
                 Key: CASSANDRA-14229
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14229
             Project: Cassandra
          Issue Type: New Feature
          Components: Local Write-Read Paths
            Reporter: Dan Kinder

For datasets whose active set of keys far exceeds RAM, it would be 
quite useful to be able to put certain SSTable files (e.g. *-Index.db) on a 
separate, faster drive (or drives) than the data, e.g. indexes on SSD and data 
on HDD. This is particularly valuable when keys are much smaller than values. 
Also, as RAM continues to get more expensive, users who currently optimize by 
having large key caches may not need to buy as much of it.

Our use case is a large dataset like this one. Storing all the data on SSD is 
cost-prohibitive, and the reads are extremely random (effectively every key is 
in the active set), so we don't have enough RAM to cache it. (I did try using a 
massive key cache, 64GB, and saw strange behavior anyway: the irqbalancer 
process pegged the CPU and the whole thing badly underperformed. An 
investigation for another day.)

At the moment our only resolution is to buy enough HDDs to handle 2 seeks per 
read: 1 for the index and 1 for the data. Having indexes on SSD would speed 
this up considerably, and in practice we would need to purchase only a small 
number of SSDs and about half the number of HDDs.
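
A rough sketch of the arithmetic above, in Python. The read rate and per-drive 
IOPS figures are hypothetical placeholders, not measurements from this cluster, 
and the helper name hdds_needed is made up for illustration. It assumes each 
random read costs a fixed number of HDD seeks and that an HDD sustains a fixed 
number of random seeks per second.

```python
import math

def hdds_needed(reads_per_sec: float, seeks_per_read: int, iops_per_hdd: float) -> int:
    """Estimate how many HDDs are needed to sustain a random-read rate.

    Assumes every read costs `seeks_per_read` random seeks on HDD and that
    load spreads evenly across drives (a simplification).
    """
    return math.ceil(reads_per_sec * seeks_per_read / iops_per_hdd)

# Hypothetical workload and drive figures, for illustration only.
READS_PER_SEC = 10_000
IOPS_PER_HDD = 100

# Index and data both on HDD: 2 seeks per read.
both_on_hdd = hdds_needed(READS_PER_SEC, 2, IOPS_PER_HDD)

# Index on SSD: only the data seek hits HDD.
index_on_ssd = hdds_needed(READS_PER_SEC, 1, IOPS_PER_HDD)

print(both_on_hdd, index_on_ssd)
```

Under these assumptions the HDD count halves when the index seek moves to SSD, 
which is where the "about half" estimate comes from. Capacity requirements are 
ignored here; only seek throughput is modeled.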

One user suggested lvmcache, which could work. I'd like to hear whether it 
would really work optimally, whether lvmcache would keep the right blocks on 
the faster volume, and how reliable it is at the task.

Note: I asked about this on the mailing list, and it was suggested I create a JIRA.
