[
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969927#comment-13969927
]
Benedict commented on CASSANDRA-5863:
-------------------------------------
I think there are at least three issues we're contending with here, and each
needs its own ticket (eventually). Putting historic data on slow drives is, I
think, a different problem from putting a cache on some fast disks. Both will
be helpful. Ideally I think we want the following tiers:
# Uncompressed Memory Cache
# Compressed Memory Cache (disjoint set from 1)
# Compressed SSD cache
# Regular Data
# Archived/Cold/Historic Data
The main distinction is the added "regular data" layer: any special "fast
disk" cache should not store the full sstable hierarchy and its related files;
it should store just the most popular blocks (or portions of blocks).
bq. Benedict you are describing building a custom page cache impl off heap
which is pretty ambitious. Don't you think a baby step would be to rely on the
OS page cache to start and build a custom one as a phase II?
People get very worried when they think they're competing with the kernel
developers. Often for good reason, but since we don't have to be all things to
all people, we get the opportunity to make economies that aren't always as
easily available to them. We also only need roughly the same performance,
since the real win is being able to build on this to make inroads elsewhere.
What we're talking about here is pretty straightforward - it's one of the less
challenging problems. A compressed page cache is more challenging, since the
entries don't have a uniform size, but it is still probably not too difficult.
Take a look at my suggestion for a key cache in CASSANDRA-6709 for a detailed
description of how I would build the off-heap structure.
The basic approach I would probably take is this: deal in 4KB blocks. Any
block we read from disk larger than this we split into 4KB chunks and insert
each into the cache separately*. The cache itself is 8- or 16-way
set-associative, with three components per bucket: one long storing the LRU
information for the bucket, 16 longs storing identity information for the
lookup within the bucket, and corresponding positions in a large address space
storing each of the 4KB data chunks. Readers always go through the cache, and
if they miss they populate it using the appropriate reader before continuing.
Regrettably we don't have access to SIMD instructions, or we could do a lot of
this stuff tremendously efficiently, but even without that it should be pretty
nippy.
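A toy sketch of that bucket layout, with on-heap arrays standing in for the
off-heap address space so it runs as-is (all class and method names here are
illustrative, not a proposed API):
{code:java}
import java.util.Arrays;

/**
 * Toy sketch of one 16-way set-associative bucket design. Per bucket: one
 * long packing sixteen 4-bit recency ranks (0 = MRU .. 15 = LRU), sixteen
 * longs of chunk identity, and sixteen 4KB data slots. A real implementation
 * would keep all three off-heap; plain arrays are used here so it runs.
 */
public class ChunkCacheSketch
{
    static final int WAYS = 16, CHUNK = 4096;
    // ranks 0..15 packed 4 bits per way: way w starts out with rank w
    static final long FRESH_LRU = 0xFEDCBA9876543210L;

    final int buckets;      // must be a power of two
    final long[] lru;       // one packed-rank word per bucket
    final long[] identity;  // WAYS identity longs per bucket; -1 == empty
    final byte[] data;      // WAYS * CHUNK data bytes per bucket

    public ChunkCacheSketch(int buckets)
    {
        this.buckets = buckets;
        this.lru = new long[buckets];
        this.identity = new long[buckets * WAYS];
        this.data = new byte[buckets * WAYS * CHUNK];
        Arrays.fill(lru, FRESH_LRU);
        Arrays.fill(identity, -1L);
    }

    static long mix(long k)  // cheap hash so neighbouring chunks spread out
    {
        k ^= k >>> 33; k *= 0xff51afd7ed558ccdL; k ^= k >>> 33;
        return k;
    }

    static int rankOf(long lruWord, int way)
    {
        return (int) ((lruWord >>> (way * 4)) & 0xF);
    }

    // promote 'way' to MRU; every way that was more recent ages by one
    static long touch(long lruWord, int way)
    {
        int r = rankOf(lruWord, way);
        long out = 0;
        for (int w = 0; w < WAYS; w++)
        {
            int rw = rankOf(lruWord, w);
            out |= (long) (w == way ? 0 : rw < r ? rw + 1 : rw) << (w * 4);
        }
        return out;
    }

    /** Copy a cached 4KB chunk into dst; false means a miss. */
    public boolean get(long key, byte[] dst)
    {
        int b = (int) (mix(key) & (buckets - 1));
        for (int w = 0; w < WAYS; w++)
        {
            if (identity[b * WAYS + w] == key)
            {
                System.arraycopy(data, (b * WAYS + w) * CHUNK, dst, 0, CHUNK);
                lru[b] = touch(lru[b], w);
                return true;
            }
        }
        return false;
    }

    /** Insert one 4KB chunk, evicting the bucket's LRU way. */
    public void put(long key, byte[] chunk)
    {
        int b = (int) (mix(key) & (buckets - 1)), victim = 0;
        for (int w = 0; w < WAYS; w++)
            if (rankOf(lru[b], w) == WAYS - 1) victim = w;
        identity[b * WAYS + victim] = key;
        System.arraycopy(chunk, 0, data, (b * WAYS + victim) * CHUNK, CHUNK);
        lru[b] = touch(lru[b], victim);
    }
}
{code}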
*This gives us finer granularity for eviction and keeps CPU-cache traffic
when reading from the cache to a minimum. It's also a pretty optimal size for
reading/writing to SSD if we overflow to disk, and is sufficiently large to
get good compression for an in-memory compressed cache, whilst still being
small enough to stream and decompress from main memory without a major penalty
when looking up a small part of it.
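The splitting step itself might look something like this, reusing the sketch
above (the insertBlock name and the (fileId, chunkNo) key packing are
illustrative only):
{code:java}
// Sketch of the splitting step: carve a larger block read from disk into
// aligned 4KB chunks and cache each under its own key. The key packing
// below is illustrative; real identity would need to be collision-safe.
static void insertBlock(ChunkCacheSketch cache, long fileId, long blockOffset, byte[] block)
{
    final int CHUNK = ChunkCacheSketch.CHUNK;
    for (int off = 0; off + CHUNK <= block.length; off += CHUNK)
    {
        long chunkNo = (blockOffset + off) / CHUNK;
        long key = (fileId << 40) | chunkNo;  // pack identity into one long
        cache.put(key, java.util.Arrays.copyOfRange(block, off, off + CHUNK));
    }
}
{code}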
As to having a fast disk cache, I also think this is a great idea. But I think
it fits in as an extension of this and any compressed in-memory cache, as we
build a tiered-cache architecture.
> In process (uncompressed) page cache
> ------------------------------------
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: T Jake Luciani
> Assignee: Pavel Yaskevich
> Labels: performance
> Fix For: 2.1 beta2
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a
> byte[], sends it to ICompressor, gets back another byte[], and verifies a
> checksum.
> This process is where the majority of the time in a read request is spent.
> Before compression, we had zero-copy access to the data and could respond
> directly from the page cache.
> It would be useful to have some kind of chunk cache that could speed up
> this process for hot data. Initially this could be an off-heap cache, but it
> would be great to put these decompressed chunks onto an SSD so the hot data
> lives on a fast disk, similar to https://github.com/facebook/flashcache.
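For reference, the read path the ticket describes is roughly the following;
the RandomAccessFile/Inflater/CRC32 plumbing below merely stands in for the
real CRAR and ICompressor, and all parameters are illustrative:
{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;
import java.util.zip.Inflater;

// Rough shape of the per-chunk read path described above: read the compressed
// chunk into a byte[], decompress into another byte[], verify a checksum.
// Offsets, lengths and the CRC32 here are stand-ins; the real reader gets
// them from the sstable's compression metadata.
static byte[] readChunk(RandomAccessFile file, long offset, int compressedLength,
                        int uncompressedLength, long expectedChecksum) throws Exception
{
    byte[] compressed = new byte[compressedLength];
    file.seek(offset);
    file.readFully(compressed);          // copy #1: disk (page cache) -> heap

    CRC32 crc = new CRC32();
    crc.update(compressed);
    if (crc.getValue() != expectedChecksum)
        throw new IOException("checksum mismatch for chunk at " + offset);

    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] uncompressed = new byte[uncompressedLength];
    inflater.inflate(uncompressed);      // copy #2: decompress into a fresh byte[]
    inflater.end();
    return uncompressed;                 // candidate for the proposed chunk cache
}
{code}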
--
This message was sent by Atlassian JIRA
(v6.2#6252)