[
https://issues.apache.org/jira/browse/CASSANDRA-5863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969927#comment-13969927
]
Benedict commented on CASSANDRA-5863:
-------------------------------------
I think there are at least three issues we're contending with here, and each
needs its own ticket (eventually). Putting historic data on slow drives is, I
think, a different problem from putting a cache on some fast disks. Both will
be helpful. Ideally I think we want the following tiers:
# Uncompressed Memory Cache
# Compressed Memory Cache (disjoint set from 1)
# Compressed SSD cache
# Regular Data
# Archived/Cold/Historic Data
The main distinction is the added "regular data" layer: any special "fast
disk" cache should not store the full sstable hierarchy and its related files;
it should store just the most popular blocks (or portions of blocks).
bq. Benedict you are describing building a custom page cache impl off heap
which is pretty ambitious. Don't you think a baby step would be to rely on the
OS page cache to start and build a custom one as a phase II?
People get very worried when they think they're competing with the kernel
developers. Often for good reason, but since we don't have to be all things to
all people, we get the opportunity to make economies that aren't always as
easily available to them. We also only need roughly the same performance,
since the real win is being able to build on this to make inroads elsewhere.
What we're talking about here is pretty straightforward - it's one of the less
challenging problems. A compressed page cache is more challenging, since the
entries don't have a uniform size, but it is still probably not too difficult.
Take a look at my suggestion for a key cache in CASSANDRA-6709 for a detailed
description of how I would build the off-heap structure.
The basic approach I would probably take is this: deal in 4KB blocks. Any
block we read from disk larger than this we split into 4KB chunks and insert
each into the cache separately*. The cache itself is 8- or 16-way
set-associative, with three components per bucket: one long storing the LRU
information for the bucket, 16 longs storing identity information for the
lookup within the bucket, and corresponding positions in a large address space
storing each of the 4KB data chunks. Readers always go through the cache, and
if they miss they populate it using the appropriate reader before continuing.
Regrettably we don't have access to SIMD instructions, or we could do a lot of
this stuff tremendously efficiently, but even without that it should be pretty
nippy.
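A toy sketch of that bucket layout, with on-heap arrays standing in for the
off-heap address space so it runs as-is (all class and method names here are
illustrative, not a proposed API):
{code:java}
import java.util.Arrays;

/**
 * Toy sketch of one 16-way set-associative bucket design. Per bucket: one
 * long packing sixteen 4-bit recency ranks (0 = MRU .. 15 = LRU), sixteen
 * longs of chunk identity, and sixteen 4KB data slots. A real implementation
 * would keep all three off-heap; plain arrays are used here so it runs.
 */
public class ChunkCacheSketch
{
    static final int WAYS = 16, CHUNK = 4096;
    // ranks 0..15 packed 4 bits per way: way w starts out with rank w
    static final long FRESH_LRU = 0xFEDCBA9876543210L;

    final int buckets;      // must be a power of two
    final long[] lru;       // one packed-rank word per bucket
    final long[] identity;  // WAYS identity longs per bucket; -1 == empty
    final byte[] data;      // WAYS * CHUNK data bytes per bucket

    public ChunkCacheSketch(int buckets)
    {
        this.buckets = buckets;
        this.lru = new long[buckets];
        this.identity = new long[buckets * WAYS];
        this.data = new byte[buckets * WAYS * CHUNK];
        Arrays.fill(lru, FRESH_LRU);
        Arrays.fill(identity, -1L);
    }

    static long mix(long k)  // cheap hash so neighbouring chunks spread out
    {
        k ^= k >>> 33; k *= 0xff51afd7ed558ccdL; k ^= k >>> 33;
        return k;
    }

    static int rankOf(long lruWord, int way)
    {
        return (int) ((lruWord >>> (way * 4)) & 0xF);
    }

    // promote 'way' to MRU; every way that was more recent ages by one
    static long touch(long lruWord, int way)
    {
        int r = rankOf(lruWord, way);
        long out = 0;
        for (int w = 0; w < WAYS; w++)
        {
            int rw = rankOf(lruWord, w);
            out |= (long) (w == way ? 0 : rw < r ? rw + 1 : rw) << (w * 4);
        }
        return out;
    }

    /** Copy a cached 4KB chunk into dst; false means a miss. */
    public boolean get(long key, byte[] dst)
    {
        int b = (int) (mix(key) & (buckets - 1));
        for (int w = 0; w < WAYS; w++)
        {
            if (identity[b * WAYS + w] == key)
            {
                System.arraycopy(data, (b * WAYS + w) * CHUNK, dst, 0, CHUNK);
                lru[b] = touch(lru[b], w);
                return true;
            }
        }
        return false;
    }

    /** Insert one 4KB chunk, evicting the bucket's LRU way. */
    public void put(long key, byte[] chunk)
    {
        int b = (int) (mix(key) & (buckets - 1)), victim = 0;
        for (int w = 0; w < WAYS; w++)
            if (rankOf(lru[b], w) == WAYS - 1) victim = w;
        identity[b * WAYS + victim] = key;
        System.arraycopy(chunk, 0, data, (b * WAYS + victim) * CHUNK, CHUNK);
        lru[b] = touch(lru[b], victim);
    }
}
{code}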
*This gives us finer granularity for eviction and keeps CPU-cache traffic
when reading from the cache to a minimum. It's also a pretty optimal size for
reading/writing to SSD if we overflow to disk, and is sufficiently large to
get good compression for an in-memory compressed cache, whilst still being
small enough to stream and decompress from main memory without a major penalty
when looking up a small part of it.
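The splitting step itself might look something like this, reusing the sketch
above (the insertBlock name and the (fileId, chunkNo) key packing are
illustrative only):
{code:java}
// Sketch of the splitting step: carve a larger block read from disk into
// aligned 4KB chunks and cache each under its own key. The key packing
// below is illustrative; real identity would need to be collision-safe.
static void insertBlock(ChunkCacheSketch cache, long fileId, long blockOffset, byte[] block)
{
    final int CHUNK = ChunkCacheSketch.CHUNK;
    for (int off = 0; off + CHUNK <= block.length; off += CHUNK)
    {
        long chunkNo = (blockOffset + off) / CHUNK;
        long key = (fileId << 40) | chunkNo;  // pack identity into one long
        cache.put(key, java.util.Arrays.copyOfRange(block, off, off + CHUNK));
    }
}
{code}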
As to having a fast disk cache, I also think this is a great idea. But I think
it fits in as an extension of this and any compressed in-memory cache, as we
build a tiered-cache architecture.
> In process (uncompressed) page cache
> ------------------------------------
>
> Key: CASSANDRA-5863
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5863
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: T Jake Luciani
> Assignee: Pavel Yaskevich
> Labels: performance
> Fix For: 2.1 beta2
>
>
> Currently, for every read, the CRAR reads each compressed chunk into a
> byte[], sends it to ICompressor, gets back another byte[], and verifies a
> checksum.
> This process is where the majority of the time in a read request is spent.
> Before compression, we had zero-copy access to the data and could respond
> directly from the page cache.
> It would be useful to have some kind of chunk cache that could speed up
> this process for hot data. Initially this could be an off-heap cache, but it
> would be great to put these decompressed chunks onto an SSD so the hot data
> lives on a fast disk, similar to https://github.com/facebook/flashcache.
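For reference, the read path the ticket describes is roughly the following;
the RandomAccessFile/Inflater/CRC32 plumbing below merely stands in for the
real CRAR and ICompressor, and all parameters are illustrative:
{code:java}
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;
import java.util.zip.Inflater;

// Rough shape of the per-chunk read path described above: read the compressed
// chunk into a byte[], decompress into another byte[], verify a checksum.
// Offsets, lengths and the CRC32 here are stand-ins; the real reader gets
// them from the sstable's compression metadata.
static byte[] readChunk(RandomAccessFile file, long offset, int compressedLength,
                        int uncompressedLength, long expectedChecksum) throws Exception
{
    byte[] compressed = new byte[compressedLength];
    file.seek(offset);
    file.readFully(compressed);          // copy #1: disk (page cache) -> heap

    CRC32 crc = new CRC32();
    crc.update(compressed);
    if (crc.getValue() != expectedChecksum)
        throw new IOException("checksum mismatch for chunk at " + offset);

    Inflater inflater = new Inflater();
    inflater.setInput(compressed);
    byte[] uncompressed = new byte[uncompressedLength];
    inflater.inflate(uncompressed);      // copy #2: decompress into a fresh byte[]
    inflater.end();
    return uncompressed;                 // candidate for the proposed chunk cache
}
{code}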
--
This message was sent by Atlassian JIRA
(v6.2#6252)