There is probably something I missed in the message below.

In my experience with pluggable storage engines (in the MySQL world), the 
engine manages all storage that it "owns." The higher tiers in the architecture 
don't need to get involved unless multiple storage engines have to deal with 
compaction (or similar) issues over the entire database, e.g., every storage 
engine has read/write access to every piece of data, even if that data is owned 
by another storage engine.

I don't know enough about Cassandra internals to have an opinion as to whether 
or not the above scenario makes sense in the Cassandra context. But "sharing" 
(processes or data) between storage engines gets pretty hairy, easily deadlocky 
(!), even in something as relatively straightforward as MySQL. 

So.... this could be a way cool project and I'd love to get involved if it gets 
off the ground.

Bob


-----Original Message-----
From: DuyHai Doan [mailto:doanduy...@gmail.com] 
Sent: Wednesday, April 19, 2017 3:33 PM
To: dev@cassandra.apache.org
Subject: Re: Cassandra on RocksDB experiment result

"I have no clue what it would take to accomplish a pluggable storage engine, 
but I love this idea."

This is a long-standing debate we have had several times in the past. One of the 
difficulties of a pluggable storage engine is that we need to manage the 
differences between the native C* LSMT and the RocksDB engine for compaction, 
repair, streaming, etc.

Right now all the compaction strategies share the assumption that the data 
structure and layout on disk are fixed. With a pluggable storage engine, we 
would need to special-case each compaction strategy (or at least the abstract 
compaction strategy base class) for each engine.
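To make that coupling concrete, here is a minimal sketch of what a storage-engine abstraction might look like if compaction were pushed behind the interface instead of living in layout-aware strategies. All names here (StorageEngine, InMemoryEngine, etc.) are hypothetical, not actual Cassandra APIs:

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical pluggable storage-engine interface; Cassandra has no such API today.
interface StorageEngine {
    void put(String key, String value);
    String get(String key);
    // Each engine owns its on-disk layout, so compaction must live behind the
    // interface rather than in layout-aware strategies like TWCS or LCS.
    void compact();
}

// Trivial in-memory engine standing in for an LSMT- or RocksDB-backed one.
class InMemoryEngine implements StorageEngine {
    private final NavigableMap<String, String> data = new TreeMap<>();
    public void put(String key, String value) { data.put(key, value); }
    public String get(String key) { return data.get(key); }
    public void compact() { /* no-op: a single sorted map has nothing to merge */ }
}

public class EngineDemo {
    public static void main(String[] args) {
        StorageEngine engine = new InMemoryEngine();
        engine.put("k1", "v1");
        engine.compact();
        System.out.println(engine.get("k1")); // prints v1
    }
}
```

The point of the sketch: once compaction is an engine-internal operation, the per-strategy special-casing problem described above either disappears or moves wholesale into each engine implementation.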

The current approach is one storage engine with many compaction strategies for 
different use cases (TWCS for time series, LCS for update-heavy workloads...).

With pluggable storage engines, we'd have a matrix of storage engines x 
compaction strategies.

And that is not even mentioning the other operations to handle, like streaming 
and repair.

Another question that arises is: will the storage engine run in the same JVM 
as the C* server, or as a separate process? For the latter, we're opening the 
door to yet-another-distributed-system complexity. For instance, how will the 
C* JVM communicate with the storage engine process?
How do we handle failure, crash, resume, etc.?
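To illustrate the failure-handling burden an out-of-process engine would impose, here is a rough sketch of the kind of retry wrapper every cross-process call would need. This is purely illustrative (RemoteEngineClient and its semantics are invented, not a proposal):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical wrapper around calls to an out-of-process storage engine.
// Every call crosses an IPC boundary and can fail mid-flight.
class RemoteEngineClient {
    private final int maxRetries;

    RemoteEngineClient(int maxRetries) { this.maxRetries = maxRetries; }

    // The C* side must decide whether to retry, resume, or surface the error.
    <T> T call(Callable<T> rpc) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                return rpc.call();
            } catch (IOException e) {
                last = e; // crashed engine process, broken pipe, timeout...
            } catch (Exception e) {
                throw new IOException("unrecoverable engine error", e);
            }
        }
        throw new IOException("engine unavailable after " + maxRetries + " attempts", last);
    }
}

public class RemoteDemo {
    public static void main(String[] args) throws IOException {
        RemoteEngineClient client = new RemoteEngineClient(3);
        final int[] failures = {2}; // simulate two transient failures, then success
        String value = client.call(() -> {
            if (failures[0]-- > 0) throw new IOException("transient");
            return "v1";
        });
        System.out.println(value); // prints v1
    }
}
```

Even this toy version shows the extra policy surface (retry counts, error classification, crash recovery) that simply does not exist when the engine lives in the same JVM.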

That being said, if we manage to get the code base to this stage eventually, 
it'd be super cool!

On Wed, Apr 19, 2017 at 12:03 PM, Salih Gedik <m...@salih.xyz> wrote:

> Hi Dikang,
>
> I guess there is something wrong with the link that you shared.
>
>
> On 19.04.2017 at 19:21, Dikang Gu wrote:
>
> Hi Cassandra developers,
>>
>> This is Dikang from Instagram. I'd like to share with you some 
>> experiment results we did recently using RocksDB as Cassandra's 
>> storage engine. In the experiment, I built a prototype integrating 
>> Cassandra 3.0.12 and RocksDB on a single-column (key-value) use case, 
>> shadowing one of our production use cases, and saw about a 4-6X P99 
>> read latency drop during peak time compared to 3.0.12. The P99 
>> latency also became more predictable.
>>
>> Here is detailed note with more metrics:
>>
>> https://docs.google.com/document/d/1Ztqcu8Jzh4USKoWBgDJQw82DBurQmsV-PmfiJYvu_Dc/edit?usp=sharing
>>
>> Please take a look and let me know your thoughts. I think the biggest 
>> latency win comes from getting rid of most of the Java garbage 
>> created by the current read/write path and compactions, which reduces 
>> JVM overhead and makes the latency more predictable.
>>
>> We are very excited about the potential performance gain. As the next 
>> step, I propose making the Cassandra storage engine pluggable 
>> (like MySQL and MongoDB), and we are very interested in providing 
>> RocksDB as a storage option with more predictable performance, 
>> together with the community.
>>
>> Thanks.
>>
>>
>
