[jira] [Commented] (CASSANDRA-13475) First version of pluggable storage engine API.

Blake Eggleston (JIRA) Sat, 04 Nov 2017 17:14:51 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-13475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239329#comment-16239329
 ]


Blake Eggleston commented on CASSANDRA-13475:
---------------------------------------------

Dikang and I spoke offline, and my proposed plan seems reasonable to him. 

So I think the next step would be to talk about the non-technical side of this: 
the pluggable storage project’s place in Cassandra, and some general guidelines 
for how to approach the subprojects. Once we’ve converged on something in this 
jira, we should put it up on the dev list for a wider audience / additional 
feedback. My thoughts are below:

First, pluggable storage’s place in the Cassandra project:

For the time being, I think we should approach this as an effort to properly 
modularize storage-related parts of Cassandra. The motivation is to enable 
experimentation with alternate storage ideas without having to resort to awful 
hacks, not to ‘add pluggable storage to Cassandra’.

I think this work could definitely lead to pluggable storage being a part of 
Cassandra at some point, and that it could be beneficial to users. However, I 
don’t think it’s a good idea to start with the intention of supporting, 
directly or indirectly, secondary storage layers, both because of how it would 
impact development on core Cassandra and because of how it would affect 
user expectations about the storage options available to them.

Let’s start with making it possible, and then see where things go from there.

The short-term implications for rocksdb would be that there may be API changes 
in minor releases they’d have to worry about, and they’ll still need a fork. 
The long-term implications would be that pluggable storage may never really 
become an official part of Cassandra, so there’s risk in investing a lot of 
time in it.

Next, guidelines on approaching each incremental component.

Whenever we commit some code modularizing something, the overarching storage 
modularization project itself should remain abandon-able. In other words, if 
work stops on this project for some reason, there shouldn’t be any need to go 
back and revert any of the previous work.

Each component refactor should, as much as possible, make sense on its own, 
especially the larger ones. Each project’s effect on internal decoupling and 
testability should be positive. We also can’t make core development work more 
difficult.

Finally, I’ve discussed this with Dikang offline, but just so no one’s 
surprised if I say this in the future: I don’t think a rocksdb backend makes 
sense for Cassandra. Cassandra is a sorted LSM, rocksdb is a sorted LSM, and we 
don’t need two of them. If we want rocksdb performance in Cassandra, it would 
probably take less time to close the gap by optimizing the existing engine than 
it would to do all this work to make storage pluggable.

That said, I think this project is a good thing, and I’m happy to help. I think 
that the modularization work will be good for Cassandra. It enables a member of 
our dev community to try something new without committing the entire project to 
it, it will clean up some of our messy internals, and I think it will help us 
more quickly adapt to some of the changes in storage technology that are on the 
horizon.

> First version of pluggable storage engine API.
> ----------------------------------------------
>
>                 Key: CASSANDRA-13475
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13475
>             Project: Cassandra
>          Issue Type: Sub-task
>            Reporter: Dikang Gu
>            Assignee: Dikang Gu
>
> In order to support a pluggable storage engine, we need to define a unified 
> interface/API that allows us to plug in different storage engines for 
> different requirements. 
> Here is a design quip we are currently working on:  
> https://quip.com/bhw5ABUCi3co
> At a very high level, the storage engine interface should include APIs to:
> 1. Apply updates to the engine.
> 2. Query data from the engine.
> 3. Stream data in/out to/from the engine.
> 4. Perform table operations, like create/drop/truncate a table, etc.
> 5. Report various stats about the engine.
> I created this ticket to start the discussion about the interface.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org
