[ 
https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14605893#comment-14605893
 ] 

Sylvain Lebresne commented on CASSANDRA-8099:
---------------------------------------------

I've force-pushed a rebased version of the branch (still 
[here|https://github.com/pcmanus/cassandra/tree/8099]).  Since my last update, 
on top of a number of fixes, I've finished moving the {{OpOrder}} out of the 
iterators' close and I've updated the range tombstone code to use specific 
boundary markers as discussed above (I've also included Branamir's branch with 
its "nits" and fixed most others). I haven't had the time to upgrade 
Branamir's test however, and so for the sake of compilation I've currently 
removed it. If you could have a look at rebasing your test [~blambov], that 
would be greatly appreciated as you're more familiar with it.

There is still a fair amount of work to be done on this ticket, but the bulk of it 
is reasonably stable, and outside of some of the backward compatibility code 
the branch is generally functional. And we're starting to have tickets that are 
based on this and are ready (or almost are), tickets that won't be impacted too 
much by the remaining parts of this (which include the refactoring of the 
flyweight-based implementation that I'm going to focus on now, the wire 
backward compatibility code Tyler is working on and some general testing/bug 
fixing).

So, based on some offline discussion, I suggest committing the current branch 
to trunk. I won't close this ticket just yet and will continue fixing the remaining 
things, but it'll allow other tickets to synchronize on this and will generally 
help get more eyes on this by necessity.

And I'm planning to commit this tomorrow-ish (my European tomorrow), so if you 
have a strong objection to this (again, we're not closing the ticket and 
committing it doesn't mean it can't change), please speak up quickly.


> Refactor and modernize the storage engine
> -----------------------------------------
>
>                 Key: CASSANDRA-8099
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8099
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0 beta 1
>
>         Attachments: 8099-nit
>
>
> The current storage engine (which for this ticket I'll loosely define as "the 
> code implementing the read/write path") is suffering from old age. One of the 
> main problems is that the only structure it deals with is the cell, which 
> completely ignores the more high level CQL structure that groups cells into 
> (CQL) rows.
> This leads to many inefficiencies, like the fact that during a read we have 
> to group cells multiple times (to count on replica, then to count on the 
> coordinator, then to produce the CQL resultset) because we forget about the 
> grouping right away each time (so lots of useless cell names comparisons in 
> particular). But outside inefficiencies, having to manually recreate the CQL 
> structure every time we need it for something is hindering new features and 
> makes the code more complex than it should be.
> Said storage engine also has tons of technical debt. To pick an example, the 
> fact that during range queries we update {{SliceQueryFilter.count}} is pretty 
> hacky and error prone. Or the overly complex ways {{AbstractQueryPager}} has 
> to go into to simply "remove the last query result".
> So I want to bite the bullet and modernize this storage engine. I propose to 
> do 2 main things:
> # Make the storage engine more aware of the CQL structure. In practice, 
> instead of having partitions be a simple iterable map of cells, it should be 
> an iterable list of rows (each being itself composed of per-column cells, 
> though obviously not exactly the same kind of cell we have today).
> # Make the engine more iterative. What I mean here is that in the read path, 
> we end up reading all cells in memory (we put them in a ColumnFamily object), 
> but there is really no reason to. If instead we were working with iterators 
> all the way through, we could get to a point where we're basically 
> transferring data from disk to the network, and we should be able to reduce 
> GC substantially.
> Please note that such a refactor should provide some performance improvements 
> right off the bat but it's not its primary goal either. Its primary goal is 
> to simplify the storage engine and add abstractions that are better suited to 
> further optimizations.
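
To illustrate the two proposals in the ticket description above (rows rather 
than bare cells, and iterators all the way through), here is a minimal sketch 
in Python. It is not Cassandra's actual Java code: {{Cell}}, {{Row}}, 
{{rows_from_cells}} and {{limit_rows}} are hypothetical names invented for this 
example, and cells are assumed to arrive already sorted by clustering key.

```python
from dataclasses import dataclass
from itertools import groupby, islice
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical flat cell: (clustering key, column name, value).
Cell = Tuple[str, str, str]


@dataclass
class Row:
    """A CQL-style row: one clustering key plus its per-column values."""
    clustering: str
    columns: Dict[str, str]


def rows_from_cells(cells: Iterable[Cell]) -> Iterator[Row]:
    """Lazily group cells (assumed sorted by clustering key) into rows.

    Only one row is materialized at a time, and the grouping is done once,
    so later stages work on rows and never regroup individual cells.
    """
    for clustering, group in groupby(cells, key=lambda cell: cell[0]):
        yield Row(clustering, {column: value for _, column, value in group})


def limit_rows(rows: Iterable[Row], limit: int) -> Iterator[Row]:
    """Yield at most `limit` rows; no further rows are requested after that."""
    yield from islice(rows, limit)
```

Chaining the two, as in {{limit_rows(rows_from_cells(source), 2)}}, counts rows 
directly instead of counting cells and regrouping them, and it stops pulling 
from {{source}} once the limit is reached, so the whole partition never has to 
be held in memory.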



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
