[jira] [Commented] (CASSANDRA-8099) Refactor and modernize the storage engine

Robert Stupp (JIRA) Wed, 01 Apr 2015 03:43:13 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390361#comment-14390361 ]


Robert Stupp commented on CASSANDRA-8099:
-----------------------------------------

Just found some naming issues and some nits.

Altogether, I have to say that CASSANDRA-8099 is a great step forward! It 
simplifies a lot of areas in the code, explains a lot of things in javadoc, and 
makes it a lot easier to follow the travelled code path by using (mostly ;) ) 
appropriate nomenclature.

*NITs*

* {{org.apache.cassandra.db.ReadQuery}} can be an interface (only abstract 
methods) and is mentioned as an interface in the javadoc ;)
* {{org.apache.cassandra.config.CFMetaData#columnMetadata}} can be final
* {{org.apache.cassandra.config.CFMetaData#getDefaultIndexName}}, 
{{#isNameValid}}, {{#isIndexNameValid}} use non-compiled regexp
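
To illustrate the regexp nit: {{String#matches}} and {{String#replaceAll}} compile their pattern on every call, whereas a {{java.util.regex.Pattern}} held in a static final field is compiled only once. The following is only a sketch of that idea. {{NameValidator}}, the {{\w+}} expression and the length limit are assumptions made for illustration, not the actual {{CFMetaData}} code.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not the real CFMetaData implementation.
final class NameValidator {
    // Compiled once when the class is loaded. Pattern instances are immutable
    // and safe to share between threads.
    // Assumption: names consist of word characters only.
    private static final Pattern VALID_NAME = Pattern.compile("\\w+");

    // Assumption: an arbitrary upper bound chosen for this sketch.
    private static final int MAX_NAME_LENGTH = 48;

    private NameValidator() {}

    static boolean isNameValid(String name) {
        return name != null
            && !name.isEmpty()
            && name.length() <= MAX_NAME_LENGTH
            // matches() requires the whole input to match, like String#matches.
            && VALID_NAME.matcher(name).matches();
    }
}
```

Under these assumptions, {{NameValidator.isNameValid("users_by_id")}} accepts the name, while a name containing a dash or an empty name is rejected.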

I didn’t create a pull-req since all findings above are just nits and could 
only make final rebasing harder.

*Nomenclature*

The name {{ColumnFilter}} is a bit misleading. At first glance I 
thought it’s a filter that filters CQL columns - but it’s used to do a 2i 
lookup.

Can you rename {{NamesPartitionFilter}} to something with _clustering key_ in 
it? I know that the term _name_ is used elsewhere for clustering key.

{{CBuilder}}/{{MultiCBuilder}} could be more expressive as 
{{ClusteringBuilder}}/{{MultipleClusteringsBuilder}}

*Misc*

On the first run I hit a situation where {{cluster_name}} and {{host_id}} 
were null in {{system.local}}, but I had no luck reproducing it (I’m sure I did 
an {{ant realclean jar}} and {{rm -rf data/*}} beforehand). So just take this as a 
note - not something worth discussing.

I did a quick&dirty prototype of CASSANDRA-7396 based on 8099 and it looks much 
easier (without the slicing stuff).

> Refactor and modernize the storage engine
> -----------------------------------------
>
>                 Key: CASSANDRA-8099
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8099
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.0
>
>         Attachments: 8099-nit
>
>
> The current storage engine (which for this ticket I'll loosely define as "the 
> code implementing the read/write path") is suffering from old age. One of the 
> main problems is that the only structure it deals with is the cell, which 
> completely ignores the more high level CQL structure that groups cells into 
> (CQL) rows.
> This leads to many inefficiencies, like the fact that during a read we have 
> to group cells multiple times (to count on replica, then to count on the 
> coordinator, then to produce the CQL resultset) because we forget about the 
> grouping right away each time (so lots of useless cell name comparisons in 
> particular). But beyond inefficiencies, having to manually recreate the CQL 
> structure every time we need it for something is hindering new features and 
> makes the code more complex than it should be.
> Said storage engine also has tons of technical debt. To pick an example, the 
> fact that during range queries we update {{SliceQueryFilter.count}} is pretty 
> hacky and error prone. Another is the overly complex lengths 
> {{AbstractQueryPager}} has to go to simply to "remove the last query result".
> So I want to bite the bullet and modernize this storage engine. I propose to 
> do 2 main things:
> # Make the storage engine more aware of the CQL structure. In practice, 
> instead of a partition being a simple iterable map of cells, it should be 
> an iterable list of rows (each being itself composed of per-column cells, 
> though obviously not exactly the same kind of cell we have today).
> # Make the engine more iterative. What I mean here is that in the read path, 
> we end up reading all cells in memory (we put them in a ColumnFamily object), 
> but there is really no reason to. If instead we were working with iterators 
> all the way through, we could get to a point where we're basically 
> transferring data from disk to the network, and we should be able to reduce 
> GC substantially.
> Please note that such a refactor should provide some performance improvements 
> right off the bat but that is not its primary goal either. Its primary goal is 
> to simplify the storage engine and add abstractions that are better suited to 
> further optimizations.
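
The "more iterative" point in the description can be sketched in plain Java as follows. This is a toy illustration under stated assumptions, not code from the 8099 branch: {{IterativeReadSketch}}, {{Row}}, {{CountingSource}}, {{limit}} and {{drain}} are all hypothetical names. A limit is applied lazily on top of a row iterator, so only the rows actually sent are ever produced, instead of materializing the whole partition first.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch; none of these types exist in Cassandra under these names.
final class IterativeReadSketch {
    // Stand-in for a CQL row: a clustering value plus a single cell value.
    static final class Row {
        final int clustering;
        final String value;
        Row(int clustering, String value) {
            this.clustering = clustering;
            this.value = value;
        }
    }

    // Stand-in for an on-disk partition: rows are created only on demand,
    // and `produced` records how many were actually read.
    static final class CountingSource implements Iterator<Row> {
        private final int total;
        int produced = 0;
        CountingSource(int total) { this.total = total; }
        @Override public boolean hasNext() { return produced < total; }
        @Override public Row next() {
            if (!hasNext()) throw new NoSuchElementException();
            Row row = new Row(produced, "v" + produced);
            produced++;
            return row;
        }
    }

    // Wraps a row iterator so that at most `limit` rows are pulled from it.
    static Iterator<Row> limit(final Iterator<Row> source, final int limit) {
        return new Iterator<Row>() {
            private int returned = 0;
            @Override public boolean hasNext() {
                return returned < limit && source.hasNext();
            }
            @Override public Row next() {
                if (!hasNext()) throw new NoSuchElementException();
                returned++;
                return source.next();
            }
        };
    }

    // Stand-in for "the network": consumes rows one at a time as they arrive.
    static int drain(Iterator<Row> rows, StringBuilder out) {
        int sent = 0;
        while (rows.hasNext()) {
            Row row = rows.next();
            out.append(row.clustering).append('=').append(row.value).append('\n');
            sent++;
        }
        return sent;
    }
}
```

Draining {{limit(new CountingSource(1000), 3)}} pulls exactly three rows from the source; the remaining 997 are never created, which is the kind of saving the iterator-based design is after.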



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
