[jira] [Commented] (CASSANDRA-3237) refactor super column implmentation to use composite column names instead

Sylvain Lebresne (JIRA) Fri, 21 Dec 2012 10:27:15 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538274#comment-13538274
 ]


Sylvain Lebresne commented on CASSANDRA-3237:
---------------------------------------------

Attached patches for this at 
https://github.com/pcmanus/cassandra/commits/3237-1.

This ain't small so I'll try to explain the main idea here.

The main idea is that internally, super column families are handled for almost 
all intents and purposes as if their comparator was a simple CompositeType with 
2 components: the 1st one is the old super column name, the 2nd one the old 
sub-column name. This means they are largely not a special case anymore and all 
the super column specific code goes away (including SuperColumn.java).
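
To make the mapping concrete, here is a minimal sketch of packing an old
(super column name, sub-column name) pair into a single 2-component composite
name. It is not the code from the patch: the class and method names are
hypothetical, and the byte layout (2-byte length, value, one end-of-component
byte per component) is assumed to follow the usual CompositeType encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch only; names here are hypothetical, not from the patch. */
class SuperColumnNameSketch {

    /** Packs (superColumnName, subColumnName) into one 2-component composite name. */
    static ByteBuffer toComposite(byte[] superColumnName, byte[] subColumnName) {
        // Each component costs 2 bytes of length + the value + 1 end-of-component byte.
        ByteBuffer out = ByteBuffer.allocate(2 * 3 + superColumnName.length + subColumnName.length);
        putComponent(out, superColumnName);
        putComponent(out, subColumnName);
        out.flip();
        return out;
    }

    private static void putComponent(ByteBuffer out, byte[] value) {
        out.putShort((short) value.length); // unsigned 16-bit length
        out.put(value);                     // raw component bytes
        out.put((byte) 0);                  // end-of-component marker
    }

    /** Reads back component number {@code index} (0 = super column, 1 = sub-column). */
    static byte[] component(ByteBuffer composite, int index) {
        ByteBuffer in = composite.duplicate(); // do not disturb the caller's position
        for (int i = 0; in.hasRemaining(); i++) {
            int length = in.getShort() & 0xFFFF;
            byte[] value = new byte[length];
            in.get(value);
            in.get(); // skip the end-of-component marker
            if (i == index)
                return value;
        }
        throw new IllegalArgumentException("no component at index " + index);
    }

    public static void main(String[] args) {
        ByteBuffer name = toComposite("sc1".getBytes(StandardCharsets.UTF_8),
                                      "age".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(component(name, 0), StandardCharsets.UTF_8));
        System.out.println(new String(component(name, 1), StandardCharsets.UTF_8));
    }
}
```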

Now, for compatibility's sake, the main action is in the new SuperColumns.java 
class. This class contains a bunch of static methods that:
* deserialize old super column format directly into new composite based CF.
* serialize new composite based CF to the old super column format
* convert 'super column query filters' to and from 'composite based query 
filters'.
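
As a rough illustration of those three conversions, the sketch below uses
plain Java maps in place of the real column family classes. Everything in it
(class, method and field names, String keys and values) is hypothetical and
only meant to show the shape of the idea: a nested "super column -> sub-column"
structure becomes one sorted map keyed by a 2-component name, and a "give me
super column X" filter becomes a slice over that sorted map.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Illustrative sketch only; this is not the SuperColumns.java from the patch. */
class SuperColumnsSketch {

    /** A 2-component name ordered by super column first, then sub-column. */
    static final class Name implements Comparable<Name> {
        final String superColumn;
        final String subColumn;

        Name(String superColumn, String subColumn) {
            this.superColumn = superColumn;
            this.subColumn = subColumn;
        }

        @Override
        public int compareTo(Name other) {
            int c = superColumn.compareTo(other.superColumn);
            return c != 0 ? c : subColumn.compareTo(other.subColumn);
        }
    }

    /** Old nested layout -> flat, composite-keyed layout. */
    static NavigableMap<Name, String> flatten(Map<String, Map<String, String>> superColumns) {
        NavigableMap<Name, String> flat = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> sc : superColumns.entrySet())
            for (Map.Entry<String, String> sub : sc.getValue().entrySet())
                flat.put(new Name(sc.getKey(), sub.getKey()), sub.getValue());
        return flat;
    }

    /** Flat, composite-keyed layout -> old nested layout. */
    static NavigableMap<String, NavigableMap<String, String>> unflatten(NavigableMap<Name, String> flat) {
        NavigableMap<String, NavigableMap<String, String>> nested = new TreeMap<>();
        for (Map.Entry<Name, String> e : flat.entrySet())
            nested.computeIfAbsent(e.getKey().superColumn, k -> new TreeMap<>())
                  .put(e.getKey().subColumn, e.getValue());
        return nested;
    }

    /** "Whole super column" filter expressed as a slice over composite names. */
    static NavigableMap<Name, String> sliceSuperColumn(NavigableMap<Name, String> flat, String superColumn) {
        // superColumn + "\0" is the smallest String strictly greater than superColumn,
        // so the half-open range covers exactly the names whose first component matches.
        return flat.subMap(new Name(superColumn, ""), true,
                           new Name(superColumn + "\0", ""), false);
    }
}
```

With this layout a by-name super column query needs no special code path: it
is just a range read on the sorted composite names.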

Then in ColumnFamilySerializer and the ReadCommand serializer, we use those 
static methods when talking to old nodes (and a super column family is 
involved). We also convert thrift SC queries into equivalent ones on the new 
composite format in CassandraServer.java.

The patch also doesn't shy away from removing abstractions that are not necessary 
anymore once super columns are removed. Most notably:
* QueryPath is no longer used. It was honestly already kind of useless with super 
columns but even more so without them. It was also error-prone imho because 
some methods that took a QueryPath were actually ignoring everything 
except the columnFamilyName, for instance. Note that the class itself is not 
deleted: it is kept only to simplify wire compatibility with old nodes.
* IColumn and IColumnContainer are removed.

We could also merge ColumnFamily and AbstractColumnContainer but I've left that 
for later.

As far as testing goes:
* the unit tests pass more or less. CassandraServerTest times out on 
my box, but it does so on trunk too (it seems the JVM doesn't exit 
properly). A few of the serializationTest tests are also failing, but that seems 
to be more related to the fact that the patch bumps the messaging version than 
anything else. I'll look at that later.
* our old functional tests (in test/system) pass. Again, there are a few 
failures, but those are tests that assume CollatingOrderedPartitioner 
(apparently nobody has run those tests in a while). Anyway, those tests exercise 
the thrift API for super columns fairly thoroughly.
* you can now access super column families from CQL3.
* I've also (briefly) tested wire compatibility and that you can do super column 
queries in a mixed version cluster.

Regarding the CQL3 support, SCFs for which column_metadata has been defined on 
the subcolumns are handled almost like sparse CFs. The "almost" is because I've 
made sure we don't write a row marker as in the case of sparse CFs, because that 
would break backward compatibility (there is no way to have a column with an 
empty name in a super column). For the same reason, collections are not 
supported either.

One small downside that I need to note is that during an upgrade from 1.2 to 2.0, 
there might be a noticeable latency increase in super column queries. The 
reason is that any read query that mixes nodes from before and after the super 
column refactoring will have a digest mismatch (and so will re-query with the 
full data). Indeed, digests are not versioned and cannot really be (not easily 
at least).
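
A toy sketch of why the digests diverge: the same logical column hashed
through two different name layouts yields two different digests, so a replica
on the old layout and one on the new layout cannot agree even when their data
is identical. The byte sequences fed to the hash below are made up for
illustration (they are not Cassandra's real digest inputs), the class and
method names are hypothetical, and MD5 is used only as an assumed stand-in for
the read digest.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Illustrative sketch only; the byte layouts here are toy encodings. */
class DigestMismatchSketch {

    /** "Old" node: hashes the super column name, then the sub-column name, then the value. */
    static byte[] oldLayoutDigest(String superColumn, String subColumn, String value) {
        MessageDigest md = md5();
        md.update(utf8(superColumn));
        md.update(utf8(subColumn));
        md.update(utf8(value));
        return md.digest();
    }

    /** "New" node: hashes one composite name (length, bytes, end marker per component), then the value. */
    static byte[] newLayoutDigest(String superColumn, String subColumn, String value) {
        byte[] sc = utf8(superColumn);
        byte[] sub = utf8(subColumn);
        ByteBuffer name = ByteBuffer.allocate(2 * 3 + sc.length + sub.length);
        name.putShort((short) sc.length).put(sc).put((byte) 0);
        name.putShort((short) sub.length).put(sub).put((byte) 0);
        name.flip();

        MessageDigest md = md5();
        md.update(name);
        md.update(utf8(value));
        return md.digest();
    }

    private static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    private static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    public static void main(String[] args) {
        byte[] oldDigest = oldLayoutDigest("sc1", "age", "30");
        byte[] newDigest = newLayoutDigest("sc1", "age", "30");
        // Same logical data, different bytes hashed, hence a digest mismatch.
        System.out.println("digests match: " + Arrays.equals(oldDigest, newDigest));
    }
}
```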
                
> refactor super column implmentation to use composite column names instead
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3237
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3237
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Matthew F. Dennis
>            Priority: Minor
>              Labels: ponies
>             Fix For: 2.0
>
>         Attachments: cassandra-supercolumn-irc.log
>
>
> super columns are annoying.  composite columns offer a better API and 
> performance.  people should use composites over super columns.  some people 
> are already using super columns.  C* should implement the super column API in 
> terms of composites to reduce code, complexity and testing as well as 
> increase performance.
