[
https://issues.apache.org/jira/browse/CASSANDRA-3237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13538274#comment-13538274
]
Sylvain Lebresne commented on CASSANDRA-3237:
---------------------------------------------
Attached patches for this at
https://github.com/pcmanus/cassandra/commits/3237-1.
This ain't small so I'll try to explain the main idea here.
The main idea is that internally, super column families are handled for almost
all intents and purposes as if their comparator was a simple CompositeType with
2 components: the 1st one is the old super column name, the 2nd one the old
sub-column name. Meaning that they are largely not a special case anymore and
all the super column specific code goes away (including SuperColumn.java).
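To make the idea concrete, here is a minimal sketch of flattening a super column row into names with 2 components. It is not code from the patch: CompositeFlattenSketch, CompositeName and flatten are hypothetical names, and plain strings stand in for the real comparator and byte buffers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: none of these names come from the patch.
final class CompositeFlattenSketch {

    // Stand-in for a 2-component composite name: (super column name, sub-column name).
    static final class CompositeName implements Comparable<CompositeName> {
        final String superName;
        final String subName;

        CompositeName(String superName, String subName) {
            this.superName = superName;
            this.subName = subName;
        }

        @Override
        public int compareTo(CompositeName other) {
            // Order by the first component, then by the second.
            int c = superName.compareTo(other.superName);
            return c != 0 ? c : subName.compareTo(other.subName);
        }

        @Override
        public String toString() {
            return "(" + superName + ", " + subName + ")";
        }
    }

    // Flatten "super column -> (sub-column -> value)" into one sorted map
    // keyed by composite names.
    static TreeMap<CompositeName, String> flatten(Map<String, Map<String, String>> superColumns) {
        TreeMap<CompositeName, String> flat = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> sc : superColumns.entrySet()) {
            for (Map.Entry<String, String> sub : sc.getValue().entrySet()) {
                flat.put(new CompositeName(sc.getKey(), sub.getKey()), sub.getValue());
            }
        }
        return flat;
    }

    public static void main(String[] args) {
        Map<String, String> subColumns = new HashMap<>();
        subColumns.put("a", "1");
        subColumns.put("b", "2");
        Map<String, Map<String, String>> row = new HashMap<>();
        row.put("sc1", subColumns);
        for (Map.Entry<CompositeName, String> e : flatten(row).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Once the row is flat, sub-columns of the same super column sit next to each other in sort order, so nothing downstream needs a separate container type for them.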
Now for compatibility's sake, the main action is in the new SuperColumns.java
class. This class contains a bunch of static methods that:
* deserialize the old super column format directly into the new composite
based CF.
* serialize the new composite based CF to the old super column format.
* convert 'super column query filters' to and from 'composite based query
filters'.
Then in ColumnFamilySerializer and the ReadCommand serializer, we use those
static methods when talking to old nodes (and a super column family is
involved). We also convert thrift SC queries into equivalent ones on the new
composite format in CassandraServer.java.
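As a rough illustration of the filter conversion, the sketch below translates two kinds of super column queries into queries on composite names. It is not the SuperColumns.java API: SuperColumnFilterSketch, toCompositeNames and sliceSuperColumn are invented names, and strings stand in for the real name types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch, not the SuperColumns.java API: every name below is invented.
final class SuperColumnFilterSketch {

    // Stand-in for a 2-component composite name: (super column name, sub-column name).
    static final class CompositeName implements Comparable<CompositeName> {
        final String superName;
        final String subName;

        CompositeName(String superName, String subName) {
            this.superName = superName;
            this.subName = subName;
        }

        @Override
        public int compareTo(CompositeName other) {
            int c = superName.compareTo(other.superName);
            return c != 0 ? c : subName.compareTo(other.subName);
        }

        @Override
        public String toString() {
            return "(" + superName + ", " + subName + ")";
        }
    }

    // "Sub-columns a and b of super column X" becomes a by-name filter on (X, a) and (X, b).
    static List<CompositeName> toCompositeNames(String superName, List<String> subNames) {
        List<CompositeName> names = new ArrayList<>();
        for (String sub : subNames) {
            names.add(new CompositeName(superName, sub));
        }
        return names;
    }

    // "All of super column X" becomes a slice over every composite name
    // whose first component is X.
    static NavigableMap<CompositeName, String> sliceSuperColumn(
            NavigableMap<CompositeName, String> row, String superName) {
        NavigableMap<CompositeName, String> result = new TreeMap<>();
        // (X, "") is the smallest possible name with first component X.
        CompositeName start = new CompositeName(superName, "");
        for (Map.Entry<CompositeName, String> e : row.tailMap(start, true).entrySet()) {
            if (!e.getKey().superName.equals(superName)) {
                break; // moved past super column X
            }
            result.put(e.getKey(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        NavigableMap<CompositeName, String> row = new TreeMap<>();
        row.put(new CompositeName("sc1", "a"), "1");
        row.put(new CompositeName("sc1", "b"), "2");
        row.put(new CompositeName("sc2", "a"), "3");
        System.out.println(toCompositeNames("sc1", Arrays.asList("a", "b")));
        System.out.println(sliceSuperColumn(row, "sc1").keySet());
    }
}
```

The reverse direction (composite filter back to a super column filter, for old nodes) would split each composite name into its two components again; it is left out here to keep the sketch short.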
The patch also doesn't shy away from removing abstractions that are not necessary
anymore once super columns are removed. Most notably:
* QueryPath is no longer used internally. It was honestly already kind of
useless with super columns but even more so without them. It was also
error-prone imho because some methods that took a QueryPath were actually
ignoring everything except the columnFamilyName, for instance. I note that the
class itself is not deleted; it is kept only to simplify wire compatibility
with old nodes.
* IColumn and IColumnContainer are removed.
We could also merge ColumnFamily and AbstractColumnContainer but I've left that
for later.
As far as testing goes:
* the unit tests pass more or less. There's CassandraServerTest that times out
on my box, but it does so on trunk too (seems to be the JVM that doesn't exit
properly). And there are also a few serializationTest failures, but they seem
to be more related to the fact that the patch bumps the messaging version up
than to anything else. I'll look at that later.
* our old functional tests (in test/system) pass. Again, there are a few
failures, but those are tests that assume CollatingOrderedPartitioner
(apparently nobody has run those tests in a while). Anyway, those tests test
the thrift API for super columns fairly thoroughly.
* you can now access super column families from CQL3.
* I've also (briefly) tested wire compatibility and that you can do super
column queries in a mixed version cluster.
Regarding the CQL3 support, SCFs for which column_metadata has been defined on
the subcolumns are handled almost like sparse CFs. The "almost" is because I've
made sure we don't write a row marker as we do for sparse CFs, because that
would break backward compatibility (there is no way to have a column with an
empty name in a super column). For the same reason, collections are not
supported either.
One small downside that I need to note is that during an upgrade from 1.2 to
2.0, there might be a noticeable latency increase in super column queries. The
reason is that any read query that mixes pre- and post-upgrade nodes will have
a digest mismatch (and so will re-query with the full data). Indeed, digests
are not versioned and cannot really be (not easily at least).
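A toy illustration of why the digests cannot agree, under the assumption that each node hashes the bytes of its own internal layout: the same logical data laid out two ways hashes to two different values. The two string layouts and the DigestMismatchSketch class are made up for this example; only java.security.MessageDigest is a real API.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

// Toy example: the two "formats" below are invented, not Cassandra's wire formats.
final class DigestMismatchSketch {

    // Hash the serialized bytes, as a replica would when answering a digest read.
    static byte[] digest(String serialized) {
        try {
            return MessageDigest.getInstance("MD5")
                    .digest(serialized.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Same logical content, nested layout versus flat composite layout.
        String oldLayout = "sc1{a=1,b=2}";
        String newLayout = "(sc1,a)=1;(sc1,b)=2";
        boolean match = Arrays.equals(digest(oldLayout), digest(newLayout));
        System.out.println("digests match: " + match);
    }
}
```

Since a digest carries no information about which layout produced it, the coordinator can only see that the values differ and fall back to a full data read.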
> refactor super column implmentation to use composite column names instead
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-3237
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3237
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Matthew F. Dennis
> Priority: Minor
> Labels: ponies
> Fix For: 2.0
>
> Attachments: cassandra-supercolumn-irc.log
>
>
> super columns are annoying. composite columns offer a better API and
> performance. people should use composites over super columns. some people
> are already using super columns. C* should implement the super column API in
> terms of composites to reduce code, complexity and testing as well as
> increase performance.