[
https://issues.apache.org/jira/browse/CASSANDRA-12373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137023#comment-16137023
]
Sylvain Lebresne commented on CASSANDRA-12373:
----------------------------------------------
The general approach lgtm, but I think there is problems around dealing with
'dense' versus 'non-dense'.
Those things are a bit complicated however, and frankly under-documented, so
allow me first to remind what 'dense' and 'non dense' mean for super column
families (SCF in what follows), to make sure we're on the same page.
The first important thing to note is that contrarily to other {{COMPACT
STORAGE}} tables, the value of the {{is_dense}} flag doesn't impact the
internal layout of a SCF, neither in 2.x nor in 3.x. What it does impact
however (in 2.x so far at least) is how the SCF is exposed through CQL, and
that's what we're trying to make work in this ticket (and it must work in the
same way than in 2.x) .
So the definition of being "dense" for a SCF in 2.x is that the user hasn't
added any column in the thrift {{column_metadata}} of that SCF. Which
equivalently means that {{is_dense == true}} if a SCF has no {{REGULAR}}
columns internally.
With that defined, the impact on CQL is the following:
* a "dense" SCF having no {{REGULAR}} column, CQL exposes each "thrift column"
of the SCF as an individual CQL row. So if you take a dense SCF containing
something the following (using imaged representation of a thrift SCF here,
hopefully it's clear what I mean):
{noformat}
'k' : {
'sc1' : {
'a' : 1,
'b' : 2,
'c' : 2,
},
'sc2': {
'b' : '3'
}
}
{noformat}
then this is exposed in CQL as:
{noformat}
key | column1 | column2 | value
----+---------+---------+-------
'k' | 'sc1' | 'a' | 1
'k' | 'sc1' | 'b' | 2
'k' | 'sc1' | 'c' | 2
'k' | 'sc2' | 'b' | 3
{noformat}
* a "non dense" SCF however only exposes through CQL the values of "thrift
columns" that belong to a defined thrift {{column_metadata}}. So considering
the same SCF example, but saying that SCF is now non dense because the user has
defined {{column_metadata=[{column_name: b, validation_class: UTF8Type}]}},
then that SCF will be exposed in CQL as
{noformat}
key | column1 | b
----+---------+---
'k' | 'sc1' | 2
'k' | 'sc2' | 3
{noformat}
Note in particular that any value not associated to a non-declared
{{column_metadata}} is simply not exposed: it's there internally, but not
accessible through CQL.
I'll note at this point that the "why" this is done this way is unimportant
here, we are way way past changing any of this. This is how things work in 2.x
however and the only goal here is to make it work the same way in 3.x so user
can upgrade without problems.
Anyway, back to the patch, I think there is 2 problems:
# the methods in {{SuperColumnCompatibility}} don't seem to handle non-dense
super columns properly. Typically, I haven't actually tested, but from reading
the code I believe that for my example above (using the non dense case where
'b' is the only defined column) would yield something like:
{noformat}
key | column1 | b
----+---------+---
'k' | 'sc1' | 2
'k' | 'sc1' | 2
{noformat}
which is, well, not what 2.x does.
# I believe 3.x hasn't been setting the {{is_dense}} flag so far (which went
unnoticed because, as said above, that flag only influence CQL in the case of
SCF, and that hasn't working in 3.x so far). More precisely, I believe all SCF
currently have their {{is_dense}} flag set to {{false}} in 3.x. And while the
attached patch correctly fixes this issue _for upgrades from 2.x_, it's too
late for cluster already upgraded to 3.x. And that's unfortunate because if I'm
not mistaken, the schema migration process from 3.x has unintentionally erased
the information we need to correct that problem. More precisely, the way to
recognize a dense SCF in 2.1 is that is has no {{REGULAR}} column definitions
internally, but dense SCF in 2.1 had a {{COMPACT_VALUE}} column definition, and
in the 3.x schema migration code, we have converted that to {{REGULAR}}. In
other words, if on 3.x we have a SCF with a single {{REGULAR}} column
definition, we cannot really know with certainty if it's a dense SCF whose
{{COMPACT_VALUE}} has been converted to a {{REGULAR}}, or a genuinely non-dense
SCF with a single user-declared definition. I can't currently think of a good
solution to that problem. We can play some guessing game using a few
assumptions and hope no user will break those assumptions but I wanted to open
up the discussion on that problem before delving into bad solutions in case
someone has an actually good solution.
Other than that, I only have a few very minor remarks:
* I believe the last line of the class javadoc ("On write path, ...") of
{{SuperColumnCompatibility}} has been truncated.
* Hardcoding "column2" and "value" in {{SuperColumnCompatibility}} could
theoretically clash with some use defined column names. There is code in
{{CompactTables}} that's meant to deal with that and it probably make sense to
reuse that.
* Nit: Talking of {{CompactTables}}, there is a few super columns related
stuffs, maybe it's worth moving some of that (possibly including the part of
the class javadoc that explains SuperColumn) to {{SuperColumnCompatibility}} so
it's easier to locate and most super column stuffs are in one place.
* Nit: I'd probably merge {{getSuperCfKeyColumn()}} with
{{makeSuperCfKeyColumn()}} and {{getSuperCfValueColumn()}} with
{{makeSuperCfValueColumn()}}, as least as far as the "public" API of
{{SuperColumnFamily}} is concerned (really just to keep things as encapsulated
as possible).
* Nit: Patch adds a few unused imports in {{SelectStatement}}.
> 3.0 breaks CQL compatibility with super columns families
> --------------------------------------------------------
>
> Key: CASSANDRA-12373
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12373
> Project: Cassandra
> Issue Type: Bug
> Components: CQL
> Reporter: Sylvain Lebresne
> Assignee: Alex Petrov
> Fix For: 3.0.x, 3.11.x
>
>
> This is a follow-up to CASSANDRA-12335 to fix the CQL side of super column
> compatibility.
> The details and a proposed solution can be found in the comments of
> CASSANDRA-12335 but the crux of the issue is that super column famillies show
> up differently in CQL in 3.0.x/3.x compared to 2.x, hence breaking backward
> compatibilty.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]