[jira] [Commented] (CASSANDRA-12373) 3.0 breaks CQL compatibility with super columns families

Sylvain Lebresne (JIRA) Tue, 22 Aug 2017 09:24:15 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16137023#comment-16137023
 ]


Sylvain Lebresne commented on CASSANDRA-12373:
----------------------------------------------

The general approach lgtm, but I think there is problems around dealing with 
'dense' versus 'non-dense'.

Those things are a bit complicated however, and frankly under-documented, so 
allow me first to remind what 'dense' and 'non dense' mean for super column 
families (SCF in what follows), to make sure we're on the same page.

The first important thing to note is that contrarily to other {{COMPACT 
STORAGE}} tables, the value of the {{is_dense}} flag doesn't impact the 
internal layout of a SCF, neither in 2.x nor in 3.x. What it does impact 
however (in 2.x so far at least) is how the SCF is exposed through CQL, and 
that's what we're trying to make work in this ticket (and it must work in the 
same way than in 2.x) .

So the definition of being "dense" for a SCF in 2.x is that the user hasn't 
added any column in the thrift {{column_metadata}} of that SCF. Which 
equivalently means that {{is_dense == true}} if a SCF has no {{REGULAR}} 
columns internally.

With that defined, the impact on CQL is the following:
* a "dense" SCF having no {{REGULAR}} column, CQL exposes each "thrift column" 
of the SCF as an individual CQL row. So if you take a dense SCF containing 
something the following (using imaged representation of a thrift SCF here, 
hopefully it's clear what I mean):
{noformat}
'k' : {
    'sc1' : {
        'a' : 1,
        'b' : 2,
        'c' : 2,
    },
    'sc2': {
        'b' : '3'
    }
}
{noformat}
then this is exposed in CQL as:
{noformat}
key | column1 | column2 | value
----+---------+---------+-------
'k' |   'sc1' |     'a' |     1
'k' |   'sc1' |     'b' |     2
'k' |   'sc1' |     'c' |     2
'k' |   'sc2' |     'b' |     3
{noformat}
* a "non dense" SCF however only exposes through CQL the values of "thrift 
columns" that belong to a defined thrift {{column_metadata}}. So considering 
the same SCF example, but saying that SCF is now non dense because the user has 
defined {{column_metadata=[{column_name: b, validation_class: UTF8Type}]}}, 
then that SCF will be exposed in CQL as
{noformat}
key | column1 | b
----+---------+---
'k' |   'sc1' | 2
'k' |   'sc2' | 3
{noformat}
Note in particular that any value not associated to a non-declared 
{{column_metadata}} is simply not exposed: it's there internally, but not 
accessible through CQL.

I'll note at this point that the "why" this is done this way is unimportant 
here, we are way way past changing any of this. This is how things work in 2.x 
however and the only goal here is to make it work the same way in 3.x so user 
can upgrade without problems.


Anyway, back to the patch, I think there is 2 problems:
# the methods in {{SuperColumnCompatibility}} don't seem to handle non-dense 
super columns properly. Typically, I haven't actually tested, but from reading 
the code I believe that for my example above (using the non dense case where 
'b' is the only defined column) would yield something like:
{noformat}
key | column1 | b
----+---------+---
'k' |   'sc1' | 2
'k' |   'sc1' | 2
{noformat}
which is, well, not what 2.x does.
# I believe 3.x hasn't been setting the {{is_dense}} flag so far (which went 
unnoticed because, as said above, that flag only influence CQL in the case of 
SCF, and that hasn't working in 3.x so far). More precisely, I believe all SCF 
currently have their {{is_dense}} flag set to {{false}} in 3.x. And while the 
attached patch correctly fixes this issue _for upgrades from 2.x_, it's too 
late for cluster already upgraded to 3.x. And that's unfortunate because if I'm 
not mistaken, the schema migration process from 3.x has unintentionally erased 
the information we need to correct that problem. More precisely, the way to 
recognize a dense SCF in 2.1 is that is has no {{REGULAR}} column definitions 
internally, but dense SCF in 2.1 had a {{COMPACT_VALUE}} column definition, and 
in the 3.x schema migration code, we have converted that to {{REGULAR}}. In 
other words, if on 3.x we have a SCF with a single {{REGULAR}} column 
definition, we cannot really know with certainty if it's a dense SCF whose 
{{COMPACT_VALUE}} has been converted to a {{REGULAR}}, or a genuinely non-dense 
SCF with a single user-declared definition. I can't currently think of a good 
solution to that problem. We can play some guessing game using a few 
assumptions and hope no user will break those assumptions but I wanted to open 
up the discussion on that problem before delving into bad solutions in case 
someone has an actually good solution.


Other than that, I only have a few very minor remarks:
* I believe the last line of the class javadoc ("On write path, ...") of 
{{SuperColumnCompatibility}} has been truncated.
* Hardcoding "column2" and "value" in {{SuperColumnCompatibility}} could 
theoretically clash with some use defined column names. There is code in 
{{CompactTables}} that's meant to deal with that and it probably make sense to 
reuse that.
* Nit: Talking of {{CompactTables}}, there is a few super columns related 
stuffs, maybe it's worth moving some of that (possibly including the part of 
the class javadoc that explains SuperColumn) to {{SuperColumnCompatibility}} so 
it's easier to locate and most super column stuffs are in one place.
* Nit: I'd probably merge {{getSuperCfKeyColumn()}} with 
{{makeSuperCfKeyColumn()}} and {{getSuperCfValueColumn()}} with 
{{makeSuperCfValueColumn()}}, as least as far as the "public" API of 
{{SuperColumnFamily}} is concerned (really just to keep things as encapsulated 
as possible).
* Nit: Patch adds a few unused imports in {{SelectStatement}}.

> 3.0 breaks CQL compatibility with super columns families
> --------------------------------------------------------
>
>                 Key: CASSANDRA-12373
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12373
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: Sylvain Lebresne
>            Assignee: Alex Petrov
>             Fix For: 3.0.x, 3.11.x
>
>
> This is a follow-up to CASSANDRA-12335 to fix the CQL side of super column 
> compatibility.
> The details and a proposed solution can be found in the comments of 
> CASSANDRA-12335 but the crux of the issue is that super column famillies show 
> up differently in CQL in 3.0.x/3.x compared to 2.x, hence breaking backward 
> compatibilty.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (CASSANDRA-12373) 3.0 breaks CQL compatibility with super columns families

Reply via email to