[jira] [Commented] (CASSANDRA-12373) 3.0 breaks CQL compatibility with super columns families

Alex Petrov (JIRA) Mon, 11 Sep 2017 02:37:46 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-12373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16160985#comment-16160985
 ]


Alex Petrov commented on CASSANDRA-12373:
-----------------------------------------

bq. In the ctor, for isDense, we seem to force re-computation for non-SC 
non-dense tables (which happens to include non-compact tables).

Corrected it. This didn't cause any trouble (because was checking for an empty 
name, too), but now we'll recompute only for the supercolumn ones, which is 
much better.

bq. But for I'm also not 100% sure this work in all upgrade scenario.

Thing is the re-computation is necessary only for the upgrade from 2.x -> 
current 3.x -> 3.x + this patch, in which case we have two clustering columns 
instead of one just a single "empty" value column, which is a collection. 
3.x-created supercolumns are calculated correctly 
[here|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/thrift/ThriftConversion.java#L184].

You are right that some testing in that regard won't hurt, so I've added some 
3-step upgrade tests with the scenarios similar to the current one. For the 
sakes of completeness, I've also added a test for current 3.x -created 
supercolumn families. 

bq. Unfortunately, while we were supposed to have a SC upgrade dtests

I've added some dtests as well as upgrade tests.

bq. I'm not entirely sure if/how column renaming works for the SC key and value 
"fake" column

You're right: I did leave it off as I thought read/write path is the only 
support that's required for migrating off SCF, but adding this is trivial. I've 
added tests for all sorts of renames. However, there is no special-casing done 
in {{AlterStatement}}: {{getColumnDefinition}} would return the "fake" columns 
as well. {{removeColumn}} would be a no-op in case of fake columns and adding 
column would force re-initialisation and the new column would get picked up 
correctly.

bq. In UpdateParameters#addCounter, not sure the change from 
CounterContext#createUpdate to CounterContext#createLocal is corret.

Yes, sorry, I have overlooked it. 

bq. In getSuperCfKeyColumn, not sure what "3.x-created supercolumn family" 
refers to (is that SC tables created from thrift in 3.x? or SC tables whose 
schema had been upgraded by a previous 3.x version?) and more importantly, why 
it would differ in number of clustering columns?

You are right, SC columns created from Thrift in 3.x. When upgrading from 2.x, 
you'll get {{column1}} and {{column2}} as clustering keys (which they kind of 
are). When created via thrift in 3.x, {{column2}} is entirely virtual (which it 
kind of is). 

bq. There is a typo in a comment, where in "... can't be change to support 
inclusive slice ...", I believe you want "exclusive".

Fixed.

bq. in the 2nd branch (else if ...), we should use result.add(cell) directly. 
Not only does that already handle counters, but more importantly, if we don't 
do that, things like ttl(v) will not work on the compact column value.

Good point. Fixed.

bq. the last else is confusing: by definition we shouldn't be exposing any 
other column than the "compact value" column, so that branch should probably 
not be there (or be an assertion that we shouldn't get there).

You're right. This is an artefact of the previous incorrect dense table 
handling.

bq. In the updates loop in prepareUpdateOperations, not sure why the 2 first 
cases seem to assume that cfm.isSuperColumnValueColumn(def), but the 3rd case 
tests it explicitely (but after having assigned superColumnValue). In general, 
not sure why the 3rd case looks different from the other 2. Also, I find it a 
bit hard to follow the mix of if using continue and sometime using else, 
sometimes not. I'm personally prefer using else when needed and remove all the 
continue. Lastly, seems the method could use a check for superColumnKey != null.

I've refactored this method even further. There were many things that were 
technically correct, but were added on earlier stages, so got there 
historically. Also, did a bit of further improvements with 
{{collectMarkerSpecifications}}

bq. In prepareInsertForJSONOperations, I believe the 2nd call to 
getRawTermForColumn is unecessary, you can use the raw for before the if. 
Further, the overall code of this method feels very similar to the one in 
prepareInsertOperations: can we extract the bulk of what's common? (the loop 
seems to be operating on a ColumnDefinition and the associated value, so either 
pass an Iterable<Pair<...>>, or build 2 lists; I don't think we care about the 
performance difference here).

Refactored this part as well.

bq. In prepareDeleteOperations, need to fix the "Maybe not single only" 
comment. 

I think what was meant was "not only {{SingleColumnRelation}}", but this is not 
true: multi column operations are not allowed. After writing more tests, I did 
discover that multi-column restrictions were not working properly, so I have 
added support for them. Also, for IN queries.

bq. n rebuildLWTConditions, I'd invert the initial if check (making it, imo, 
easier to read) with a direct return and "de-indent" the bulk of the method. 
Also, I'd remove the columnConditions.clear(): doesn't feel useful, and feels 
weird to do it here (what if we make that collection immutable in future?).

Good point, fixed

bq. In isSuperColumnMapColumn, can replace the first condition by 
ColumnDefinition#isRegular (not something due to this patch, but a good 
occasion to simplify).

Fixed 

bq. In SelectStatement, I believe that if you move the 
SuperColumnCompatibility#processPartition call at the beginning of 
SelectStatement#processPartition, you'd be able to remove the specific call in 
ModificationStatement#buildCasFailureResultSet (and generally reduce the 
changes of misuses).

Fixed

bq. for 'if-then-else', we never use brakets for one of the branch and not 
another (meaning that we only skip brackets if all of the branches are 
one-liner). 

I hope I've fixed all the instances.

bq. Slightly bummed by the addition of the SC special fields in 
StatementRestrictions. Can't we move most of this in 
SuperColumnCompatibility#getColumnFilter, by extracting things there (tbc, I'm 
genuninely not sure how easy change that is, and if it's too convoluted, I'm 
fine leaving things as they are; I just prefer having a much special casing in 
SuperColumnCompatibility as possible. I also assume we can't entirely remove 
all special casing from StatementRestrictions since we will at least make sure 
it doesn't complain about requiring ALLOW FILTERING if we leave the 
"superColumnKeyColumn" restriciton in)?

I agree this did not look pretty. At the time of writing that felt like it made 
sense. But after the current refactoring, I had to switch most of the things to 
restrictions anyways, and there are now more different special cases, so I have 
opted out for a class that would hold all the restriction special-cases. I've 
added a lot of documentation inline, hope this should suffice. 

bq. In ModificationStatement why not directly use cfm.superCfValueColumn 
instead of looping over cfm.allColumnsInSeelectOrder() to find it?

Good point, simplified that one, too.


A word on the latest changes in the patch: because we have to support _all_ 
{{ALTER}} statements that are supported by 2.x, we got a bit more logic in 
{{getSuper(CfKey|Value)Column}} now. Also, rebuild was changed in order to 
avoid accidentally dropping the fake columns, since we have to make sure they 
do get persisted. In order to properly interface 2.x nodes, there are several 
special-cases in {{CFMetadata}} to make sure we compose {{Selection}} 
correctly. Another big change is the way {{WHERE}} clauses are handled, pretty 
much everywhere. Biggest reason for it is that previous way (having public 
variables in {{StatementRestrictions}} was too ad-hoc and many use-cases were 
missing. Now we can correctly remap multi-column restrictions on SC key (both 
EQ and slice), slice (and EQ) restrictions on the SC key and IN restriction. 
All of them are handled somewhat differently and some require filtering on the 
very last stage. Where possible I've tried to use collection bounds. 

> 3.0 breaks CQL compatibility with super columns families
> --------------------------------------------------------
>
>                 Key: CASSANDRA-12373
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12373
>             Project: Cassandra
>          Issue Type: Bug
>          Components: CQL
>            Reporter: Sylvain Lebresne
>            Assignee: Alex Petrov
>             Fix For: 3.0.x, 3.11.x
>
>
> This is a follow-up to CASSANDRA-12335 to fix the CQL side of super column 
> compatibility.
> The details and a proposed solution can be found in the comments of 
> CASSANDRA-12335 but the crux of the issue is that super column famillies show 
> up differently in CQL in 3.0.x/3.x compared to 2.x, hence breaking backward 
> compatibilty.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (CASSANDRA-12373) 3.0 breaks CQL compatibility with super columns families

Reply via email to