[jira] [Commented] (CASSANDRA-16675) Preserve Query Performance with ClusteringIndexNamesFilter After Running DROP COMPACT STORAGE

Caleb Rackliffe (Jira) Mon, 14 Jun 2021 09:31:04 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-16675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17363058#comment-17363058
 ]


Caleb Rackliffe commented on CASSANDRA-16675:
---------------------------------------------

This is already not blocking the release, but it is worth noting that its 
urgency drops somewhat if we move to mark DROP COMPACT STORAGE as 
experimental.

> Preserve Query Performance with ClusteringIndexNamesFilter After Running DROP 
> COMPACT STORAGE
> ---------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16675
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16675
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Caleb Rackliffe
>            Assignee: Caleb Rackliffe
>            Priority: Normal
>             Fix For: 4.0.x
>
>
> Before the completion of CASSANDRA-16226, upgrading a cluster from 2.1 to 3.0 
> with compact tables could cause a significant regression in the latency of 
> reads using ClusteringIndexNamesFilter. The details are described in that 
> Jira, but in short, 3.0+ did not skip SSTables it should have during reads, 
> because it thought (wrongly) there might be primary key liveness information 
> in SSTables for compact tables.
>
> CASSANDRA-16226 addressed this behavior for still-compact tables, and also 
> maintained it after DROP COMPACT STORAGE was run. However, it also allowed 
> tables that were never compact to drop rows from query results if they 
> contained no live non-key columns, which is normal behavior only for 
> compact tables. This is addressed in CASSANDRA-16671 by reverting the bits of 
> the logic from CASSANDRA-16226 that deal with formerly compact tables where 
> DROP COMPACT STORAGE has been run, in the interest of unblocking the 4.0 
> release and making sure strictly compact and strictly non-compact tables are 
> queried properly and construct properly formed results.
>
> The goal of this issue is to safely restore the performance of formerly 
> compact tables, which necessarily contain ambiguous primary key liveness 
> info. Roughly, the idea is that we record in a system table (and pull into 
> TableMetadata) the time when DROP COMPACT STORAGE is executed. If a time 
> exists for a table, we can treat it as being formerly compact, and ignore 
> primary key liveness info for determining row completeness in 
> SinglePartitionReadCommand#isComplete(). Otherwise, the normal rules for 
> never-compact tables will apply, avoiding any regression in the scenario 
> described by CASSANDRA-16671.
>
> This would obviously not be helpful in the case where a user has already 
> dropped compact storage, but it may logically be the best we can do, given we 
> cannot correctly reconstruct liveness info for SSTables created while a table 
> was compact (i.e. there is no way to tell INSERT and UPDATE apart for those). 
> Especially if CASSANDRA-16671 moves in the direction of disabling DROP 
> COMPACT STORAGE by default, I would also propose that we do this only for 
> 4.0+.
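
The proposal in the description can be illustrated with a minimal, self-contained 
sketch. This is not Cassandra code: FormerlyCompactSketch, TableInfo, RowSketch, 
compactStorageDroppedAt, and the isRowComplete signature are hypothetical stand-ins 
for TableMetadata, Row, and SinglePartitionReadCommand#isComplete(), and the 
timestamp comparison is an assumed simplification of the real check. It only shows 
how a recorded drop time could gate whether primary key liveness info is consulted.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;

// Hypothetical sketch; none of these types exist in Cassandra under these names.
final class FormerlyCompactSketch {

    // Stand-in for table metadata. compactStorageDroppedAt is the assumed
    // value read from a system table; empty means DROP COMPACT STORAGE was
    // never recorded for this table.
    static final class TableInfo {
        final boolean compact;
        final OptionalLong compactStorageDroppedAt;

        TableInfo(boolean compact, OptionalLong compactStorageDroppedAt) {
            this.compact = compact;
            this.compactStorageDroppedAt = compactStorageDroppedAt;
        }

        // Compact and formerly compact tables carry ambiguous primary key
        // liveness info, so it is ignored when judging row completeness.
        boolean ignoresPrimaryKeyLiveness() {
            return compact || compactStorageDroppedAt.isPresent();
        }
    }

    // Stand-in for a row: an optional primary key liveness timestamp and
    // one write timestamp per live cell, keyed by column name.
    static final class RowSketch {
        final OptionalLong primaryKeyLivenessTimestamp;
        final Map<String, Long> cellTimestamps;

        RowSketch(OptionalLong primaryKeyLivenessTimestamp, Map<String, Long> cellTimestamps) {
            this.primaryKeyLivenessTimestamp = primaryKeyLivenessTimestamp;
            this.cellTimestamps = cellTimestamps;
        }
    }

    // Returns true if no older SSTable (max timestamp <= nextSSTableMaxTimestamp)
    // could still change the row for the requested columns, so it can be skipped.
    static boolean isRowComplete(TableInfo table, RowSketch row,
                                 Set<String> requestedColumns,
                                 long nextSSTableMaxTimestamp) {
        if (!table.ignoresPrimaryKeyLiveness()) {
            // Never-compact tables: liveness info must exist and be newer.
            if (!row.primaryKeyLivenessTimestamp.isPresent()
                || row.primaryKeyLivenessTimestamp.getAsLong() <= nextSSTableMaxTimestamp) {
                return false;
            }
        }
        for (String column : requestedColumns) {
            Long timestamp = row.cellTimestamps.get(column);
            if (timestamp == null || timestamp <= nextSSTableMaxTimestamp) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, a row with no primary key liveness info never counts as 
complete for a never-compact table, which preserves the behavior restored by 
CASSANDRA-16671, while a table with a recorded drop time can still skip older 
SSTables once every requested column has a newer value.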



--
This message was sent by Atlassian Jira
(v8.3.4#803005)