[
https://issues.apache.org/jira/browse/CASSANDRA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17233168#comment-17233168
]
Caleb Rackliffe commented on CASSANDRA-16226:
---------------------------------------------
bq. I don't think that the temporary performance issue after DROP COMPACT
STORAGE can be fixed without tracking the dropped status.
Before going much further here, I should probably lay out one more time how
the concept of an empty row differs between compact and non-compact tables and
how that affects the way they interact with the read path optimizations for
skipping SSTables.
For compact tables, there is no concept of primary key liveness. When a row has
no live cells, it is simply empty. For a non-compact table, it is possible to
have a live row that happens to have no live cells. Imagine the following
example:
{noformat}
INSERT INTO foo (partitionKey, clustering, value) VALUES (0, 1, 1)
DELETE value FROM foo WHERE partitionKey = 0 AND clustering = 1
SELECT * FROM foo WHERE partitionKey = 0 AND clustering = 1
{noformat}
With compact storage, this SELECT returns zero rows. With a non-compact table,
it returns a single row {{(0, 1, null)}}. Any solution for this Jira should
preserve this behavior, i.e. once DROP COMPACT STORAGE runs, we should start
returning the second result, and all existing non-compact tables should keep
their current behavior as well.
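
To make the distinction concrete, here is a minimal Python sketch of the two visibility rules. It is a toy model, not Cassandra code: the {{Row}} class and the helpers {{insert}}, {{delete_column}}, and {{is_visible}} are hypothetical names, and timestamp reconciliation between the INSERT and the DELETE is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Row:
    # Timestamp recorded for the primary key itself by an INSERT; None if absent.
    pk_liveness_ts: Optional[int] = None
    # Column name -> value, holding only cells that are still live.
    live_cells: Dict[str, int] = field(default_factory=dict)


def insert(row: Row, ts: int, **cells: int) -> None:
    """Model an INSERT: writes the cells and marks the primary key as live."""
    row.pk_liveness_ts = ts
    row.live_cells.update(cells)


def delete_column(row: Row, name: str) -> None:
    """Model DELETE <column>: removes the cell, leaves primary key liveness alone."""
    row.live_cells.pop(name, None)


def is_visible(row: Row, compact: bool) -> bool:
    """Decide whether a SELECT on this row's primary key returns it."""
    if row.live_cells:
        return True
    # No live cells left: only a non-compact table can fall back on
    # primary key liveness. A compact table treats the row as empty.
    return (not compact) and row.pk_liveness_ts is not None


row = Row()
insert(row, ts=1, value=1)
delete_column(row, "value")
print(is_visible(row, compact=True))   # False: zero rows returned
print(is_visible(row, compact=False))  # True: (0, 1, null) returned
```

The only difference between the two cases is the fallback to primary key liveness, which is exactly the information that upgraded pre-3.0 SSTables lack.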
Right now, I'm working on a solution that a.) preserves this behavior, b.)
requires no changes to the SSTable format, and c.) fixes the performance
regression originally reported in this Jira, in addition to one or two that
don't actually relate to compact tables. I'll hopefully have a rough patch in
the next day or so.
> COMPACT STORAGE SSTables created before 3.0 are not correctly skipped by
> timestamp due to missing primary key liveness info
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-16226
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16226
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Labels: perfomance, upgrade
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This was discovered while tracking down a spike in the number of SSTables
> per read for a COMPACT STORAGE table after a 2.1 -> 3.0 upgrade. Before 3.0,
> there is no direct analog of 3.0's primary key liveness info. When we upgrade
> 2.1 COMPACT STORAGE SSTables to the mf format, we simply don't write row
> timestamps, even if the original mutations were INSERTs. On read, when we
> look at SSTables in order from newest to oldest max timestamp, we expect to
> have this primary key liveness information to determine whether we can skip
> older SSTables after finding completely populated rows.
> ex. I have three SSTables in a COMPACT STORAGE table with max timestamps
> 1000, 2000, and 3000. There are many rows in a particular partition, making
> filtering on the min and max clustering effectively a no-op. All data is
> inserted, and there are no partial updates. A fully specified row with
> timestamp 2500 exists in the SSTable with a max timestamp of 3000. With a
> proper row timestamp in hand, we can easily ignore the SSTables w/ max
> timestamps of 1000 and 2000. Without it, we read 3 SSTables instead of 1,
> which likely means a significant performance regression.
> The following test illustrates this difference in behavior between 2.1 and
> 3.0:
> https://github.com/maedhroz/cassandra/commit/84ce9242bedd735ca79d4f06007d127de6a82800
> A solution here might be as simple as having
> {{SinglePartitionReadCommand#canRemoveRow()}} only inspect primary key
> liveness information for non-compact/CQL tables. Tombstones seem to be
> handled at a level above that anyway. (One potential problem with that is
> whether or not the distinction will continue to exist in 4.0, and dropping
> compact storage from a table doesn't magically make pk liveness information
> appear.)
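
The three-SSTable example in the description above can be sketched as a toy simulation of timestamp-ordered skipping. This is an illustrative Python model under simplifying assumptions, not the logic of {{SinglePartitionReadCommand}}: the names {{RowData}}, {{SSTable}}, {{merge}}, {{can_stop}}, and {{sstables_read}} are hypothetical, tombstones are ignored, and the {{require_pk_liveness}} flag stands in for the idea of only inspecting primary key liveness on non-compact tables.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RowData:
    cell_timestamps: Dict[str, int]       # column -> write timestamp
    pk_liveness_ts: Optional[int] = None  # None models upgraded pre-3.0 data


@dataclass
class SSTable:
    max_timestamp: int
    row: Optional[RowData]  # the queried row as stored here, if present


def merge(a: Optional[RowData], b: RowData) -> RowData:
    """Combine two versions of a row, keeping the newest timestamp per field."""
    if a is None:
        return b
    cells = dict(a.cell_timestamps)
    for col, ts in b.cell_timestamps.items():
        cells[col] = max(ts, cells.get(col, ts))
    known = [t for t in (a.pk_liveness_ts, b.pk_liveness_ts) if t is not None]
    return RowData(cells, max(known, default=None))


def can_stop(merged: Optional[RowData], columns: List[str],
             next_max_ts: int, require_pk_liveness: bool) -> bool:
    """True if an SSTable with max timestamp next_max_ts cannot change the row."""
    if merged is None:
        return False
    # Every requested column must already be newer than the next SSTable.
    if any(merged.cell_timestamps.get(c, -1) <= next_max_ts for c in columns):
        return False
    if require_pk_liveness:
        ts = merged.pk_liveness_ts
        return ts is not None and ts > next_max_ts
    return True


def sstables_read(sstables: List[SSTable], columns: List[str],
                  require_pk_liveness: bool) -> int:
    """Count the SSTables touched when reading from newest to oldest."""
    merged, read = None, 0
    for s in sorted(sstables, key=lambda t: t.max_timestamp, reverse=True):
        if can_stop(merged, columns, s.max_timestamp, require_pk_liveness):
            break
        read += 1
        if s.row is not None:
            merged = merge(merged, s.row)
    return read


def tables(pk_ts: Optional[int]) -> List[SSTable]:
    """Max timestamps 1000, 2000, 3000; the row (ts 2500) is in the newest."""
    return [SSTable(1000, None), SSTable(2000, None),
            SSTable(3000, RowData({"value": 2500}, pk_ts))]


print(sstables_read(tables(2500), ["value"], True))   # 1: row timestamp present
print(sstables_read(tables(None), ["value"], True))   # 3: row timestamp missing
print(sstables_read(tables(None), ["value"], False))  # 1: pk liveness not required
```

The second call reproduces the regression: without a row timestamp the stop condition never holds, so all three SSTables are read. The third call shows the effect of skipping the primary key liveness check, which is only safe where an empty row has no meaning of its own, i.e. for compact tables.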