Caleb Rackliffe created CASSANDRA-16226:
-------------------------------------------
Summary: COMPACT STORAGE SSTables created before 3.0 are not
correctly skipped by timestamp due to missing primary key liveness info
Key: CASSANDRA-16226
URL: https://issues.apache.org/jira/browse/CASSANDRA-16226
Project: Cassandra
Issue Type: Bug
Components: Legacy/Local Write-Read Paths
Reporter: Caleb Rackliffe
This was discovered while tracking down a spike in the number of SSTables per
read for a COMPACT STORAGE table after a 2.1 -> 3.0 upgrade. Before 3.0, there
is no direct analog of 3.0's primary key liveness info. When we upgrade 2.1
COMPACT STORAGE SSTables to the mf format, we simply don't write row
timestamps, even if the original mutations were INSERTs. On read, when we look
at SSTables in order from newest to oldest max timestamp, we expect to have
this primary key liveness information to determine whether we can skip older
SSTables after finding completely populated rows.
For example, suppose a COMPACT STORAGE table has three SSTables with max timestamps 1000,
2000, and 3000. There are many rows in a particular partition, making filtering
on the min and max clustering effectively a no-op. All data is inserted, and
there are no partial updates. A fully specified row with timestamp 2500 exists
in the SSTable with a max timestamp of 3000. With a proper row timestamp in
hand, we can easily ignore the SSTables with max timestamps of 1000 and 2000.
Without it, we read 3 SSTables instead of 1, which likely means a significant
performance regression.
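The scenario above can be modelled with a small Python sketch. This is a toy model and not Cassandra code: the names Row, SSTable and count_sstables_read are hypothetical, and the only thing it tries to capture is the rule that an older SSTable can be skipped once a fully populated row with a known row timestamp newer than that SSTable's max timestamp has been found.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    cell_timestamp: int
    # None models a row upgraded from 2.1 with no primary key liveness info.
    liveness_timestamp: Optional[int]


@dataclass
class SSTable:
    max_timestamp: int
    row: Optional[Row]  # the queried row, if this SSTable contains it


def count_sstables_read(sstables: List[SSTable]) -> int:
    """Count SSTables consulted when reading newest max timestamp first."""
    skip_below = None  # SSTables whose max timestamp is below this are skipped
    reads = 0
    for sstable in sorted(sstables, key=lambda s: s.max_timestamp, reverse=True):
        if skip_below is not None and sstable.max_timestamp < skip_below:
            break  # every remaining SSTable is older still
        reads += 1
        row = sstable.row
        # Without a row timestamp, the row never qualifies as "complete".
        if row is not None and row.liveness_timestamp is not None:
            candidate = min(row.liveness_timestamp, row.cell_timestamp)
            if skip_below is None or candidate > skip_below:
                skip_below = candidate
    return reads


def build(with_row_timestamps: bool) -> List[SSTable]:
    """Three SSTables as in the example; the newest holds the row at 2500."""
    def row(ts: int) -> Row:
        return Row(ts, ts if with_row_timestamps else None)
    return [SSTable(1000, row(900)), SSTable(2000, row(1500)), SSTable(3000, row(2500))]


reads_with_timestamps = count_sstables_read(build(True))
reads_without_timestamps = count_sstables_read(build(False))
```

In this model the read stops after the newest SSTable when row timestamps are present, and touches all three when they are missing.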
The following test illustrates this difference in behavior between 2.1 and 3.0:
https://github.com/maedhroz/cassandra/commit/84ce9242bedd735ca79d4f06007d127de6a82800
A solution here might be as simple as having
{{SinglePartitionReadCommand#canRemoveRow()}} only inspect primary key liveness
information for non-compact/CQL tables. Tombstones seem to be handled at a
level above that anyway.
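A rough sketch of that idea, written as a Python model rather than the actual Java method: the function name mirrors canRemoveRow(), but the signature, the is_compact_table flag and the representation of cells as a list of timestamps are all assumptions made for illustration.

```python
from typing import List, Optional


def can_remove_row(
    liveness_timestamp: Optional[int],
    cell_timestamps: List[Optional[int]],
    sstable_max_timestamp: int,
    is_compact_table: bool,
) -> bool:
    """Return True if the row is newer than anything the next SSTable can hold.

    liveness_timestamp is None when the row has no primary key liveness info.
    cell_timestamps has one entry per requested column, None if the cell is missing.
    """
    # Proposed change: only require a row timestamp for non-compact (CQL) tables.
    if not is_compact_table:
        if liveness_timestamp is None or liveness_timestamp <= sstable_max_timestamp:
            return False

    # In both cases every requested cell must be present and newer.
    # An empty column list is treated conservatively as "cannot remove".
    return bool(cell_timestamps) and all(
        ts is not None and ts > sstable_max_timestamp for ts in cell_timestamps
    )
```

With this check, a fully populated compact row at timestamp 2500 with no row timestamp would allow an SSTable with max timestamp 2000 to be skipped, while the same row in a non-compact table would not.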