[ https://issues.apache.org/jira/browse/CASSANDRA-16226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17232100#comment-17232100 ]
Kornel Pal commented on CASSANDRA-16226:
----------------------------------------
[~maedhroz], since {{COMPACT STORAGE}} support was reintroduced to 4.0, would
it make sense to implement the fix for 3.0+ and document the performance
degradation as one more of the many undesirable side effects of {{DROP
COMPACT STORAGE}}? {{isCQLTable()}} will return true once compact storage is
dropped, and rebuilding the SSTables (compaction, scrub or upgradesstables) will
then fix the performance issues.
I did some more research and I don't think that the temporary performance issue
after {{DROP COMPACT STORAGE}} can be fixed without tracking the dropped
status. On another note, I think that the current implementation of {{DROP
COMPACT STORAGE}} is of very limited use. Ideally it should record that compact
storage was dropped and later rebuild the SSTable using a new structure that is
functionally equivalent to the old compact behavior, avoiding the issues
described in CASSANDRA-16217. Unfortunately, I am not sure how much effort such
a change would require.
Other {{ALTER TABLE}} operations such as dropping a column, changing
bloom_filter_fp_chance or compression options already require the SSTables to
be rebuilt (by compaction, scrub or upgradesstables) to take effect. This would,
however, be the first of those operations to cause a temporary performance
degradation. Even so, I believe that fixing the compact table
performance issue and dealing with the side effects of {{DROP COMPACT STORAGE}}
as part of the larger effort of removing compact storage support would benefit
the community by facilitating upgrade to 3.0+ from 2.x.
> COMPACT STORAGE SSTables created before 3.0 are not correctly skipped by
> timestamp due to missing primary key liveness info
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-16226
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16226
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Local Write-Read Paths
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Labels: perfomance, upgrade
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> This was discovered while tracking down a spike in the number of SSTables
> per read for a COMPACT STORAGE table after a 2.1 -> 3.0 upgrade. Before 3.0,
> there is no direct analog of 3.0's primary key liveness info. When we upgrade
> 2.1 COMPACT STORAGE SSTables to the mf format, we simply don't write row
> timestamps, even if the original mutations were INSERTs. On read, when we
> look at SSTables in order from newest to oldest max timestamp, we expect to
> have this primary key liveness information to determine whether we can skip
> older SSTables after finding completely populated rows.
> For example, I have three SSTables in a COMPACT STORAGE table with max timestamps
> 1000, 2000, and 3000. There are many rows in a particular partition, making
> filtering on the min and max clustering effectively a no-op. All data is
> inserted, and there are no partial updates. A fully specified row with
> timestamp 2500 exists in the SSTable with a max timestamp of 3000. With a
> proper row timestamp in hand, we can easily ignore the SSTables with max
> timestamps of 1000 and 2000. Without it, we read 3 SSTables instead of 1,
> which likely means a significant performance regression.
> The following test illustrates this difference in behavior between 2.1 and
> 3.0:
> https://github.com/maedhroz/cassandra/commit/84ce9242bedd735ca79d4f06007d127de6a82800
> A solution here might be as simple as having
> {{SinglePartitionReadCommand#canRemoveRow()}} only inspect primary key
> liveness information for non-compact/CQL tables. Tombstones seem to be
> handled at a level above that anyway. (One potential problem with that is
> whether or not the distinction will continue to exist in 4.0, and dropping
> compact storage from a table doesn't magically make pk liveness information
> appear.)
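
The three-SSTable example in the description can be modelled with a small
sketch. This is a toy model in Python, not Cassandra's code: the names {{Row}},
{{SSTable}}, {{can_skip_older}} and {{count_sstables_read}} are hypothetical, and
the check is only a simplified stand-in for what
{{SinglePartitionReadCommand#canRemoveRow()}} is described as doing. It assumes a
single fully specified row and ignores tombstones and clustering filters.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Row:
    cells: Dict[str, int]                 # column name -> write timestamp
    pk_liveness_ts: Optional[int] = None  # None models the missing liveness info


@dataclass
class SSTable:
    max_timestamp: int
    rows: Dict[str, Row] = field(default_factory=dict)


def can_skip_older(row: Row, columns: Iterable[str], older_max_ts: int,
                   is_cql_table: bool) -> bool:
    """True if an SSTable whose max timestamp is older_max_ts cannot
    supersede anything in the row assembled so far."""
    if is_cql_table:
        # Assumption: only CQL tables are required to carry liveness info.
        if row.pk_liveness_ts is None or row.pk_liveness_ts <= older_max_ts:
            return False
    return all(c in row.cells and row.cells[c] > older_max_ts for c in columns)


def count_sstables_read(sstables: List[SSTable], key: str,
                        columns: List[str], is_cql_table: bool) -> int:
    """Visit SSTables from newest to oldest max timestamp and count how
    many are actually read before the rest can be skipped."""
    merged: Optional[Row] = None
    read = 0
    for sst in sorted(sstables, key=lambda s: s.max_timestamp, reverse=True):
        if merged is not None and can_skip_older(
                merged, columns, sst.max_timestamp, is_cql_table):
            break
        read += 1
        found = sst.rows.get(key)
        if found is None:
            continue
        if merged is None:
            merged = Row(dict(found.cells), found.pk_liveness_ts)
            continue
        # Newest timestamp wins per column.
        for col, ts in found.cells.items():
            if col not in merged.cells or ts > merged.cells[col]:
                merged.cells[col] = ts
        if found.pk_liveness_ts is not None and (
                merged.pk_liveness_ts is None
                or found.pk_liveness_ts > merged.pk_liveness_ts):
            merged.pk_liveness_ts = found.pk_liveness_ts
    return read


def example_sstables(pk_liveness_ts: Optional[int]) -> List[SSTable]:
    """Max timestamps 1000, 2000 and 3000; the newest holds the row at 2500."""
    return [
        SSTable(1000, {"k": Row({"v": 900})}),
        SSTable(2000, {"k": Row({"v": 1500})}),
        SSTable(3000, {"k": Row({"v": 2500}, pk_liveness_ts)}),
    ]


if __name__ == "__main__":
    # Upgraded 2.1 data: no liveness info, liveness check still applied.
    print(count_sstables_read(example_sstables(None), "k", ["v"], True))
    # Row written with a proper row timestamp.
    print(count_sstables_read(example_sstables(2500), "k", ["v"], True))
    # Suggested fix: skip the liveness check for compact tables.
    print(count_sstables_read(example_sstables(None), "k", ["v"], False))
```

In this model the first call reads all three SSTables, while the other two stop
after the newest one. It also shows the caveat raised above: if the table is
treated as a CQL table after {{DROP COMPACT STORAGE}} while the rows still lack
liveness info, the read falls back to visiting every SSTable.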