Andy Tolbert created CASSANDRA-20315:
----------------------------------------
Summary: Update RepairTokenRangeSplitter to work with
BTI-formatted SSTables
Key: CASSANDRA-20315
URL: https://issues.apache.org/jira/browse/CASSANDRA-20315
Project: Apache Cassandra
Issue Type: Improvement
Reporter: Andy Tolbert
While doing a review pass on the [CEP-37
PR|https://github.com/apache/cassandra/pull/3598], I realized that
{{RepairTokenRangeSplitter}} cannot work correctly with the new BTI SSTable
format introduced in 5.0, because it explicitly checks for {{BigTableReader}}
implementations.
I expect that, for a BTI SSTable, it would assume that every byte of an
SSTable overlapping a range belongs to that range, instead of counting only
the parts of the SSTable that fall within the range. The splitter would still
split ranges, but it would overestimate how much data each range covers.
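
To make the overestimate concrete, here is a minimal, self-contained sketch. It does not use any Cassandra classes: {{SSTableStats}} and both estimate methods are hypothetical names invented for illustration, and the byte counts are made up.

```java
import java.util.Arrays;
import java.util.List;

public class RangeSizeEstimate {
    /** Hypothetical stand-in for one SSTable that overlaps the repair range. */
    static final class SSTableStats {
        final long totalBytes;   // on-disk size of the whole SSTable
        final long bytesInRange; // bytes of partitions that fall inside the range

        SSTableStats(long totalBytes, long bytesInRange) {
            this.totalBytes = totalBytes;
            this.bytesInRange = bytesInRange;
        }
    }

    /** Expected fallback behaviour: count every byte of each overlapping SSTable. */
    static long wholeSSTableEstimate(List<SSTableStats> sstables) {
        long sum = 0;
        for (SSTableStats s : sstables) {
            sum += s.totalBytes;
        }
        return sum;
    }

    /** Intended behaviour: count only the bytes that fall inside the range. */
    static long inRangeEstimate(List<SSTableStats> sstables) {
        long sum = 0;
        for (SSTableStats s : sstables) {
            sum += s.bytesInRange;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<SSTableStats> overlapping = Arrays.asList(
            new SSTableStats(1000, 100),
            new SSTableStats(500, 50));
        System.out.println(wholeSSTableEstimate(overlapping)); // prints 1500
        System.out.println(inRangeEstimate(overlapping));      // prints 150
    }
}
```

In this made-up case the fallback reports ten times the data that is actually in the range, so a splitter sizing subranges by bytes would presumably cut the range into more pieces than needed.
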
In addition, the way it calculates the number of bytes a range covers in an
SSTable is specific to the big format. I noticed that a new utility method,
[SSTableReader.onDiskSizeForPartitionPositions|https://github.com/apache/cassandra/blob/511bd203144a71bf0a72876a393d186a9778407e/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L810],
was added recently in [CASSANDRA-20092] that can calculate this for us in an
implementation-agnostic way. We should change the code to use this method,
and also remove any changes we made in {{SSTableScanner}} and
{{BigTableScanner}} to support that calculation.
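
The shape of the proposed change can be sketched as follows. This is an illustration under assumptions, not the actual patch: {{Reader}}, {{BigReader}}, {{TrieReader}} and {{onDiskBytesInRange}} are hypothetical stand-ins, and the real signature of {{SSTableReader.onDiskSizeForPartitionPositions}} is not reproduced here.

```java
public class FormatAgnosticSizing {
    /** Hypothetical base type: every format can report the bytes it holds for a token range. */
    static abstract class Reader {
        private final long[] tokens; // partition tokens
        private final long[] sizes;  // on-disk bytes per partition, same order as tokens

        Reader(long[] tokens, long[] sizes) {
            this.tokens = tokens;
            this.sizes = sizes;
        }

        long totalOnDiskBytes() {
            long sum = 0;
            for (long size : sizes) {
                sum += size;
            }
            return sum;
        }

        /** Bytes of partitions whose token lies in (start, end]. */
        long onDiskBytesInRange(long start, long end) {
            long sum = 0;
            for (int i = 0; i < tokens.length; i++) {
                if (tokens[i] > start && tokens[i] <= end) {
                    sum += sizes[i];
                }
            }
            return sum;
        }
    }

    /** Hypothetical stand-ins for the two on-disk formats. */
    static final class BigReader extends Reader {
        BigReader(long[] tokens, long[] sizes) { super(tokens, sizes); }
    }

    static final class TrieReader extends Reader {
        TrieReader(long[] tokens, long[] sizes) { super(tokens, sizes); }
    }

    /** Format-specific check: only one format gets a precise answer. */
    static long formatSpecificEstimate(Reader r, long start, long end) {
        if (r instanceof BigReader) {
            return r.onDiskBytesInRange(start, end);
        }
        return r.totalOnDiskBytes(); // fallback overestimates for other formats
    }

    /** Format-agnostic call: rely on the method defined on the base type. */
    static long formatAgnosticEstimate(Reader r, long start, long end) {
        return r.onDiskBytesInRange(start, end);
    }

    public static void main(String[] args) {
        long[] tokens = {10, 20, 30, 40};
        long[] sizes = {100, 200, 300, 400};
        Reader trie = new TrieReader(tokens, sizes);
        System.out.println(formatSpecificEstimate(trie, 15, 30)); // prints 1000
        System.out.println(formatAgnosticEstimate(trie, 15, 30)); // prints 500
    }
}
```

With the type check removed, both formats go through the same code path, which is the property the issue asks for.
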