[ 
https://issues.apache.org/jira/browse/CASSANDRA-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15208931#comment-15208931
 ] 

Marcus Olsson commented on CASSANDRA-11412:
-------------------------------------------

Hmm, yes, that seems to be a bug. Either we could implement {{hashCode()}} and 
{{equals()}} on all scanners, or we could change (or add a separate overload of) 
{{AbstractCompactionStrategy.getScanners()}} to take a collection of 
{{Range<Token>}}. I think the second option is preferable, since we would 
only call {{SSTableReader.getScanner()}} once for each sstable. There is 
already a {{getScanner()}} method that takes [a collection of 
ranges|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L1761]
 in SSTableReader. This would also make more sense with LCS: as it is now, 
even if we check for equality, the LeveledScanner might not be equal across 
all ranges, so we would still create one scanner per range either way.
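
To illustrate the difference in scanner counts between the two call shapes, 
here is a minimal sketch in plain Java. {{TokenRange}}, {{RangeScanner}} and 
{{ScannerSketch}} are hypothetical stand-ins, not Cassandra's actual classes, 
and sstables are modeled as plain names:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Range<Token>; not Cassandra's class.
final class TokenRange {
    final long left, right;
    TokenRange(long left, long right) { this.left = left; this.right = right; }
}

// Hypothetical scanner: remembers which sstable and which ranges it covers.
final class RangeScanner {
    final String sstable;
    final List<TokenRange> ranges;
    RangeScanner(String sstable, Collection<TokenRange> ranges) {
        this.sstable = sstable;
        this.ranges = new ArrayList<>(ranges);
    }
}

final class ScannerSketch {
    // Shape described in the issue: one scanner per (range, sstable) pair.
    static List<RangeScanner> perRange(Collection<String> sstables,
                                       Collection<TokenRange> ranges) {
        List<RangeScanner> scanners = new ArrayList<>();
        for (TokenRange range : ranges)
            for (String sstable : sstables)
                scanners.add(new RangeScanner(sstable, Collections.singletonList(range)));
        return scanners;
    }

    // Shape suggested above: one scanner per sstable, covering all ranges.
    static List<RangeScanner> perSSTable(Collection<String> sstables,
                                         Collection<TokenRange> ranges) {
        List<RangeScanner> scanners = new ArrayList<>();
        for (String sstable : sstables)
            scanners.add(new RangeScanner(sstable, ranges));
        return scanners;
    }
}
```

With R ranges and S sstables, {{perRange}} creates R * S scanners while 
{{perSSTable}} creates S, so no equality check between scanners is needed.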

This would still mean that we have one scanner per sstable open at the same 
time, though (pretty much the same as pre-CASSANDRA-5220), except when using 
leveled compaction. So it might be good to open the scanners only when needed, 
depending on how many sstables we have. The implementation of that would 
probably be a bit more complex than for LCS, since in LeveledScanner we assume 
that all sstables in a level are non-overlapping, so we only open one scanner 
at a time. I think the partition iteration for overlapping sstables could be 
done as:
# Sort all the sstables based on tokens
# Open the first sstable scanner
# Read/merge first partition from all open scanners (and close exhausted 
scanners) and keep the first partition
# Compare the partition to the non-open sstables to see if they may contain a 
partition before or at the same token as the previously read partition; if so, 
open a scanner for each such sstable and read its first partition
# Compare/merge all current partitions and return the first
# Continue at step 3 until all sstables are read

Unless I'm over-thinking this.
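
The steps above can be sketched roughly as follows. This is a simplified 
model, not Cassandra code: an "sstable" is just a non-empty sorted list of 
tokens, "sort based on tokens" is assumed to mean sorting by first token, and 
{{LazyMergeSketch}}, {{Cursor}} and {{maxOpen}} are hypothetical names:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

final class LazyMergeSketch {
    // An open "scanner": a cursor into one sstable's sorted tokens.
    private static final class Cursor {
        final List<Long> tokens;
        int pos = 0;
        Cursor(List<Long> tokens) { this.tokens = tokens; }
        long current() { return tokens.get(pos); }
        boolean advance() { return ++pos < tokens.size(); }
    }

    int maxOpen = 0; // high-water mark of simultaneously open scanners

    List<Long> merge(List<List<Long>> sstables) {
        // Step 1: sort the sstables by their first token (an assumption).
        List<List<Long>> sorted = new ArrayList<>(sstables);
        sorted.sort(Comparator.comparing((List<Long> s) -> s.get(0)));
        Deque<List<Long>> unopened = new ArrayDeque<>(sorted);
        PriorityQueue<Cursor> open =
            new PriorityQueue<>(Comparator.comparingLong(Cursor::current));
        List<Long> out = new ArrayList<>();

        while (!open.isEmpty() || !unopened.isEmpty()) {
            // Step 2: make sure at least one scanner is open.
            if (open.isEmpty())
                open.add(new Cursor(unopened.poll()));
            // Step 4: open every sstable that may hold a partition at or
            // before the smallest token currently on offer.
            while (!unopened.isEmpty()
                   && unopened.peek().get(0) <= open.peek().current())
                open.add(new Cursor(unopened.poll()));
            maxOpen = Math.max(maxOpen, open.size());
            // Steps 3 and 5: merge all partitions sharing the smallest token,
            // advancing those scanners and dropping ("closing") exhausted ones.
            long token = open.peek().current();
            while (!open.isEmpty() && open.peek().current() == token) {
                Cursor c = open.poll();
                if (c.advance())
                    open.add(c);
            }
            out.add(token); // step 6: loop until everything is read
        }
        return out;
    }
}
```

In this model the number of open scanners follows the amount of overlap at the 
current token rather than the total number of sstables.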

> Many sstablescanners opened during repair
> -----------------------------------------
>
>                 Key: CASSANDRA-11412
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11412
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>
> Since CASSANDRA-5220 we open [one sstablescanner per range per 
> sstable|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L374].
>  If compaction gets behind and you are running vnodes with 256 tokens and 
> RF3, this could become a problem (i.e., {{768 * number of sstables}} scanners).
> We could probably refactor this, similar to the way we handle scanners with 
> LCS: only open the scanner once we need it.



