[ https://issues.apache.org/jira/browse/CASSANDRA-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236263#comment-15236263 ]

Paulo Motta commented on CASSANDRA-11412:
-----------------------------------------

Code and tests look good, and this is definitely a big improvement over what we
had before, but as mentioned by [~molsson] we would still need to have one
{{ISSTableScanner}} instance per sstable open during the repair process for the
non-LCS case. Do you think we should worry about optimizing this further by
lazily opening {{ISSTableScanners}} as partitions are iterated?

Building on [~molsson]'s suggestion, I thought we could change
{{AbstractCompactionStrategy.getScanners(sstables, ranges)}} to return a
{{RangeScannerIterator}} instead, which returns a list of overlapping scanners
at each iteration. This iterator would hold an
{{OrderedMap<Range<Token>, Set<SSTableReader>>}} mapping each exclusive
subrange to the set of sstables overlapping it, and would lazily instantiate
{{ISSTableScanner}} instances as it iterates the subranges, possibly reusing
scanners from previous iterations and discarding them once they are no longer
needed.
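A minimal sketch of the subrange map described above, under simplifying assumptions: tokens are plain {{long}}s, ring wraparound is ignored, and {{SSTableStub}}, {{Subrange}} and {{ExclusiveSubranges.build}} are hypothetical names standing in for {{SSTableReader}}, {{Range<Token>}} and whatever would actually populate the {{OrderedMap}}. It splits the covered tokens at every sstable start and just after every sstable end, then records which sstables overlap each resulting subrange.

```java
import java.util.*;

/** Hypothetical stand-in for SSTableReader: a name plus inclusive first/last tokens. */
final class SSTableStub {
    final String name;
    final long first;
    final long last;

    SSTableStub(String name, long first, long last) {
        this.name = name;
        this.first = first;
        this.last = last;
    }
}

/** Hypothetical half-open token subrange [start, end), standing in for Range<Token>. */
final class Subrange {
    final long start;
    final long end;

    Subrange(long start, long end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "[" + start + ", " + end + ")";
    }
}

final class ExclusiveSubranges {
    /**
     * Splits the tokens covered by the sstables into non-overlapping subranges, each mapped to
     * the sstables overlapping it, ordered by start token. Ignores ring wraparound and assumes
     * no sstable ends at Long.MAX_VALUE (last + 1 would overflow).
     */
    static TreeMap<Subrange, Set<String>> build(List<SSTableStub> sstables) {
        // The overlapping set can only change where an sstable starts or just after one ends.
        TreeSet<Long> boundaries = new TreeSet<>();
        for (SSTableStub s : sstables) {
            boundaries.add(s.first);
            boundaries.add(s.last + 1);
        }
        TreeMap<Subrange, Set<String>> result =
                new TreeMap<>(Comparator.comparingLong((Subrange r) -> r.start));
        Long prev = null;
        for (long b : boundaries) {
            if (prev != null) {
                // No boundary lies inside [prev, b), so each sstable covers all of it or none.
                Set<String> overlapping = new TreeSet<>();
                for (SSTableStub s : sstables) {
                    if (s.first <= prev && s.last + 1 >= b) overlapping.add(s.name);
                }
                // Skip gaps that no sstable covers.
                if (!overlapping.isEmpty()) result.put(new Subrange(prev, b), overlapping);
            }
            prev = b;
        }
        return result;
    }
}
```

For sstables a=[0,10], b=[5,20] and c=[30,40] this yields [0, 5) → {a}, [5, 11) → {a, b}, [11, 21) → {b} and [30, 41) → {c}, with the uncovered gap [21, 30) skipped. The overlap check is quadratic in the number of sstables; a sweep over sorted start/end events would be cheaper, and that cost is part of the subrange-building overhead this approach would add.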

We would then need a new {{UnfilteredPartitionIterator}} for use during
compaction that operates over {{RangeScannerIterator}} instances: it would
merge the {{ISSTableScanners}} returned for each exclusive subrange and create a
fresh merge iterator once the previous one is exhausted.
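A rough sketch of how such a merge could renew itself per subrange, assuming each subrange hands over its overlapping scanners through a {{Supplier}} (so they are only opened when that subrange is reached) and that those scanners are already restricted to the subrange. Because exclusive subranges are disjoint and visited in token order, concatenating the per-subrange merges keeps the output sorted. {{SubrangeMergeIterator}} is a hypothetical name, and plain {{Iterator<Long>}} keys stand in for partitions rather than Cassandra's actual {{UnfilteredPartitionIterator}} API.

```java
import java.util.*;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: subranges are visited in token order, each supplies its overlapping
 * scanners lazily, and a fresh k-way merge starts only after the previous one is exhausted.
 * Long keys stand in for partitions; this is not Cassandra's UnfilteredPartitionIterator.
 */
final class SubrangeMergeIterator implements Iterator<Long> {
    /** One source in the current merge: its head key and the rest of its iterator. */
    private static final class Head implements Comparable<Head> {
        long key;
        final Iterator<Long> rest;

        Head(long key, Iterator<Long> rest) {
            this.key = key;
            this.rest = rest;
        }

        @Override
        public int compareTo(Head other) {
            return Long.compare(key, other.key);
        }
    }

    private final Iterator<Supplier<List<Iterator<Long>>>> subranges;
    private final PriorityQueue<Head> heap = new PriorityQueue<>();

    SubrangeMergeIterator(List<Supplier<List<Iterator<Long>>>> subranges) {
        this.subranges = subranges.iterator();
    }

    /** Once the current merge is exhausted, open the scanners of the next non-empty subrange. */
    private void renewMergeIfExhausted() {
        while (heap.isEmpty() && subranges.hasNext()) {
            for (Iterator<Long> scanner : subranges.next().get()) {
                if (scanner.hasNext()) heap.add(new Head(scanner.next(), scanner));
            }
        }
    }

    @Override
    public boolean hasNext() {
        renewMergeIfExhausted();
        return !heap.isEmpty();
    }

    @Override
    public Long next() {
        if (!hasNext()) throw new NoSuchElementException();
        long key = heap.peek().key;
        // Advance every source positioned at this key; a real merge would combine their data.
        while (!heap.isEmpty() && heap.peek().key == key) {
            Head head = heap.poll();
            if (head.rest.hasNext()) {
                head.key = head.rest.next();
                heap.add(head);
            }
        }
        return key;
    }
}
```

Equal keys from different scanners are collapsed into a single output key here; the real merge would reconcile their contents instead. The second subrange's {{Supplier}} is not invoked until every key from the first subrange has been returned.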

The benefit is that we would keep the minimum number of {{ISSTableScanner}}
instances open during compaction, avoiding problems like CASSANDRA-4142, and we
would have a single solution for both LCS and non-LCS. The downside is probably
increased complexity and possibly some overhead for building the exclusive
subranges.
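To make the trade-off concrete, here is a hypothetical back-of-the-envelope model (not Cassandra code) comparing the eager count from the ticket description, one scanner per range per sstable, with the peak number of scanners a lazy open-then-close scheme would hold at once, which is the maximum number of sstables overlapping any single token. {{ScannerCount}}, {{eager}} and {{lazyPeak}} are illustrative names, and sstables are modeled as inclusive {{[first, last]}} token intervals.

```java
import java.util.*;

/** Hypothetical model comparing scanner counts; tokens are plain longs, not Cassandra API. */
final class ScannerCount {
    /** Eager scheme described in the ticket: one scanner per (range, sstable) pair. */
    static long eager(int ranges, int sstables) {
        return (long) ranges * sstables;
    }

    /**
     * Peak number of simultaneously open scanners if each one is opened when iteration reaches
     * its sstable's first token and closed right after its last token: the maximum overlap
     * depth of the intervals, found by sweeping over open/close events in token order.
     */
    static int lazyPeak(long[][] intervals) {
        List<long[]> events = new ArrayList<>();
        for (long[] iv : intervals) {
            events.add(new long[] {iv[0], +1});     // scanner opened at first token
            events.add(new long[] {iv[1] + 1, -1}); // scanner closed just after last token
        }
        // Sort by position; at equal positions apply closes (-1) before opens (+1).
        events.sort((a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0]) : Long.compare(a[1], b[1]));
        int open = 0;
        int peak = 0;
        for (long[] e : events) {
            open += (int) e[1];
            peak = Math.max(peak, open);
        }
        return peak;
    }
}
```

With 256 tokens and RF 3 (768 ranges) and a hypothetical 1,000 sstables, the eager count is 768,000 scanners, while the lazy peak for sstables that never overlap (as within an LCS level) is 1. The model ignores any scanners a real implementation keeps around for reuse across subranges.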

Do you think this would work and be worth it? If so, should we do it here or
open a new ticket for it?

> Many sstablescanners opened during repair
> -----------------------------------------
>
>                 Key: CASSANDRA-11412
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11412
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>             Fix For: 3.0.x, 3.x
>
>
> Since CASSANDRA-5220 we open [one sstablescanner per range per sstable|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L374].
> If compaction gets behind and you are running vnodes with 256 tokens and
> RF=3, this could become a problem (i.e., 256 * 3 = 768 ranges, so
> {{768 * number of sstables}} scanners).
> We could probably refactor this similarly to the way we handle scanners with
> LCS: only open the scanner once we need it.
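A minimal sketch of the LCS-style approach the description refers to, under the assumption that the sstables in a level do not overlap and are sorted by first token: the next scanner is opened only when the current one is exhausted, so at most one is open at a time. {{LazyLevelScanner}} and its {{openScanner}} callback are hypothetical; this models the idea rather than Cassandra's actual LCS scanner code.

```java
import java.util.*;
import java.util.function.Function;

/**
 * Hypothetical sketch of LCS-style lazy scanning over one level. Sstables in a level do not
 * overlap, so walking them in first-token order and opening the next scanner only when the
 * current one is exhausted yields keys in token order with at most one scanner open.
 */
final class LazyLevelScanner<S> implements Iterator<Long> {
    private final Iterator<S> sstables;                   // assumed sorted by first token
    private final Function<S, Iterator<Long>> openScanner; // hypothetical "open scanner" hook
    private Iterator<Long> current = Collections.emptyIterator();
    int opened = 0;                                       // scanners opened so far

    LazyLevelScanner(List<S> sortedSSTables, Function<S, Iterator<Long>> openScanner) {
        this.sstables = sortedSSTables.iterator();
        this.openScanner = openScanner;
    }

    @Override
    public boolean hasNext() {
        // Open the next scanner only once the current one has nothing left. A real version
        // would also close the exhausted scanner here before opening the next one.
        while (!current.hasNext() && sstables.hasNext()) {
            current = openScanner.apply(sstables.next());
            opened++;
        }
        return current.hasNext();
    }

    @Override
    public Long next() {
        if (!hasNext()) throw new NoSuchElementException();
        return current.next();
    }
}
```

Given sstables holding keys [1, 2], [5] and [8, 9], the first {{next()}} opens only the first scanner; draining the iterator opens the other two in turn.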



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
