[
https://issues.apache.org/jira/browse/CASSANDRA-11412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236263#comment-15236263
]
Paulo Motta commented on CASSANDRA-11412:
-----------------------------------------
Code and tests look good, and this is definitely a big improvement over what we
had before, but as mentioned by [~molsson] we would still need to keep one
{{ISSTableScanner}} instance per sstable open during the repair process in the
non-LCS case. Do you think we should worry about optimizing this further by
lazily opening {{ISSTableScanner}} instances as partitions are iterated?
Building on [~molsson]'s suggestion, I thought we could change
{{AbstractCompactionStrategy.getScanners(sstables, ranges)}} to return a
{{RangeScannerIterator}} instead, that returns a list of overlapping scanners
at each iteration. This iterator would have an {{OrderedMap<Range<Token>,
Set<SSTableReader>>}} with a set of overlapping sstables for each exclusive
subrange, and lazily instantiate {{ISSTableScanner}} as it iterates the
subranges, maybe reusing {{ISSTableScanner}} from previous iterations and
discarding them when no longer needed.
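To make the idea concrete, here is a minimal, self-contained sketch of such an iterator. It is not based on the actual Cassandra classes: tokens are plain {{long}} values, ranges do not wrap around, and {{SSTable}}, {{LazyScanner}}, {{exclusiveSubranges}} and {{maxOpenFor}} are hypothetical stand-ins invented for illustration. It only shows how exclusive subranges could be derived from sstable bounds and how scanners could be opened, reused and discarded while iterating.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class RangeScannerSketch {
    // Counters used only to observe how many scanners are open at once.
    static int openScanners = 0;
    static int maxOpenScanners = 0;

    // Hypothetical stand-in for SSTableReader: a name plus inclusive token bounds.
    static final class SSTable {
        final String name; final long first; final long last;
        SSTable(String name, long first, long last) {
            this.name = name; this.first = first; this.last = last;
        }
    }

    // Hypothetical stand-in for ISSTableScanner; it only tracks open/close.
    static final class LazyScanner {
        final SSTable sstable;
        LazyScanner(SSTable sstable) {
            this.sstable = sstable;
            openScanners++;
            maxOpenScanners = Math.max(maxOpenScanners, openScanners);
        }
        void close() { openScanners--; }
    }

    // Maps the start token of each exclusive subrange [start, nextBound) to the
    // sstables overlapping it. Subranges covered by no sstable are skipped.
    static TreeMap<Long, List<SSTable>> exclusiveSubranges(List<SSTable> sstables) {
        TreeSet<Long> bounds = new TreeSet<>();
        for (SSTable s : sstables) { bounds.add(s.first); bounds.add(s.last + 1); }
        TreeMap<Long, List<SSTable>> result = new TreeMap<>();
        Long start = null;
        for (Long end : bounds) {
            if (start != null) {
                List<SSTable> overlapping = new ArrayList<>();
                for (SSTable s : sstables)
                    if (s.first < end && s.last >= start) overlapping.add(s);
                if (!overlapping.isEmpty()) result.put(start, overlapping);
            }
            start = end;
        }
        return result;
    }

    // Returns the scanners for one subrange per iteration, opening them lazily.
    static final class RangeScannerIterator implements Iterator<List<LazyScanner>> {
        private final Iterator<List<SSTable>> subranges;
        private final Map<String, LazyScanner> open = new LinkedHashMap<>();

        RangeScannerIterator(List<SSTable> sstables) {
            this.subranges = exclusiveSubranges(sstables).values().iterator();
        }

        @Override public boolean hasNext() { return subranges.hasNext(); }

        @Override public List<LazyScanner> next() {
            List<SSTable> wanted = subranges.next();
            // Discard scanners whose sstable does not overlap this subrange.
            Iterator<LazyScanner> it = open.values().iterator();
            while (it.hasNext()) {
                LazyScanner sc = it.next();
                if (!wanted.contains(sc.sstable)) { sc.close(); it.remove(); }
            }
            // Reuse scanners that are already open; open only the missing ones.
            for (SSTable s : wanted)
                if (!open.containsKey(s.name)) open.put(s.name, new LazyScanner(s));
            return new ArrayList<>(open.values());
        }

        void close() {
            for (LazyScanner sc : open.values()) sc.close();
            open.clear();
        }
    }

    // Iterates all subranges and reports the peak number of open scanners.
    static int maxOpenFor(List<SSTable> sstables) {
        openScanners = 0;
        maxOpenScanners = 0;
        RangeScannerIterator it = new RangeScannerIterator(sstables);
        while (it.hasNext()) it.next();
        it.close();
        return maxOpenScanners;
    }

    public static void main(String[] args) {
        List<SSTable> sstables = Arrays.asList(
            new SSTable("a", 0, 10), new SSTable("b", 5, 20),
            new SSTable("c", 30, 40), new SSTable("d", 35, 50));
        System.out.println("max open scanners: " + maxOpenFor(sstables));
    }
}
```

With the four sample sstables in {{main}}, at most two scanners are open at any time, whereas opening everything up front would hold four. A real implementation would also have to restrict the subranges to the requested repair ranges and deal with wrap-around ranges, which this sketch ignores.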
We would then need to create a new {{UnfilteredPartitionIterator}} to be used
during compaction that would operate over {{RangeScannerIterator}} instances,
merging returned {{ISSTableScanners}} for each exclusive subrange and renewing
the merge iterator after the previous merge iterator is exhausted.
The benefit is that we would keep a minimal number of {{ISSTableScanner}}
instances open during compaction, avoiding things like CASSANDRA-4142, and we
would have a single solution for both LCS and non-LCS. The downside is probably
increased complexity and maybe some overhead for building the exclusive subranges.
Do you think this would work and is worth it? If so, should we do it here or
open a new ticket for it?
> Many sstablescanners opened during repair
> -----------------------------------------
>
> Key: CASSANDRA-11412
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11412
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Fix For: 3.0.x, 3.x
>
>
> Since CASSANDRA-5220 we open [one sstablescanner per range per
> sstable|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java#L374].
> If compaction gets behind and you are running vnodes with 256 tokens and
> RF3, this could become a problem (i.e., {{768 * number of sstables}} scanners).
> We could probably refactor this similarly to the way we handle scanners with
> LCS: only open the scanner once we need it.