[
https://issues.apache.org/jira/browse/CASSANDRA-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367059#comment-14367059
]
Marcus Eriksson commented on CASSANDRA-8920:
--------------------------------------------
(adding comment here after discussion on IRC)
This would probably be quite a bit slower for LCS, since overlappingSSTables
contains the sstables that overlap the currently compacting ones but are not
themselves being compacted. For LCS, that means it would contain all other
sstables on the node when doing an L0 -> L1 compaction.
For STCS this would probably work very well since we would almost always return
all sstables from the interval tree. Perhaps we should let the compaction
strategy decide if we should use the interval tree or not.
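As a rough sketch of that last suggestion: the names below (PurgeCandidates,
Strategy, useIntervalTree, candidates) are hypothetical and are not Cassandra's
actual classes, and a linear bounds check stands in for the real interval tree
search. The idea is only that LCS narrows the overlapping set by key bounds
before any further per-sstable check, while STCS skips the lookup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these types exist in Cassandra under these names.
class PurgeCandidates {
    /** Stand-in for an sstable: only its first and last token are modelled. */
    static final class SSTable {
        final long firstToken;
        final long lastToken;

        SSTable(long firstToken, long lastToken) {
            this.firstToken = firstToken;
            this.lastToken = lastToken;
        }

        boolean mayContain(long token) {
            return firstToken <= token && token <= lastToken;
        }
    }

    /** Assumed hook: each compaction strategy says whether a bounds lookup pays off. */
    interface Strategy {
        boolean useIntervalTree();
    }

    // LCS sstables cover narrow ranges, so narrowing by bounds removes most candidates.
    static final Strategy LEVELED = () -> true;
    // STCS sstables usually span the whole range, so the lookup would return nearly all.
    static final Strategy SIZE_TIERED = () -> false;

    /** Returns the sstables that still need checking for the given token. */
    static List<SSTable> candidates(List<SSTable> overlapping, long token, Strategy strategy) {
        if (!strategy.useIntervalTree())
            return overlapping; // skip the lookup and check every overlapping sstable

        // Linear scan used here in place of a real interval tree search.
        List<SSTable> narrowed = new ArrayList<>();
        for (SSTable sstable : overlapping)
            if (sstable.mayContain(token))
                narrowed.add(sstable);
        return narrowed;
    }

    public static void main(String[] args) {
        List<SSTable> overlapping = new ArrayList<>();
        overlapping.add(new SSTable(0, 99));
        overlapping.add(new SSTable(100, 199));
        overlapping.add(new SSTable(200, 299));

        System.out.println(candidates(overlapping, 150, LEVELED).size());
        System.out.println(candidates(overlapping, 150, SIZE_TIERED).size());
    }
}
```

With many small, non-overlapping sstables (the LCS case) the narrowed list is
much shorter than the input; with wide sstables (the STCS case) the filter
would return almost everything, so skipping it costs little.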
> Remove IntervalTree from maxPurgeableTimestamp calculation
> ----------------------------------------------------------
>
> Key: CASSANDRA-8920
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8920
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Priority: Minor
> Fix For: 2.1.4
>
> Attachments: 8920.txt
>
>
> The IntervalTree only maps partition keys. Since a majority of users deploy a
> hashed partitioner, the work is mostly wasted, since the keys will be evenly
> distributed across the full token range owned by the node - and in some cases
> it is a significant amount of work. We can perform a corroboration against
> the file bounds if we get a BF match as a sanity check if we like, but
> performing an IntervalTree search is significantly more expensive (esp. once
> murmur hash calculation memoization goes mainstream).
> In LCS, the keys are bounded, so it might appear that it would help, but in
> this scenario we only compact against like bounds, so again it is not helpful.
> With a ByteOrderedPartitioner it could potentially be of use, but this is
> sufficiently rare to not optimise for IMO.