[
https://issues.apache.org/jira/browse/CASSANDRA-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13495757#comment-13495757
]
Jonathan Ellis commented on CASSANDRA-4883:
-------------------------------------------
Any reason to not use ImmutableSet in DataTracker?
+1 otherwise.
> Optimize mostRecentTomstone vs maxTimestamp check in
> CollationController.collectAllData
> ---------------------------------------------------------------------------------------
>
> Key: CASSANDRA-4883
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4883
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Priority: Minor
> Fix For: 1.2.0 rc1
>
> Attachments: 4883.txt
>
>
> CollationController.collectAllData eliminates an sstable if we've already read
> a row tombstone more recent than its maxTimestamp. However, this is done in 2
> passes and is not as efficient as it could be.
> More precisely, say we have 10 sstables s0, ..., s9, where s0 is the most
> recent and s9 the least recent (and their maxTimestamps reflect that), and s0
> has a row tombstone that is more recent than all of the s1-s9 maxTimestamps.
> Now in collectAllData(), we first iterate over the sstables in a "random"
> order (because DataTracker keeps sstables in a more or less random order),
> meaning that we may iterate in the order s9, s8, ..., s0. In that case, we
> will end up reading the row header from every sstable (hitting disk each
> time). Then, and only then, the 2nd pass of collectAllData will eliminate s1
> to s9.
> However, if we were to iterate over the sstables in maxTimestamp order (as we
> do in collectTimeOrdered), we would need only one pass and, more importantly,
> we would minimize the number of row headers we read to perform that sstable
> elimination. In my example, we would only ever read the row tombstone from
> s0 and eliminate all the other sstables directly, based simply on their
> maxTimestamp.
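
The s0..s9 example from the description can be modeled with a small standalone sketch. This is not Cassandra code and not the attached patch: FakeSSTable, headersRead, byMaxTimestampDesc and example are hypothetical names, and the timestamps are made up for illustration. It only counts how many row headers would be read for a given iteration order, assuming an sstable is skipped when its maxTimestamp is older than the most recent row tombstone seen so far.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class CollectAllDataSketch {
    /** Hypothetical stand-in for an sstable; holds only what the example needs. */
    static final class FakeSSTable {
        final String name;
        final long maxTimestamp;
        final long rowTombstoneAt; // Long.MIN_VALUE means "no row tombstone"

        FakeSSTable(String name, long maxTimestamp, long rowTombstoneAt) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
            this.rowTombstoneAt = rowTombstoneAt;
        }
    }

    /** Counts row-header reads when visiting sstables in the given order. */
    static int headersRead(List<FakeSSTable> order) {
        long mostRecentTombstone = Long.MIN_VALUE;
        int reads = 0;
        for (FakeSSTable s : order) {
            // Eliminated from metadata alone: no disk access needed.
            if (s.maxTimestamp < mostRecentTombstone)
                continue;
            reads++; // models reading the row header from disk
            mostRecentTombstone = Math.max(mostRecentTombstone, s.rowTombstoneAt);
        }
        return reads;
    }

    /** Returns a copy sorted from most recent to least recent maxTimestamp. */
    static List<FakeSSTable> byMaxTimestampDesc(List<FakeSSTable> sstables) {
        List<FakeSSTable> sorted = new ArrayList<>(sstables);
        sorted.sort(Comparator.comparingLong((FakeSSTable s) -> s.maxTimestamp).reversed());
        return sorted;
    }

    /** Builds s9, s8, ..., s0: the unlucky order described in the issue. */
    static List<FakeSSTable> example() {
        List<FakeSSTable> worstOrder = new ArrayList<>();
        for (int i = 9; i >= 1; i--)
            worstOrder.add(new FakeSSTable("s" + i, 100 - 10 * i, Long.MIN_VALUE));
        // s0 is the newest and holds a row tombstone newer than s1..s9.
        worstOrder.add(new FakeSSTable("s0", 100, 95));
        return worstOrder;
    }

    public static void main(String[] args) {
        List<FakeSSTable> worstOrder = example();
        System.out.println("unlucky order: " + headersRead(worstOrder));
        System.out.println("maxTimestamp order: " + headersRead(byMaxTimestampDesc(worstOrder)));
    }
}
```

In this model the unlucky order reads all ten row headers, while the maxTimestamp order reads only s0's header and skips s1 to s9 on metadata alone.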