[
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167206#comment-15167206
]
Sylvain Lebresne commented on CASSANDRA-8731:
---------------------------------------------
To sum up this, one "relatively" simple thing we could try here is to change
our merge iterator so it's clustering aware. Or more precisely, to add a new
merge iterator specialized for when the input is a {{ClusteringPrefix}}), that
would keep one priority queue for each level of clustering, avoiding comparing
equal prefixes more than necessary. I suggest we start there.
> Optimise merges involving multiple clustering columns
> -----------------------------------------------------
>
> Key: CASSANDRA-8731
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Benedict
> Labels: compaction, performance
> Fix For: 3.x
>
>
> Since the new storage format is dead in the water for the moment, we should
> do our best to optimise current behaviour. When merging data from multiple
> sstables with multiple clustering columns, currently we must incur the full
> costs of comparison for the entire matching prefix, and must heapify every
> cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for
> every cell we merge, where N is the number of sources we're merging.
> Essentially I'm proposing a trie-based merge approach as a replacement for
> the ManyToOne MergeIterator, wherein we treat each clustering component as a
> tree underwhich all Cells with a common prefix occur. We then perform a tree
> merge, rather than a flat merge. For byte-order fields this trie can even be
> a full binary-trie (although built on the fly). The advantage here is that we
> rapidly prune merges involving disjoint ranges, so that instead of always
> incurring lg(N) costs on each new record, we may often incur O(1) costs. For
> timeseries data, for instance, we could merge dozens of files and so long as
> they were non-overlapping our CPU burden would be little more than reading
> from a single file.
> On top of this, we no longer incur any of the shared prefix repetition costs,
> since we compare each prefix piece-wise, and only once.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)