[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

Sylvain Lebresne (JIRA) Wed, 04 Feb 2015 08:49:20 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305480#comment-14305480
 ]


Sylvain Lebresne commented on CASSANDRA-8731:
---------------------------------------------

bq. If you want to split intra-clustering-component to make byte-order 
comparable fields more efficient, that is more involved.

That's really the part I was refering to. I'm totally fine with having a 
specialized mergeIterator for clustering prefixes that knows not to compare the 
same components multiple times.

bq. especially for sstables that do overlap, but don't overlap for their 
entirety

I wasn't planning to handle them. For time series withh 
DateTieredCompactionStrategy, most of your sstables will be non-overlapping so 
it would still be a win.

bq. slicing them and merging them independently actually sounds to me to be 
much more fiddly and tough to get right than the simplest approach outlined here

If by "simplest approach" you mean the one PQ per level of clustering, then I 
think that's really orthogonal. What I'm suggesting is really not specific to 
multiple clustering level, while the PQ-per-level-of-clustering only help those 
case. What I can agree on is that doing CASSANDRA-6936 + doing a 
trie-for-byte-order-comparable-fields would likely make my suggestion less 
useful (even though there is cases where it would still be more efficient), but 
my initial guess is that doing both those thing is more involved that what I'm 
suggesting. But not trying to shoot your idea, just mentioning a potential 
optimization that imo is not really very involved (in particular, note that 
doing what I suggest only for single partition queries would be fine imo if 
that makes it simpler).

Anyway, I would suggest limiting the scope of this issue to "introduce a 
smarter merge iterator for multi-clustering" (because it's easy and I doubt 
it's contentious) and open other tickets for more involved byte-fiddling 
suggestions. 

> Optimise merges involving multiple clustering columns
> -----------------------------------------------------
>
>                 Key: CASSANDRA-8731
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>
> Since the new storage format is dead in the water for the moment, we should 
> do our best to optimise current behaviour. When merging data from multiple 
> sstables with multiple clustering columns, currently we must incur the full 
> costs of comparison for the entire matching prefix, and must heapify every 
> cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
> every cell we merge, where N is the number of sources we're merging.
> Essentially I'm proposing a trie-based merge approach as a replacement for 
> the ManyToOne MergeIterator, wherein we treat each clustering component as a 
> tree underwhich all Cells with a common prefix occur. We then perform a tree 
> merge, rather than a flat merge. For byte-order fields this trie can even be 
> a full binary-trie (although built on the fly). The advantage here is that we 
> rapidly prune merges involving disjoint ranges, so that instead of always 
> incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
> timeseries data, for instance, we could merge dozens of files and so long as 
> they were non-overlapping our CPU burden would be little more than reading 
> from a single file.
> On top of this, we no longer incur any of the shared prefix repetition costs, 
> since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8731) Optimise merges involving multiple clustering columns

Reply via email to