[ 
https://issues.apache.org/jira/browse/CASSANDRA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14305501#comment-14305501
 ] 

Benedict commented on CASSANDRA-8731:
-------------------------------------

bq. Anyway, I would suggest limiting the scope of this issue to "introduce a 
smarter merge iterator for multi-clustering" (because it's easy and I doubt 
it's contentious) and open other tickets for more involved byte-fiddling 
suggestions.

Yup, that seems eminently sensible.

bq. but my initial guess is that doing both those thing is more involved that 
what I'm suggesting

Potentially, yes. Although a simple binary-trie based merge would not likely be 
very hard; certainly not as hard as implementing an actual binary-trie. The 
question is probably mostly if we get byte-order comparability for common 
fields.

bq. then I think that's really orthogonal

Close enough to agreed that I won't split hairs, and you're right, your 
suggestion would work for single clustering columns as well, so long as they 
are actually disjoint. The problem is that this is still likely to be fiddlier 
than it seems you're suggesting. DTCS is not likely to be non-overlapping; any 
two adjacent sstables will overlap a little (possibly a lot), and so you have 
to do something to unpick them. You could perform separate merges for the 
overlapping ranges, and have a rolling window, but this is also non-trivial. If 
we went for implementing the DTCS++ I suggested in the original ticket, of 
course (which would be quite achievable to deliver for 3.0) then this would 
guarantee non-overlapping ranges, and this could be made significantly simpler 
as a result. Either way, we should file a separate ticket for this and discuss 
it there.



> Optimise merges involving multiple clustering columns
> -----------------------------------------------------
>
>                 Key: CASSANDRA-8731
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8731
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>              Labels: performance
>             Fix For: 3.0
>
>
> Since the new storage format is dead in the water for the moment, we should 
> do our best to optimise current behaviour. When merging data from multiple 
> sstables with multiple clustering columns, currently we must incur the full 
> costs of comparison for the entire matching prefix, and must heapify every 
> cell in our PriorityQueue, incurring lg(N) of these costlier comparisons for 
> every cell we merge, where N is the number of sources we're merging.
> Essentially I'm proposing a trie-based merge approach as a replacement for 
> the ManyToOne MergeIterator, wherein we treat each clustering component as a 
> tree underwhich all Cells with a common prefix occur. We then perform a tree 
> merge, rather than a flat merge. For byte-order fields this trie can even be 
> a full binary-trie (although built on the fly). The advantage here is that we 
> rapidly prune merges involving disjoint ranges, so that instead of always 
> incurring lg(N) costs on each new record, we may often incur O(1) costs. For 
> timeseries data, for instance, we could merge dozens of files and so long as 
> they were non-overlapping our CPU burden would be little more than reading 
> from a single file.
> On top of this, we no longer incur any of the shared prefix repetition costs, 
> since we compare each prefix piece-wise, and only once.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to