[ 
https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14264547#comment-14264547
 ] 

Sylvain Lebresne commented on CASSANDRA-8546:
---------------------------------------------

The reason you get this is that RangeTombstoneList doesn't handle reverse 
queries as it should. That is, reversed queries are the worst case for it 
currently. We could however do the same thing we do for 
ArrayBackedSortedColumns, i.e. invert the order of the RangeTombstoneList for 
those reverse queries so we hit the best case rather than the worth one.

This is also why I'm not totally convinced about replacing with a BTree: during 
reads, we know we'll add the ranges in order (albeit in reverse order for 
reverse queries) and having O(1) insertion in that case is a good thing. It's 
true that we also use RangeTombstoneList in memtables and there, we will have 
insertion in potentially random order and the current implementation is not 
optimal for that case. That said, it is *not* what this ticket is running into 
and if optimizing for that case is at the expense of the read path, it's 
unclear it's a win. In particular, for the random order insertion to start 
being a big problem, you'd need to insert very large amount of range tombstones 
within the same partition in random order in the interval of time it needs to 
get flushed. It's theoretically possible, but I wonder how likely it is in 
practice.

Anyway, there is also the fact that doing that kind of change in 2.0 is out of 
question, and 2.1 is also borderline. I'm typically personally not in favor of 
targeting 2.1 for that kind of change: I don't know the exact use case here, 
but I would suspect it's possible to workaround by avoid reverse queries on 
tombstone heavy tables. Or avoid tombstone heavy tasks in the first place. And 
for 3.0, CASSANDRA-8099 should make this particular problem go away.



> RangeTombstoneList becoming bottleneck on tombstone heavy tasks
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-8546
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8546
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: 2.0.11 / 2.1
>            Reporter: Dominic Letz
>            Assignee: Benedict
>             Fix For: 2.1.3
>
>         Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt, 
> tombstone_test.tgz
>
>
> I would like to propose a change of the data structure used in the 
> RangeTombstoneList to store and insert tombstone ranges to something with at 
> least O(log N) insert in the middle and at near O(1) and start AND end. Here 
> is why:
> When having tombstone heavy work-loads the current implementation of 
> RangeTombstoneList becomes a bottleneck with slice queries.
> Scanning the number of tombstones up to the default maximum (100k) can take 
> up to 3 minutes of how addInternal() scales on insertion of middle and start 
> elements.
> The attached test shows that with 50k deletes from both sides of a range.
> INSERT 1...110000
> flush()
> DELETE 1...50000
> DELETE 110000...60000
> While one direction performs ok (~400ms on my notebook):
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
> {code}
> The other direction underperforms (~7seconds on my notebook)
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to