[
https://issues.apache.org/jira/browse/CASSANDRA-8546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270005#comment-14270005
]
Joshua McKenzie commented on CASSANDRA-8546:
--------------------------------------------
Regarding GapList - as recently as 10-28-2014, [there was a bug fix involving
invalid
data|https://groups.google.com/forum/#!topic/brownies-collections/su1zrxYGbMc];
I wouldn't consider this implementation production-ready at the scale we have
to work with in Cassandra. While the concept behind the collection looks
reasonably simple and elegant, given the relative obscurity of the
brownies-collections library it's part of, I'd expect it to be running in far
fewer code-bases than any of the other collections we use. I agree with
Sylvain: the concept is sound and the library looks like it's coming along
well; I'm just not convinced it's appropriate for inclusion in our code-base
at its current level of maturity, much less in a stable 2.1 release.
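For context on the structure itself: GapList is essentially an ArrayList that
keeps a movable "gap" of free slots at the last modification point, so
clustered inserts and removes are amortized O(1) while indexed reads stay
array-fast. A minimal sketch of that idea (my own illustration, not the
brownies-collections code):
{code}
// Minimal gap-buffer sketch; illustrative only, not the GapList implementation.
public class GapSketch<E> {
    private Object[] buf = new Object[16];
    private int gapStart = 0;          // first free slot
    private int gapEnd = buf.length;   // one past the last free slot

    public int size() {
        return buf.length - (gapEnd - gapStart);
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        // Elements before the gap sit at their logical index; elements
        // after it are shifted right by the gap width.
        return (E) buf[index < gapStart ? index : index + (gapEnd - gapStart)];
    }

    public void add(int index, E element) {
        moveGap(index);
        if (gapStart == gapEnd)
            grow();
        buf[gapStart++] = element;
    }

    // Slide the gap to 'index'. Free if the insertion point didn't move,
    // O(distance) otherwise - that's the whole trick: repeated inserts near
    // the same position barely pay anything.
    private void moveGap(int index) {
        if (index < gapStart) {
            int n = gapStart - index;
            System.arraycopy(buf, index, buf, gapEnd - n, n);
            gapStart -= n;
            gapEnd -= n;
        } else if (index > gapStart) {
            int n = index - gapStart;
            System.arraycopy(buf, gapEnd, buf, gapStart, n);
            gapStart += n;
            gapEnd += n;
        }
    }

    private void grow() {
        Object[] bigger = new Object[buf.length * 2];
        int tail = buf.length - gapEnd;
        System.arraycopy(buf, 0, bigger, 0, gapStart);
        System.arraycopy(buf, gapEnd, bigger, bigger.length - tail, tail);
        gapEnd = bigger.length - tail;
        buf = bigger;
    }
}
{code}
The real library layers more machinery on top (rotation, bounds handling,
windowed variants), which is exactly where my maturity concern lies.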
> RangeTombstoneList becoming bottleneck on tombstone heavy tasks
> ---------------------------------------------------------------
>
> Key: CASSANDRA-8546
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8546
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: 2.0.11 / 2.1
> Reporter: Dominic Letz
> Assignee: Joshua McKenzie
> Fix For: 2.1.3
>
> Attachments: cassandra-2.0.11-8546.txt, cassandra-2.1-8546.txt,
> rangetombstonelist_compaction.png, rangetombstonelist_mutation.png,
> rangetombstonelist_read.png, tombstone_test.tgz
>
>
> I would like to propose changing the data structure used in the
> RangeTombstoneList to store and insert tombstone ranges to something with at
> least O(log N) insert in the middle and near O(1) insert at start AND end.
> Here is why:
> Under tombstone-heavy work-loads the current implementation of
> RangeTombstoneList becomes a bottleneck for slice queries.
> Scanning up to the default maximum number of tombstones (100k) can take up
> to 3 minutes because of how addInternal() scales on insertion of middle and
> start elements.
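> A minimal sketch of the quadratic behaviour, assuming an array-backed sorted
> list (the names below are illustrative, not Cassandra's actual addInternal()):
> {code}
> public class FrontInsertCost {
>     public static void main(String[] args) {
>         int n = 100_000;                 // default tombstone scan maximum
>         long[] starts = new long[n];
>         long moves = 0;
>         for (int i = 0; i < n; i++) {
>             // Inserting the smallest-so-far start at index 0 shifts all
>             // i existing elements one slot right: O(i) per insert.
>             System.arraycopy(starts, 0, starts, 1, i);
>             starts[0] = n - i;
>             moves += i;
>         }
>         // ~n^2/2, i.e. ~5 billion element moves for 100k reverse-order inserts.
>         System.out.println("element moves: " + moves);
>     }
> }
> {code}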
> The attached test demonstrates this with 50k deletes from both ends of a range:
> {code}
> INSERT 1...110000
> flush()
> DELETE 1...50000
> DELETE 110000...60000
> {code}
> While one direction performs ok (~400ms on my notebook):
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp DESC LIMIT 1
> {code}
> The other direction underperforms (~7 seconds on my notebook):
> {code}
> SELECT * FROM timeseries WHERE name = 'a' ORDER BY timestamp ASC LIMIT 1
> {code}
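> A balanced tree keyed by range start would meet the O(log N) bound. A hedged
> sketch of that direction (illustrative only, not the attached patch, and it
> ignores the overlapping-range merging the real RangeTombstoneList performs):
> {code}
> import java.util.Map;
> import java.util.TreeMap;
>
> public class TreeTombstoneSketch {
>     // range start -> { range end, markedForDeleteAt timestamp }
>     private final TreeMap<Long, long[]> ranges = new TreeMap<>();
>
>     // O(log n) no matter where the range lands: front, middle, or end.
>     public void add(long start, long end, long timestamp) {
>         ranges.put(start, new long[] { end, timestamp });
>     }
>
>     // Does a range tombstone at least as new as 'keyTimestamp' cover 'key'?
>     public boolean isDeleted(long key, long keyTimestamp) {
>         Map.Entry<Long, long[]> e = ranges.floorEntry(key);
>         return e != null && key <= e.getValue()[0] && e.getValue()[1] >= keyTimestamp;
>     }
> }
> {code}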