[
https://issues.apache.org/jira/browse/CASSANDRA-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680290#comment-14680290
]
Daniel Chia commented on CASSANDRA-8921:
----------------------------------------
Thanks for the warning. I totally understand, and it is in line with what I
was expecting. I was indeed looking for something a little more algorithmic
and researchy, since I don't really want to block you guys on anything core.
Perhaps to help me kick things off: do we know which workload characteristic
creates the problem? I'm assuming it's when we compact many sstables together
that have many overlaps. Is this something that we've profiled and determined
to be a problem, or do we only suspect it anecdotally?
I ask so that I can verify the performance problem, investigate Bloofi trees
as a potential solution, and weigh the benefits and costs under different
scenarios.
> Experiment with a probabilistic tree of membership for maxPurgeableTimestamp
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-8921
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8921
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
>
> maxPurgeableTimestamp appears to be a significant cost for some workloads,
> the majority of which stems from the cost of membership tests across the
> overlapping tables. It would be possible to construct a tree of bloom filters
> from the existing filters, that could yield queries of the set of possible
> membership of a given key with logarithmic performance, and it appears there
> is a research paper (that I haven't dived into yet) that outlines something
> like this http://www.usna.edu/Users/cs/adina/research/Bloofi%20_CloudI2013.pdf
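
To make the idea in the description concrete, here is a rough sketch of a
tree of bloom filters. It is not Cassandra code and not the algorithm from the
linked paper: it assumes every filter has the same size and hash functions,
builds a plain binary tree by OR-ing child filters, and all names
(BloomFilter, Node, build_tree, candidates) are made up for illustration. A
lookup only descends into subtrees whose filter may contain the key, so it
touches few nodes when few sstables match.

```python
import hashlib


class BloomFilter:
    """Toy bloom filter; the bit set is stored in a Python int."""

    def __init__(self, num_bits=1024, num_hashes=3, bits=0):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits

    def _positions(self, key):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def union(self, other):
        # Only valid when both filters share size and hash functions.
        return BloomFilter(self.num_bits, self.num_hashes,
                           self.bits | other.bits)


class Node:
    """Leaf nodes carry a label (e.g. an sstable name); inner nodes do not."""

    def __init__(self, bloom, children=(), label=None):
        self.bloom = bloom
        self.children = list(children)
        self.label = label


def build_tree(leaves):
    """Pair nodes level by level; each parent filter is the OR of its children."""
    level = list(leaves)
    if not level:
        raise ValueError("need at least one leaf")
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:
                parents.append(pair[0])  # odd node out moves up unchanged
            else:
                merged = pair[0].bloom.union(pair[1].bloom)
                parents.append(Node(merged, pair))
        level = parents
    return level[0]


def candidates(node, key):
    """Return labels of leaves that may contain key, pruning negative subtrees."""
    if not node.bloom.might_contain(key):
        return []
    if not node.children:
        return [node.label]
    found = []
    for child in node.children:
        found.extend(candidates(child, key))
    return found
```

The result is a superset of the sstables that really hold the key: false
positives are possible, false negatives are not. If a key may be present in
most leaves, the walk visits most of the tree, so the logarithmic behaviour
only holds for keys that match few filters.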