[ 
https://issues.apache.org/jira/browse/CASSANDRA-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680290#comment-14680290
 ] 

Daniel Chia commented on CASSANDRA-8921:
----------------------------------------

Thanks for the note of warning; I totally understand, and it is in line with 
what I was expecting. I was indeed looking for something a little more 
algorithmic and researchy, since I don't really want to block you guys on 
anything core.

Perhaps to help me kick things off, do we know what workload characteristic 
creates the problem? I'm assuming it's when we compact many sstables together 
that have many overlaps. Is this something that we've profiled and determined 
to be a problem, or do we just think anecdotally that this is the problem?

I ask so that I can verify the performance problem, investigate Bloofi trees 
as a potential solution, and weigh the benefits and costs under different 
scenarios.

> Experiment with a probabilistic tree of membership for maxPurgeableTimestamp
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8921
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8921
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>
> maxPurgeableTimestamp appears to be a significant cost for some workloads, 
> the majority of which stems from the cost of membership tests across the 
> overlapping tables. It would be possible to construct a tree of bloom filters 
> from the existing filters, that could yield queries of the set of possible 
> membership of a given key with logarithmic performance, and it appears there 
> is a research paper (that I haven't dived into yet) that outlines something 
> like this http://www.usna.edu/Users/cs/adina/research/Bloofi%20_CloudI2013.pdf
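
To make the idea in the description concrete, here is a minimal sketch of a 
tree of Bloom filters in Python. It is not Cassandra code and not the 
algorithm from the linked paper: every name in it (BloomFilter, Node, 
build_tree, candidates) is hypothetical, and it assumes that all leaf filters 
share the same bit count and hash functions so that a parent can be formed as 
the bitwise OR of its children. Real sstable filters may differ in size, so 
that assumption would need checking.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; the bit array is stored in a Python int."""

    def __init__(self, num_bits=1024, num_hashes=3, bits=0):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bits

    def _positions(self, key):
        # Double hashing: derive num_hashes bit positions from one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))

    def union(self, other):
        # OR-ing bit arrays is only valid when both filters share parameters.
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share size and hash count")
        return BloomFilter(self.num_bits, self.num_hashes, self.bits | other.bits)


class Node:
    def __init__(self, bloom, children=(), label=None):
        self.bloom = bloom
        self.children = list(children)
        self.label = label  # set on leaves only (hypothetical sstable name)


def build_tree(leaf_filters, fanout=2):
    """Build bottom-up; each parent filter is the OR of its children."""
    if fanout < 2 or not leaf_filters:
        raise ValueError("need fanout >= 2 and at least one leaf filter")
    level = [Node(bf, label=name) for name, bf in leaf_filters.items()]
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            merged = group[0].bloom
            for child in group[1:]:
                merged = merged.union(child.bloom)
            parents.append(Node(merged, group))
        level = parents
    return level[0]


def candidates(node, key):
    """Return labels of leaves that may contain key, pruning whole subtrees."""
    if not node.bloom.might_contain(key):
        return []
    if not node.children:
        return [node.label]
    found = []
    for child in node.children:
        found.extend(candidates(child, key))
    return found


# Usage: eight hypothetical sstables, two of which contain "key-a".
tables = {f"sstable-{i}": BloomFilter() for i in range(8)}
tables["sstable-2"].add("key-a")
tables["sstable-5"].add("key-a")
tables["sstable-5"].add("key-b")
root = build_tree(tables)
print(candidates(root, "key-a"))
```

A lookup only descends into subtrees whose merged filter reports a possible 
match, so the cost approaches logarithmic in the number of sstables when few 
of them match. It degrades toward a full scan when many leaves match or when 
the merged filters near the root saturate, which is one of the trade-offs that 
would need measuring.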



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
