[jira] [Commented] (CASSANDRA-8921) Experiment with a probabilistic tree of membership for maxPurgeableTimestamp

Benedict (JIRA) Fri, 07 Aug 2015 06:00:08 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661802#comment-14661802
 ]


Benedict commented on CASSANDRA-8921:
-------------------------------------

Hi Daniel,

No one is working on this, no, and we'd love you to have a go. Since it's a 
really non-trivial ticket, though, I feel like I should issue a brief warning 
that it will be more difficult than your average patch to reach inclusion. It's 
some fairly fundamental algorithmic work, and effectively a research piece, so 
it will receive a lot of scrutiny. It's also not *100%* certain if it will pay 
dividends. I expect it will, but it may have costs for some workloads, and 
there are open questions: 

* do we rebuild this tree each time we cross an sstable boundary? (this may 
lead to high CPU cost in some scenarios)
* do we build one tree up-front, and then post-filter? (this may lead to a 
significant increase in memory requirement)
* do we hybridise, and chain new BFs on wholesale, and rebalance the tree once 
we have enough that the cost will be sufficiently amortized? 

This all makes it a really fun piece of work, but I would recommend you think 
through how you think the algorithm would work and post a high-level design 
here, preferably with a little blurb about your expectation of the tradeoffs. 
That way we can discuss it up-front, and hopefully ease the gradual process of 
inclusion.

> Experiment with a probabilistic tree of membership for maxPurgeableTimestamp
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8921
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8921
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>
> maxPurgeableTimestamp appears to be a significant cost for some workloads, 
> the majority of which stemming from the cost of membership tests across the 
> overlapping tables. It would be possible to construct a tree of bloom filters 
> from the existing filters, that could yield queries of the set of possible 
> membership of a given key with logarithmic performance, and it appears there 
> is a research paper (that I haven't dived into yet) that outlines something 
> like this http://www.usna.edu/Users/cs/adina/research/Bloofi%20_CloudI2013.pdf



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8921) Experiment with a probabilistic tree of membership for maxPurgeableTimestamp

Reply via email to