[
https://issues.apache.org/jira/browse/CASSANDRA-8921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14661802#comment-14661802
]
Benedict commented on CASSANDRA-8921:
-------------------------------------
Hi Daniel,
No one is working on this, no, and we'd love you to have a go. Since it's a
really non-trivial ticket, though, I feel like I should issue a brief warning
that it will be more difficult than your average patch to reach inclusion. It's
some fairly fundamental algorithmic work, and effectively a research piece, so
it will receive a lot of scrutiny. It's also not *100%* certain if it will pay
dividends. I expect it will, but it may have costs for some workloads, and
there are open questions:
* do we rebuild this tree each time we cross an sstable boundary? (this may
lead to high CPU cost in some scenarios)
* do we build one tree up-front, and then post-filter? (this may lead to a
significant increase in memory requirement)
* do we hybridise, and chain new BFs on wholesale, and rebalance the tree once
we have enough that the cost will be sufficiently amortized?
This all makes it a really fun piece of work, but I would recommend you think
through how you think the algorithm would work and post a high-level design
here, preferably with a little blurb about your expectation of the tradeoffs.
That way we can discuss it up-front, and hopefully ease the gradual process of
inclusion.
> Experiment with a probabilistic tree of membership for maxPurgeableTimestamp
> ----------------------------------------------------------------------------
>
> Key: CASSANDRA-8921
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8921
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
>
> maxPurgeableTimestamp appears to be a significant cost for some workloads,
> the majority of which stemming from the cost of membership tests across the
> overlapping tables. It would be possible to construct a tree of bloom filters
> from the existing filters, that could yield queries of the set of possible
> membership of a given key with logarithmic performance, and it appears there
> is a research paper (that I haven't dived into yet) that outlines something
> like this http://www.usna.edu/Users/cs/adina/research/Bloofi%20_CloudI2013.pdf
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)