[ https://issues.apache.org/jira/browse/CASSANDRA-9843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632257#comment-14632257 ]
Jonathan Ellis commented on CASSANDRA-9843: ------------------------------------------- Rather than just adding an ARF per partition (the way we used to have a BF -- the difference is that BF is not useful for scans but this would be), we may be able to adapt this further by moving our index into the ARF. Instead of just a bit indicating yes or no we could have the offset for the start of each range [that we do have data for] in the leaf. (The "adaptive" in ARF means you can tune it to index hot parts of the data range in greater detail, without increasing the total memory used, at the cost of less detail for the cold ranges. We could do this in Cassandra as well, writing updated ARF to a new file. This could reduce the memory problems of pulling the indexes for very large partitions into memory. However, the paper describes very good results even without adaptation, so this is not required for proof of concept.) > Augment or replace partition index with adaptive range filters > -------------------------------------------------------------- > > Key: CASSANDRA-9843 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9843 > Project: Cassandra > Issue Type: New Feature > Components: Core > Reporter: Jonathan Ellis > Assignee: T Jake Luciani > Labels: performance > > Adaptive range filters are, in principle, bloom filters for range queries. > They provide a space-efficient way to avoid scanning a partition when we can > tell that we do not contain any data for the range requested. Like BF, they > can return false positives but not false negatives. > The implementation is of course totally different from BF. ARF is a tree > where each leaf of the tree is a range of data and a bit, either on or off, > denoting whether we have *some* data in that range. > ARF are described here: http://www.vldb.org/pvldb/vol6/p1714-kossmann.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)