[jira] [Updated] (CASSANDRA-6633) Dynamic Resize of Bloom Filters

Benedict (JIRA) Thu, 13 Mar 2014 14:50:38 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Benedict updated CASSANDRA-6633:
--------------------------------

      Component/s: Core
      Description: 
1) Dynamic resizing would be useful. The simplest way to achieve this is to have 
separate address spaces for each hash function, so that we may 
increase/decrease accuracy by simply loading/unloading another function (we 
could even do interesting things in future, such as alternating the functions we 
select if we find we're getting more false positives than expected).
2) Faster loading/unloading would help this, and we could achieve it by 
mmapping the bloom filter representation on systems where we can mlock.


  was:
Investigate various possible improvements to our bloom filters:

1) Dynamic resizing would be useful. There are a few ways this could be 
achieved: with some modification, downsampling could be supported; partitioning 
the hash functions so that we may select the number of hashes/bits dynamically, 
by loading/unloading a given partition; and there are some related data 
structures, such as Quotient Filters that support resizing and merging.
2) Faster loading: should be possible to mmap the bloom filter representation 
on disk
3) Most ambitious of all, would be to try to reduce the memory requirement of 
bloom filters. Golomb Coded Sets 
[1|http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf]
 are a possibility, as are other compressed hash structures 
[2|http://www.it-c.dk/people/pagh/papers/bloom.pdf].




         Priority: Minor  (was: Major)
    Fix Version/s: 3.0

> Dynamic Resize of Bloom Filters
> -------------------------------
>
>                 Key: CASSANDRA-6633
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6633
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Benedict
>            Priority: Minor
>             Fix For: 3.0
>
>
> 1) Dynamic resizing would be useful. The simplest way to achieve this is to have 
> separate address spaces for each hash function, so that we may 
> increase/decrease accuracy by simply loading/unloading another function (we 
> could even do interesting things in future, such as alternating the functions we 
> select if we find we're getting more false positives than expected).
> 2) Faster loading/unloading would help this, and we could achieve it by 
> mmapping the bloom filter representation on systems where we can mlock.
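
To make the "separate address spaces for each hash function" idea concrete, 
here is a minimal sketch in Java. It is not Cassandra's bloom filter code: the 
class name PartitionedBloomFilter, its methods, and the FNV-1a-style hash are 
all hypothetical and chosen only for illustration. Each hash function owns its 
own BitSet; writes set a bit in every partition, while reads consult only the 
first N "active" partitions. Changing N stands in for loading/unloading a 
partition, which a real implementation would back with files or mmapped 
regions rather than heap memory.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative only: a Bloom filter with one bit region per hash function. */
final class PartitionedBloomFilter {
    private final BitSet[] partitions;   // partitions[i] is the address space of hash function i
    private final int bitsPerPartition;
    private int activePartitions;        // lookups consult partitions 0..activePartitions-1

    PartitionedBloomFilter(int hashCount, int bitsPerPartition) {
        if (hashCount < 1 || bitsPerPartition < 1)
            throw new IllegalArgumentException("hashCount and bitsPerPartition must be positive");
        this.partitions = new BitSet[hashCount];
        for (int i = 0; i < hashCount; i++)
            partitions[i] = new BitSet(bitsPerPartition);
        this.bitsPerPartition = bitsPerPartition;
        this.activePartitions = hashCount;
    }

    /** Sets one bit in every partition, so any subset of partitions stays valid for reads. */
    void add(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(bytes, 0x9747b28c);
        int h2 = hash(bytes, 0x85ebca6b);
        for (int i = 0; i < partitions.length; i++)
            partitions[i].set(index(h1, h2, i));
    }

    /** Returns false only if the key was definitely never added. */
    boolean mightContain(String key) {
        byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
        int h1 = hash(bytes, 0x9747b28c);
        int h2 = hash(bytes, 0x85ebca6b);
        for (int i = 0; i < activePartitions; i++)
            if (!partitions[i].get(index(h1, h2, i)))
                return false;
        return true;
    }

    /** Simulates loading/unloading partitions: fewer active partitions, less accuracy. */
    void setActivePartitions(int n) {
        if (n < 1 || n > partitions.length)
            throw new IllegalArgumentException("n must be between 1 and " + partitions.length);
        this.activePartitions = n;
    }

    int activePartitions() {
        return activePartitions;
    }

    // Double hashing: the i-th function is h1 + i * h2, reduced into the partition.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, bitsPerPartition);
    }

    // Simple seeded FNV-1a-style mix; a stand-in for a production-quality hash.
    private static int hash(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h ^ (h >>> 15);
    }
}
```

For example, new PartitionedBloomFilter(4, 1 << 12) creates four partitions, 
and a later setActivePartitions(2) makes lookups touch only half of them. 
Because add always writes every partition, dropping partitions can only raise 
the false-positive rate; it never introduces false negatives.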



--
This message was sent by Atlassian JIRA
(v6.2#6252)
