[
https://issues.apache.org/jira/browse/CASSANDRA-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benedict updated CASSANDRA-6633:
--------------------------------
Component/s: Core
Description:
1) Dynamic resizing would be useful. The simplest way to achieve this is to have
separate address spaces for each hash function, so that we may
increase/decrease accuracy by simply loading/unloading another function (we
could even do interesting stuff in future like alternating the functions we
select if we find we're getting more false positives than should be expected).
2) Faster loading/unloading would help this, and we could achieve this by
mmapping the bloom filter representation on systems where we can mlock it.
was:
Investigate various possible improvements to our bloom filters:
1) Dynamic resizing would be useful. There are a few ways this could be
achieved: with some modification, downsampling could be supported; partitioning
the hash functions so that we may select the number of hashes/bits dynamically,
by loading/unloading a given partition; and there are some related data
structures, such as Quotient Filters that support resizing and merging.
2) Faster loading: should be possible to mmap the bloom filter representation
on disk
3) Most ambitious of all, would be to try to reduce the memory requirement of
bloom filters. Golomb Coded Sets
[1|http://algo2.iti.kit.edu/singler/publications/cacheefficientbloomfilters-wea2007.pdf]
are a possibility, as are other compressed hash structures
[2|http://www.it-c.dk/people/pagh/papers/bloom.pdf].
Priority: Minor (was: Major)
Fix Version/s: 3.0
> Dynamic Resize of Bloom Filters
> -------------------------------
>
> Key: CASSANDRA-6633
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6633
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Benedict
> Priority: Minor
> Fix For: 3.0
>
>
> 1) Dynamic resizing would be useful. The simplest way to achieve this is to have
> separate address spaces for each hash function, so that we may
> increase/decrease accuracy by simply loading/unloading another function (we
> could even do interesting stuff in future like alternating the functions we
> select if we find we're getting more false positives than should be expected).
> 2) Faster loading/unloading would help this, and we could achieve this by
> mmapping the bloom filter representation on systems where we can mlock it.
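
To make the "separate address space per hash function" idea concrete, here is a
minimal sketch, not the actual Cassandra implementation. The class name
PartitionedBloomFilter, its methods, and the FNV-1a based double hashing are all
assumptions made for illustration; the in-memory "stored" array merely stands in
for partitions that would really live on disk. Each hash function owns its own
bit array, and a lookup only consults the partitions that are currently loaded.

```java
import java.util.BitSet;

/** Hypothetical sketch: one independent bit array (partition) per hash function. */
final class PartitionedBloomFilter {
    private final int bitsPerPartition;
    private final BitSet[] stored;   // stand-in for the on-disk copy of each partition
    private final boolean[] loaded;  // which partitions lookups currently consult

    PartitionedBloomFilter(int bitsPerPartition, int hashCount) {
        this.bitsPerPartition = bitsPerPartition;
        this.stored = new BitSet[hashCount];
        this.loaded = new boolean[hashCount];
        for (int i = 0; i < hashCount; i++) {
            stored[i] = new BitSet(bitsPerPartition);
            loaded[i] = true;
        }
    }

    // Seeded FNV-1a; chosen only for brevity, any decent hash would do here.
    private static int fnv1a(byte[] key, int seed) {
        int h = seed;
        for (byte b : key) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Bit index of the key inside partition i, via double hashing (h1 + i * h2).
    private int index(byte[] key, int i) {
        int h1 = fnv1a(key, 0x811C9DC5);
        int h2 = fnv1a(key, 0x5BD1E995) | 1;
        return Math.floorMod(h1 + i * h2, bitsPerPartition);
    }

    // Building the filter always writes every partition, loaded or not.
    void add(byte[] key) {
        for (int i = 0; i < stored.length; i++) {
            stored[i].set(index(key, i));
        }
    }

    // Only loaded partitions are checked. Unloading never causes a false
    // negative; it only raises the false-positive rate.
    boolean mightContain(byte[] key) {
        for (int i = 0; i < stored.length; i++) {
            if (loaded[i] && !stored[i].get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    void unload(int i) { loaded[i] = false; }

    void load(int i) { loaded[i] = true; }

    int loadedCount() {
        int n = 0;
        for (boolean b : loaded) {
            if (b) n++;
        }
        return n;
    }
}
```

Under these assumptions, memory use scales with the number of loaded partitions,
so accuracy can be traded for memory one hash function at a time without
rebuilding the filter. With every partition unloaded, mightContain degenerates
to always returning true.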