[
https://issues.apache.org/jira/browse/CASSANDRA-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14361948#comment-14361948
]
Robert Stupp commented on CASSANDRA-8413:
-----------------------------------------
[~benedict] I've "rebased" your hack against current trunk. When I execute
{{org.apache.cassandra.utils.LongBloomFilterTest#main}}, I constantly get
messages like
{noformat}
ERROR 18:14:07 LEAK DETECTED: a reference
(org.apache.cassandra.utils.concurrent.Ref$State@73febe86) to class
org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@773173674:[org.apache.cassandra.utils.obs.OpenBitSet@aa6f65cf]
was not released before the reference was garbage collected
{noformat}
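For context, that warning comes from Cassandra's reference-leak detection: a ref-counted resource (here the OpenBitSet backing a bloom filter) was garbage collected without its reference ever being released, which typically means the test allocates filters it never closes. The sketch below only illustrates that general allocate-without-close pattern with a hypothetical stand-in class and a finalizer; it is not the actual Ref/WrappedSharedCloseable machinery, nor the test code:
{code:java}
// Hypothetical illustration of the pattern the leak detector reports.
// The real classes are org.apache.cassandra.utils.concurrent.Ref and
// WrappedSharedCloseable (which use their own tracking, not finalizers);
// this stand-in only shows "allocated but never released".
public class LeakSketch
{
    static final class TrackedResource implements AutoCloseable
    {
        private boolean released = false;

        @Override
        public void close()
        {
            released = true; // in Cassandra this would release the Ref
        }

        @Override
        protected void finalize()
        {
            if (!released)
                System.err.println("LEAK DETECTED: resource was garbage collected without being released");
        }
    }

    public static void main(String[] args) throws Exception
    {
        // Leaky usage: the resource goes out of scope without close(),
        // so the warning fires once the GC eventually collects it.
        new TrackedResource();

        // Correct usage: try-with-resources guarantees the release.
        try (TrackedResource r = new TrackedResource())
        {
            // use r
        }

        System.gc();
        Thread.sleep(100); // give finalization a chance to run (illustrative only)
    }
}
{code}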
> Bloom filter false positive ratio is not honoured
> -------------------------------------------------
>
> Key: CASSANDRA-8413
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8413
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Benedict
> Assignee: Robert Stupp
> Fix For: 2.1.4
>
> Attachments: 8413.hack-3.0.txt, 8413.hack.txt
>
>
> Whilst thinking about CASSANDRA-7438 and hash bits, I realised we have a
> problem whereby we sabotage our own bloom filters when using the murmur3
> partitioner. I have performed a very quick test to confirm this risk is real.
> Since a typical cluster uses the same murmur3 hash for partitioning as for
> bloom filter lookups, and each node owns a contiguous token range, the top X
> bits are guaranteed to collide for all keys on a given node. This translates
> into poor bloom filter distribution. I quickly hacked LongBloomFilterTest to
> simulate the problem, and the result in these tests is _up to_ a doubling of
> the actual false positive ratio. The actual change will depend on the key
> distribution, the number of keys, the false positive ratio, the number of
> nodes, the token distribution, etc., but it seems to be a real problem for
> non-vnode clusters of at least ~128 nodes.
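To make the mechanism above concrete: with equal, contiguous (non-vnode) ranges, the node that owns a key is determined by the top log2(#nodes) bits of its token, so every key stored on one node shares those hash bits, and a bloom filter that derives its bit indexes from the same hash loses that much entropy per key. The sketch below demonstrates that premise only; it is not the attached hack, and it does not measure the false-positive inflation, which as noted depends on keys, nodes and token distribution. SplittableRandom stands in for the real Murmur3 hash, and the aligned equal ranges are a simplifying assumption:
{code:java}
import java.util.HashSet;
import java.util.Set;
import java.util.SplittableRandom;

// Minimal sketch of the premise in the description: with 128 equal, contiguous
// (non-vnode) token ranges, the owning node is determined by the top
// log2(128) = 7 bits of a key's 64-bit token, so every key stored on a given
// node shares those bits. SplittableRandom is a stand-in for the real Murmur3
// hash; the aligned, equally sized ranges are a simplifying assumption.
public class TokenPrefixSketch
{
    static final int NODES = 128;
    static final int PREFIX_BITS = 7; // log2(NODES)

    // With aligned equal ranges, the owning node is simply the token's top bits.
    static int ownerOf(long token)
    {
        return (int) (token >>> (64 - PREFIX_BITS));
    }

    public static void main(String[] args)
    {
        SplittableRandom rng = new SplittableRandom(42);
        Set<Long> prefixesSeenOnNode0 = new HashSet<>();
        int keysOnNode0 = 0;

        for (int i = 0; i < 10_000_000; i++)
        {
            long token = rng.nextLong();  // "Murmur3 hash" of a random key
            if (ownerOf(token) != 0)
                continue;                 // some other node stores this key
            keysOnNode0++;
            prefixesSeenOnNode0.add(token >>> (64 - PREFIX_BITS));
        }

        // Expect roughly 10M / 128 keys on node 0, all sharing one top-7-bit prefix.
        System.out.printf("node 0 stores %d keys with %d distinct top-%d-bit prefix(es)%n",
                          keysOnNode0, prefixesSeenOnNode0.size(), PREFIX_BITS);
    }
}
{code}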