[
https://issues.apache.org/jira/browse/CASSANDRA-8413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Robert Stupp updated CASSANDRA-8413:
------------------------------------
Attachment: 8413-patch.txt
Attached {{8413-patch.txt}}.
The patch simply swaps the two {{long}}s used for {{IFilter.add}} and
{{IFilter.isPresent}}. For old sstables (before 3.0 / version {{la}}) the
{{long}}s are not swapped, so existing filters remain readable.
I also added or changed a lot of unit tests, and had to ignore one existing
test that looks a bit useless.
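
To illustrate why swapping the two {{long}}s matters, here is a minimal sketch. It assumes the filter derives its bit positions by double hashing (a base value plus multiples of an increment), and that one of the two {{long}}s is the half that is also used as the partitioner token. The class {{DoubleHashSketch}} and its method names are hypothetical and are not the code in the patch.

```java
// Hypothetical sketch only; not the actual Cassandra BloomFilter code.
final class DoubleHashSketch {

    // Assumed double-hashing scheme: position i = (base + i * inc) mod bitCount.
    static long[] bitPositions(long base, long inc, int hashCount, long bitCount) {
        long[] positions = new long[hashCount];
        for (int i = 0; i < hashCount; i++) {
            // floorMod keeps the result in [0, bitCount) even if base is negative
            // or the running sum overflows.
            positions[i] = Math.floorMod(base, bitCount);
            base += inc;
        }
        return positions;
    }

    // tokenHalf: the long assumed to be shared with the partitioner token.
    // otherHalf: the second long of the 128-bit hash.
    // swapped == false models old sstables: the token half is the base.
    // swapped == true models the patched behaviour: the other half is the base.
    static long[] bitPositions(long tokenHalf, long otherHalf,
                               int hashCount, long bitCount, boolean swapped) {
        return swapped
             ? bitPositions(otherHalf, tokenHalf, hashCount, bitCount)
             : bitPositions(tokenHalf, otherHalf, hashCount, bitCount);
    }
}
```

Under these assumptions, the unswapped order takes the first position directly from the token half, whose top bits are shared by all keys on a node that owns a contiguous range. The swapped order takes the first position from the other half, and the token half only contributes as the increment. A reader and writer must agree on the order, which is why old sstables keep the unswapped one.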
> Bloom filter false positive ratio is not honoured
> -------------------------------------------------
>
> Key: CASSANDRA-8413
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8413
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Benedict
> Assignee: Robert Stupp
> Fix For: 3.0
>
> Attachments: 8413-patch.txt, 8413.hack-3.0.txt, 8413.hack.txt
>
>
> Whilst thinking about CASSANDRA-7438 and hash bits, I realised we have a
> problem with sabotaging our bloom filters when using the murmur3 partitioner.
> I have performed a very quick test to confirm this risk is real.
> Since a typical cluster uses the same murmur3 hash for partitioning as we do
> for bloom filter lookups, and we own a contiguous range, we can guarantee
> that the top X bits collide for all keys on the node. This translates into
> poor bloom filter distribution. I quickly hacked LongBloomFilterTest to
> simulate the problem, and the result in these tests is _up to_ a doubling of
> the actual false positive ratio. The actual change will depend on the key
> distribution, the number of keys, the false positive ratio, the number of
> nodes, the token distribution, etc. But it seems to be a real problem for
> non-vnode clusters of at least ~128 nodes in size.