[ https://issues.apache.org/jira/browse/CASSANDRA-790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833927#action_12833927 ]

Jonathan Ellis commented on CASSANDRA-790:
------------------------------------------

patch 01 comments:

refactoring:
 - please split the refactoring into a separate patch; it's hard to tell what 
is part of the actual fix here
 - BF constructors that do not chain are a design smell; having one of them
called only from tests is also a smell
 - instead of using min/max to force values into acceptable ranges, assert that 
they are sane
 - I feel part of the BF problem here is that BF is trying to be too
high-level.  Wouldn't we be better served by a low-level BF constructor
taking hash and bucket counts, with factories to handle the high-level sizing?

fix:
 - this feels like we're trading an obvious problem (the BF constructor throws)
for a more subtle one (the BF is a no-op when we exceed the spec, as the TODO
notes).  Wouldn't it be better to log a warning, create the largest BF
possible, and degrade gracefully?  This would be easier if the BF constructor
were sane, as described above.
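A minimal sketch of the "log a warning, create the largest BF possible" idea. It assumes the cap is `Integer.MAX_VALUE` bits, since `java.util.BitSet` is sized and indexed with an `int`. It re-derives the hash count with the textbook optimum k = (m/n) ln 2, which may differ from how Cassandra's `BloomCalculations` chooses it. `BloomSizing` and `Spec` are hypothetical names.

```java
import java.util.logging.Logger;

// Hypothetical sizing helper; not part of Cassandra.
final class BloomSizing {
    private static final Logger logger = Logger.getLogger(BloomSizing.class.getName());

    // Assumed cap: java.util.BitSet is sized and indexed with an int.
    static final long MAX_BITS = Integer.MAX_VALUE;

    /** Parameters to hand to a low-level (hashCount, numBits) constructor. */
    static final class Spec {
        final long numBits;
        final int hashCount;

        Spec(long numBits, int hashCount) {
            this.numBits = numBits;
            this.hashCount = hashCount;
        }
    }

    static Spec forKeys(long numKeys, int bucketsPerKey) {
        if (numKeys < 1 || bucketsPerKey < 1)
            throw new IllegalArgumentException("numKeys and bucketsPerKey must be >= 1");

        // Compare via division so the size check itself cannot overflow.
        long numBits;
        if (numKeys > MAX_BITS / bucketsPerKey) {
            numBits = MAX_BITS;
            logger.warning(String.format(
                "%d keys at %d buckets/key exceeds %d bits; using the largest filter"
                    + " possible, so the false-positive rate will be higher than requested",
                numKeys, bucketsPerKey, MAX_BITS));
        } else {
            numBits = numKeys * bucketsPerKey;
        }

        // Choose the hash count for the density actually achieved: k = (m/n) ln 2.
        double bitsPerKey = (double) numBits / numKeys;
        int hashCount = Math.max(1, (int) Math.round(bitsPerKey * Math.log(2)));
        return new Spec(numBits, hashCount);
    }
}
```

For 1,000,000 keys at 15 buckets per key this yields 15,000,000 bits and 10 hashes. For 300,000,000 keys it logs a warning and falls back to `Integer.MAX_VALUE` bits with 5 hashes, so the filter still works with a higher false-positive rate instead of throwing or silently becoming a no-op.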

> SSTables limited to (2^31)/15 keys
> ----------------------------------
>
>                 Key: CASSANDRA-790
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-790
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Stu Hood
>            Priority: Blocker
>             Fix For: 0.5, 0.6, 0.7
>
>         Attachments: 
> 0001-Change-parameters-to-BloomCalculations-in-order-to-c.patch, 
> 0002-Add-timeouts-to-forceBlockingFlush-during-tests.patch
>
>
> The current BloomFilter implementation requires a BitSet of (bucket_count *
> num_keys) bits, and that product is currently computed in an int, which
> overflows at around 140 million keys in one SSTable.
> Short term fix: perform the calculation in a long, and cap the value to the 
> maximum size of a BitSet.
> Long term fix: begin partitioning BitSets, perhaps using Linear Bloom Filters.
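A small illustration of the overflow and of the short-term fix described in the issue. It assumes 15 buckets per key, which is what the (2^31)/15 in the issue title implies, and a cap of `Integer.MAX_VALUE` bits. `BloomOverflow` and its methods are hypothetical names used only for the demo.

```java
// Hypothetical demo class; 15 buckets/key is assumed from the issue title.
final class BloomOverflow {
    /** The bug: the product is evaluated in 32-bit int arithmetic and wraps. */
    static int bitsInInt(int numKeys, int bucketsPerKey) {
        return numKeys * bucketsPerKey;
    }

    /** Short-term fix: multiply in long, then cap at the largest BitSet size. */
    static long bitsCapped(long numKeys, int bucketsPerKey) {
        return Math.min(numKeys * bucketsPerKey, (long) Integer.MAX_VALUE);
    }
}
```

With 15 buckets per key, `Integer.MAX_VALUE / 15` is 143,165,576 keys. `bitsInInt(143_165_576, 15)` is still 2,147,483,640, but one more key wraps to -2,147,483,641. `bitsCapped(143_165_577, 15)` returns 2,147,483,647 instead.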

-- 
This message is automatically generated by JIRA.