[ 
https://issues.apache.org/jira/browse/CASSANDRA-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830212#comment-13830212
 ] 

Yuki Morishita commented on CASSANDRA-5906:
-------------------------------------------

So far, I have tested HLL++ on its own, measuring serialized size and error % 
with various parameters: 
https://docs.google.com/a/datastax.com/spreadsheet/ccc?key=0AsVe14L_ijtkdEhDbk1rTjYwb3ZjdXFlTnNCNnk2cGc#gid=13

We can reduce the size from what was originally posted here (p=16, sp=0) down 
to less than 10k for p=13, sp=25. Using the sparse mode, we can also save space 
when the number of partitions is small.
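
As a rough cross-check of why a smaller p shrinks the serialized size, here is 
a minimal sketch using the textbook HyperLogLog figures: m = 2^p registers and 
a standard error of about 1.04/sqrt(m). The 6-bit register width and the helper 
names are assumptions for illustration. They are not taken from the spreadsheet 
or from any particular library's serialization format, and the sparse mode 
(sp) is not modeled.

```python
import math


def hll_standard_error(p: int) -> float:
    """Textbook HyperLogLog standard error: 1.04 / sqrt(m), with m = 2**p registers."""
    return 1.04 / math.sqrt(2 ** p)


def dense_size_bytes(p: int, bits_per_register: int = 6) -> int:
    """Size of the dense register array only.

    Assumes 6-bit registers (an assumption); a real serialized form adds
    headers and may pack registers differently.
    """
    return math.ceil((2 ** p) * bits_per_register / 8)


for p in (16, 13):
    print(f"p={p}: ~{dense_size_bytes(p)} bytes dense, "
          f"standard error ~{hll_standard_error(p):.2%}")
```

Under these assumptions, p=16 needs roughly 48 KiB of registers and p=13 
roughly 6 KiB, at the cost of a larger standard error. The measured numbers in 
the spreadsheet remain the authoritative ones.
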
I think a relative error of 2% in the estimated partition size is tolerable for 
constructing the bloom filter (though I don't have a formula to prove it :P).
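
One way to put a number on that intuition is the standard bloom filter 
false-positive approximation, (1 - e^(-k/b))^k for k hash functions and b bits 
per key. The sketch below sizes a filter for the estimated count and then 
evaluates it against a true count that is 2% larger. The function names and the 
1% target rate are hypothetical, and this is the textbook model, not 
Cassandra's actual filter sizing code.

```python
import math


def bloom_fp_rate(bits_per_key: float, num_hashes: int) -> float:
    """Textbook approximation of a bloom filter's false-positive rate."""
    return (1.0 - math.exp(-num_hashes / bits_per_key)) ** num_hashes


def fp_rate_with_count_error(target_fp: float, relative_error: float):
    """Return (designed, actual) false-positive rates.

    The filter is sized for the estimated key count, but the true count is
    assumed to be (1 + relative_error) times that estimate.
    """
    # Optimal bits per key for the target rate: -ln(p) / (ln 2)^2
    bits_per_estimated_key = -math.log(target_fp) / (math.log(2) ** 2)
    # Optimal number of hash functions, rounded to an integer
    num_hashes = max(1, round(bits_per_estimated_key * math.log(2)))
    designed = bloom_fp_rate(bits_per_estimated_key, num_hashes)
    # More real keys than estimated means fewer bits per real key
    actual = bloom_fp_rate(bits_per_estimated_key / (1.0 + relative_error),
                           num_hashes)
    return designed, actual


designed, actual = fp_rate_with_count_error(target_fp=0.01, relative_error=0.02)
print(f"designed: {designed:.4%}, with 2% undercount: {actual:.4%}")
```

In this model, undercounting by 2% raises the false-positive rate only 
slightly above the designed rate, which is consistent with the claim that 
such an error is tolerable.
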


> Avoid allocating over-large bloom filters
> -----------------------------------------
>
>                 Key: CASSANDRA-5906
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5906
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Yuki Morishita
>             Fix For: 2.1
>
>
> We conservatively estimate the number of partitions post-compaction to be the 
> total number of partitions pre-compaction.  That is, we assume the worst-case 
> scenario of no partition overlap at all.
> This can result in substantial memory wasted in sstables resulting from 
> highly overlapping compactions.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
