[
https://issues.apache.org/jira/browse/CASSANDRA-5906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13830212#comment-13830212
]
Yuki Morishita commented on CASSANDRA-5906:
-------------------------------------------
So far, I have tested HLL++ on its own, measuring serialized size and error %
across various parameters.
https://docs.google.com/a/datastax.com/spreadsheet/ccc?key=0AsVe14L_ijtkdEhDbk1rTjYwb3ZjdXFlTnNCNnk2cGc#gid=13
We can reduce the size from what was originally posted here (p=16, sp=0) to
less than 10k with p=13, sp=25. Using sparse mode saves space when the number
of partitions is small.
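For a rough sense of the precision/size trade-off, here is a sketch using the textbook HyperLogLog standard error, 1.04/sqrt(m) with m = 2^p registers. Two assumptions: dense mode stores 6 bits per register (as described in the HLL++ paper), and the sparse precision sp is not modeled at all. A particular library's serialized format will differ in the details, so treat these numbers as ballpark figures only.

```python
import math

# Assumption: classic HLL standard error 1.04 / sqrt(m), m = 2**p registers.
# HLL++ adds bias correction but keeps this asymptotic error for large counts.
def hll_standard_error(p: int) -> float:
    """Approximate relative standard error for precision p."""
    m = 2 ** p
    return 1.04 / math.sqrt(m)

# Assumption: 6 bits per register in dense mode; the sparse representation
# (controlled by sp) is ignored here.
def dense_size_bytes(p: int, bits_per_register: int = 6) -> int:
    """Approximate dense-mode register storage in bytes, rounded up."""
    m = 2 ** p
    return (m * bits_per_register + 7) // 8

for p in (13, 16):
    print(f"p={p}: ~{hll_standard_error(p):.2%} std error, "
          f"~{dense_size_bytes(p)} bytes dense")
```

Under these assumptions p=13 gives roughly 1.1% standard error in about 6144 bytes of dense registers, while p=16 needs 49152 bytes, which is in line with the size drop reported above.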
I think a 2% relative error in the estimated number of partitions is tolerable
for constructing a bloom filter (though I don't have a formula to prove it :P).
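One way to sanity-check the 2% figure is to size a filter for an estimate that is 2% too low and compute the false-positive rate once the true number of keys is inserted. The sketch below uses the textbook bloom filter formulas (bits = -n ln(fp) / (ln 2)^2, hashes = (bits/n) ln 2); these are assumptions for illustration, not Cassandra's own bloom filter sizing code, and the 1M keys / 1% target are hypothetical values.

```python
import math

def bloom_params(n_expected: int, target_fp: float) -> tuple[int, int]:
    """Bits and hash count sized for n_expected keys at target_fp (textbook formulas)."""
    m = math.ceil(-n_expected * math.log(target_fp) / (math.log(2) ** 2))
    k = max(1, round(m / n_expected * math.log(2)))
    return m, k

def actual_fp(m_bits: int, k: int, n_actual: int) -> float:
    """Expected false-positive rate after inserting n_actual keys."""
    return (1.0 - math.exp(-k * n_actual / m_bits)) ** k

n_true = 1_000_000           # hypothetical true partition count
target = 0.01                # hypothetical target false-positive chance
n_est = int(n_true * 0.98)   # cardinality estimate that is 2% too low
m, k = bloom_params(n_est, target)
print(f"target {target:.2%}, actual {actual_fp(m, k, n_true):.3%}")
```

Under these assumptions the 2% underestimate moves the false-positive rate from about 1.0% to about 1.1%, a modest degradation that supports the intuition above.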
> Avoid allocating over-large bloom filters
> -----------------------------------------
>
> Key: CASSANDRA-5906
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5906
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Yuki Morishita
> Fix For: 2.1
>
>
> We conservatively estimate the number of partitions post-compaction to be the
> total number of partitions pre-compaction. That is, we assume the worst-case
> scenario of no partition overlap at all.
> This can result in substantial memory being wasted in sstables produced by
> highly overlapping compactions.
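The reason a cardinality sketch helps here is that sketches of the input sstables can be merged to estimate the number of distinct partitions after compaction, instead of summing per-sstable counts (which assumes no overlap). Below is a minimal sketch of that idea, assuming each input sstable carries a sketch of its partition keys. It is classic HyperLogLog (no HLL++ sparse mode or bias tables), written from scratch for illustration; the class name and the SHA-1-based hash are hypothetical choices, not Cassandra's code.

```python
import hashlib
import math

class HyperLogLog:
    """Minimal classic HyperLogLog (hypothetical; no HLL++ sparse mode or bias tables)."""

    def __init__(self, p: int = 13):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(key: bytes) -> int:
        # 64-bit hash taken from the first 8 bytes of SHA-1 (illustrative choice).
        return int.from_bytes(hashlib.sha1(key).digest()[:8], "big")

    def add(self, key: bytes) -> None:
        x = self._hash64(key)
        idx = x >> (64 - self.p)                # top p bits select a register
        rest = x & ((1 << (64 - self.p)) - 1)   # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1-bit in `rest`
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> None:
        # The sketch of a union is the register-wise maximum.
        assert self.p == other.p
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self) -> float:
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw

# Two hypothetical input sstables whose partition keys overlap by 50%.
a, b = HyperLogLog(), HyperLogLog()
for i in range(100_000):
    a.add(f"key{i}".encode())
for i in range(50_000, 150_000):
    b.add(f"key{i}".encode())

naive = 100_000 + 100_000  # worst-case assumption: no overlap at all
a.merge(b)
print(naive, round(a.estimate()))  # true distinct count is 150_000
```

Here the no-overlap assumption would size the bloom filter for 200,000 partitions, while the merged sketch estimates close to the true 150,000.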
--
This message was sent by Atlassian JIRA
(v6.1#6144)