[jira] [Commented] (HBASE-7885) bloom filter compaction is too aggressive for Hfile which only contains small count of records

clockfly (JIRA) Mon, 25 Feb 2013 21:22:16 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-7885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586785#comment-13586785
 ]


clockfly commented on HBASE-7885:
---------------------------------

Hi Ted,

+      while ( (newByteSize & 1) == 0 && newMaxKeys > (this.keyCount<<1) 
+          && newByteSize >= MIN_BLOOMFILTER_SIZE * 2) {
         pieces <<= 1;
         newByteSize >>= 1;
         newMaxKeys >>= 1;
       }

Each iteration of the while loop cuts the size in half: after one compaction 
step, newByteSize is reduced to newByteSize / 2. The condition 
newByteSize >= MIN_BLOOMFILTER_SIZE * 2 makes sure that, after that step, the 
bloom filter's size is still >= MIN_BLOOMFILTER_SIZE.
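
To illustrate the guard, here is a minimal standalone sketch of the same loop 
conditions. It is not the HBase code: the class name, the shrink() helper and 
the input numbers are made up for illustration, and MIN_BLOOMFILTER_SIZE = 1000 
is only an assumption taken from the 1000 bytes suggested in the issue 
description.

```java
public class BloomShrinkSketch {
    // Assumed lower bound in bytes (the issue description suggests 1000);
    // the actual constant is whatever the patch defines.
    static final int MIN_BLOOMFILTER_SIZE = 1000;

    /**
     * Hypothetical helper: halves byteSize while the three conditions
     * from the loop in the patch excerpt hold, and returns the final size.
     */
    static int shrink(int byteSize, long maxKeys, long keyCount) {
        int newByteSize = byteSize;
        long newMaxKeys = maxKeys;
        while ((newByteSize & 1) == 0                       // size is still even
                && newMaxKeys > (keyCount << 1)             // capacity exceeds twice the key count
                && newByteSize >= MIN_BLOOMFILTER_SIZE * 2) // half is still >= the minimum
        {
            newByteSize >>= 1;
            newMaxKeys >>= 1;
        }
        return newByteSize;
    }

    public static void main(String[] args) {
        // 128KB filter, made-up capacity of 100000 keys, only 10 keys inserted.
        System.out.println(shrink(128 * 1024, 100000, 10)); // prints 1024
    }
}
```

With these inputs the loop stops at 1024 bytes, because halving again would 
give 512, which is below the assumed minimum.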

Some unit tests are affected; I will attach the fix for them soon.

                
> bloom filter compaction is too aggressive for Hfile which only contains small 
> count of records
> ----------------------------------------------------------------------------------------------
>
>                 Key: HBASE-7885
>                 URL: https://issues.apache.org/jira/browse/HBASE-7885
>             Project: HBase
>          Issue Type: Bug
>          Components: Performance, Scanners
>    Affects Versions: 0.94.5
>            Reporter: clockfly
>            Priority: Minor
>             Fix For: 0.94.5
>
>         Attachments: hbase_bloom_shrink_fix.patch
>
>
> For HFile V2, the bloom filter starts with an initial size of 128KB. 
> When not many records are inserted into the bloom filter, the bloom 
> filter will start to shrink itself (compaction). 
> For example, from 128K it will compact to 64K 
> ->32K->16K->8K->4K->2K->1K->512->256->128->64->32, as long as it thinks that 
> it can still be bounded by the estimated error rate. 
> If we put only a few records in the HFile, the bloom filter will be 
> compacted to too small a size, which breaks the assumption that shrinking 
> will still be bounded by the estimated error rate. The false positive rate 
> will become unacceptably high. 
> For example, if we set the expected error rate to 0.00001, then for 10 
> records the size of the bloom filter after compaction will be 64 bytes. The 
> real effective false positive rate will be 50%.
> The use case is this: if we are using HBase to store big records such as 
> images and binaries, each record will take megabytes. A 128M file will then 
> contain only dozens of records.
> The suggested fix is to set a lower limit for the bloom filter compaction 
> process. I suggest using 1000 bytes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
