[jira] Commented: (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

HBase Review Board (JIRA) Mon, 12 Jul 2010 10:22:19 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887433#action_12887433
 ]


HBase Review Board commented on HBASE-2794:
-------------------------------------------

Message from: "Nicolas" <[email protected]>

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.hbase.org/r/296/#review350
-----------------------------------------------------------



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<http://review.hbase.org/r/296/#comment1468>

    Have you done any tests to see when the number of bloom checks takes 
significant time compared to just getting the block?  For example, if you have 
100 columns to look up, do bloom filters really buy you anything, or should 
you just switch to a Row-level bloom anyway?  Also, with the default 1% error 
rate, you're looking at up to ~100% false positives with 100 columns (about 
63% if the checks are independent).  Maybe 
max.columns = sqrt(1/error.rate)
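
    To put numbers on the false-positive concern, here is a rough sketch, not 
code from the patch.  It assumes the per-column bloom checks are independent, 
and the class and method names are made up for illustration.  Under that 
assumption, 100 checks at a 1% error rate give at least one false positive 
about 63% of the time, while the simple sum n * p gives the ~100% upper bound.

```java
// Illustrative only: none of these names exist in HBase.
final class BloomFalsePositiveSketch {

    /** Chance that at least one of `columns` independent checks is a false positive. */
    static double compoundFalsePositiveRate(double errorRate, int columns) {
        return 1.0 - Math.pow(1.0 - errorRate, columns);
    }

    /** Union-bound estimate (n * p), capped at 1; always >= the independent estimate. */
    static double unionBound(double errorRate, int columns) {
        return Math.min(1.0, errorRate * columns);
    }

    /** The cutoff floated in the review comment: sqrt(1 / errorRate), rounded down. */
    static int suggestedMaxColumns(double errorRate) {
        return (int) Math.floor(Math.sqrt(1.0 / errorRate));
    }

    public static void main(String[] args) {
        double p = 0.01; // default error rate mentioned in the review
        for (int n : new int[] {1, 10, 100}) {
            System.out.printf("n=%d independent=%.3f unionBound=%.3f%n",
                    n, compoundFalsePositiveRate(p, n), unionBound(p, n));
        }
        System.out.println("suggested max columns: " + suggestedMaxColumns(p));
    }
}
```

    With a 1% error rate the sqrt rule caps the lookup at 10 columns, where 
the compound rate is still under 10%.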



/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<http://review.hbase.org/r/296/#comment1463>

    Probably should pre-allocate the ArrayList capacity so that its backing 
array is allocated only once.
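
    A minimal sketch of the pre-sizing idea follows.  It assumes the list 
holds one bloom key per requested column, which the review comment does not 
actually state; the class and method names are hypothetical, and concat() 
stands in for Bytes.add(row, col).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Illustrative only: not the code under review.
final class BloomKeyListSketch {

    /** Concatenates row and column bytes (stand-in for Bytes.add). */
    static byte[] concat(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /** Builds one row+column key per column, sizing the list up front. */
    static List<byte[]> buildKeys(byte[] row, Collection<byte[]> columns) {
        // The final size is known, so pass it as the initial capacity and
        // avoid any growth (re-allocation and copy) of the backing array.
        List<byte[]> keys = new ArrayList<>(columns.size());
        for (byte[] col : columns) {
            keys.add(concat(row, col));
        }
        return keys;
    }
}
```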


- Nicolas





> ROWCOL bloom filter not used if multiple columns within same family are 
> requested in a Get
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2794
>                 URL: https://issues.apache.org/jira/browse/HBASE-2794
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Kannan Muthukkaruppan
>         Attachments: 2794_multi_column_check.txt
>
>
> Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
> {code}
>         switch(bloomFilterType) {
>           case ROW:
>             key = row;
>             break;
>           case ROWCOL:
>             if (columns.size() == 1) {
>               byte[] col = columns.first();
>               key = Bytes.add(row, col);
>               break;
>             }
>             //$FALL-THROUGH$
>           default:
>             return true;
>         }
> {code}
> If columns.size() > 1, then we currently don't take advantage of the bloom 
> filter.  We should optimize this to check the bloom for each of the columns 
> and, if none of the columns are present in the bloom, avoid opening the file.
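
The per-column check proposed in the issue could look roughly like the sketch 
below.  This is not the attached patch: the bloom filter is modelled as a 
plain Predicate<byte[]> rather than the HBase bloom classes, the names are 
hypothetical, and concat() stands in for Bytes.add(row, col).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.function.Predicate;

// Illustrative only: a simplified model of the ROWCOL branch of shouldSeek().
final class MultiColumnBloomSketch {

    /** Concatenates row and column bytes (stand-in for Bytes.add). */
    static byte[] concat(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /**
     * Returns false only when the bloom rules out every requested column,
     * in which case the file need not be opened.
     */
    static boolean shouldSeek(byte[] row, Collection<byte[]> columns,
                              Predicate<byte[]> bloomMightContain) {
        if (columns == null || columns.isEmpty()) {
            return true; // no explicit columns: a ROWCOL bloom cannot help
        }
        for (byte[] col : columns) {
            if (bloomMightContain.test(concat(row, col))) {
                return true; // possible hit (or a false positive): must seek
            }
        }
        return false; // every row+column key is definitely absent
    }
}
```

Returning on the first possible hit keeps the number of bloom checks low when 
an early column matches; the worst case is still one check per column.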

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
