[jira] [Commented] (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

[email protected] (Commented) (JIRA) Thu, 29 Sep 2011 14:06:10 -0700

    [ https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117615#comment-13117615 ]


[email protected] commented on HBASE-2794:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2084/
-----------------------------------------------------------

(Updated 2011-09-29 21:05:20.334849)


Review request for hbase.


Changes
-------

Addressing Jonathan's comments.


Summary
-------

Previously we used row-column Bloom filters only for scans that requested a 
single column. We have seen production queries that request up to 200 columns; 
with, say, ~6 store files per store (region / column family combination), this 
could result in 200 x 6 = 1200 block read operations in the worst case. With 
this diff we avoid seeks on store files that we know do not contain the 
row/column of interest when using an ExplicitColumnTracker. Performance 
should remain the same for column range queries.


This addresses bug HBASE-2794.
    https://issues.apache.org/jira/browse/HBASE-2794


Diffs (updated)
-----

  src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e
  src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7
  src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e
  src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb
  src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7
  src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696
  src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION

Diff: https://reviews.apache.org/r/2084/diff


Testing
-------

Existing unit tests. A new unit test (TestScanWithBloomError). Load testing 
using HBaseTest.


Thanks,

Mikhail


                
> ROWCOL bloom filter not used if multiple columns within same family are 
> requested in a Get
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2794
>                 URL: https://issues.apache.org/jira/browse/HBASE-2794
>             Project: HBase
>          Issue Type: Improvement
>          Components: performance
>            Reporter: Kannan Muthukkaruppan
>             Fix For: 0.92.0
>
>
> Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
> {code}
>         switch(bloomFilterType) {
>           case ROW:
>             key = row;
>             break;
>           case ROWCOL:
>             if (columns.size() == 1) {
>               byte[] col = columns.first();
>               key = Bytes.add(row, col);
>               break;
>             }
>             //$FALL-THROUGH$
>           default:
>             return true;
>         }
> {code}
> If columns.size() > 1, we currently don't take advantage of the Bloom 
> filter. We should optimize this to check the Bloom filter for each of the 
> requested columns and, if none of them is present, avoid opening the file.
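
The check proposed in the issue description can be illustrated with a small, self-contained sketch. This is not the code from the patch under review, and it does not use any HBase classes: ToyBloom, rowColKey and the class name RowColBloomSketch are hypothetical names introduced only for this example, and the toy filter stands in for whatever Bloom filter a store file really carries. It assumes, as in the quoted snippet, that the ROWCOL key is simply the row bytes followed by the column bytes.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

public class RowColBloomSketch {

    /** Toy Bloom filter (hypothetical); stands in for a store file's real filter. */
    static final class ToyBloom {
        private final BitSet bits;
        private final int size;
        private final int hashes;

        ToyBloom(int size, int hashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashes = hashes;
        }

        void add(byte[] key) {
            for (int i = 0; i < hashes; i++) {
                bits.set(index(key, i));
            }
        }

        /** May return a false positive, but never a false negative. */
        boolean mightContain(byte[] key) {
            for (int i = 0; i < hashes; i++) {
                if (!bits.get(index(key, i))) {
                    return false;
                }
            }
            return true;
        }

        // Double hashing: derive the i-th bit position from two base hashes.
        private int index(byte[] key, int i) {
            int h1 = Arrays.hashCode(key);
            int h2 = (h1 >>> 16) | 1;
            return Math.floorMod(h1 + i * h2, size);
        }
    }

    /** Builds the row-column key as row bytes followed by column bytes. */
    static byte[] rowColKey(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /**
     * Returns false only when the filter rules out every requested column,
     * i.e. when the file can safely be skipped for this row.
     */
    static boolean shouldSeek(ToyBloom bloom, byte[] row, List<byte[]> columns) {
        if (columns == null || columns.isEmpty()) {
            // No explicit column list: the row-column filter cannot help.
            return true;
        }
        for (byte[] col : columns) {
            if (bloom.mightContain(rowColKey(row, col))) {
                return true; // at least one column may be present
            }
        }
        return false;
    }
}
```

Because a Bloom filter can report false positives but not false negatives, a false result from shouldSeek is safe to act on, while a true result only means the file has to be consulted. The loop costs one filter probe per requested column per store file, which is much cheaper than a block read.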

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
