[
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13117517#comment-13117517
]
[email protected] commented on HBASE-2794:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2084/#review2161
-----------------------------------------------------------
src/main/java/org/apache/hadoop/hbase/KeyValue.java
<https://reviews.apache.org/r/2084/#comment5035>
When I wrote this comment, I was implying that "this" is also a method
argument. I will edit the comment to make that clearer.
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
<https://reviews.apache.org/r/2084/#comment5036>
Yes, I will modify the javadoc of this method.
- Mikhail
On 2011-09-28 16:03:52, Mikhail Bautin wrote:
bq.
bq. (Updated 2011-09-28 16:03:52)
bq.
bq.
bq. Review request for hbase.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Previously, we used row-column Bloom filters only for scans that
requested a single column. We have seen production queries that request up to
200 columns; with roughly 6 store files per store (region / column family
combination), this could result in 1200 block read operations in the worst
case. With this diff, we avoid seeks on store files that we know do not
contain the row/column of interest when using an ExplicitColumnTracker.
Performance should remain the same for column range queries.
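To make the worst-case figure concrete, here is a minimal sketch of the seek arithmetic. It is not HBase code: the class and method names are hypothetical, and an exact set stands in for each store file's Bloom filter, so false positives are ignored. The layout in `main` (each column living in exactly one store file) is an assumption chosen for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SeekCountSketch {

    /** Seeks for one row when every (store file, column) pair is probed. */
    static int seeksWithoutBloom(int storeFiles, int columns) {
        return storeFiles * columns;
    }

    /**
     * Seeks when a store file is skipped for a column unless its filter
     * reports that the column may be present. Each set holds the column
     * indexes that one hypothetical store file contains for the row.
     */
    static int seeksWithBloom(List<Set<Integer>> columnsPerFile, int columns) {
        int seeks = 0;
        for (Set<Integer> present : columnsPerFile) {
            for (int col = 0; col < columns; col++) {
                if (present.contains(col)) {
                    seeks++;
                }
            }
        }
        return seeks;
    }

    public static void main(String[] args) {
        int storeFiles = 6;
        int columns = 200;
        List<Set<Integer>> files = new ArrayList<>();
        for (int f = 0; f < storeFiles; f++) {
            Set<Integer> present = new HashSet<>();
            // Assumed layout: file f holds every 6th column starting at f.
            for (int col = f; col < columns; col += storeFiles) {
                present.add(col);
            }
            files.add(present);
        }
        System.out.println(seeksWithoutBloom(storeFiles, columns)); // 1200
        System.out.println(seeksWithBloom(files, columns));         // 200
    }
}
```

Under that assumed layout, pruning reduces the count from one seek per file per column to one seek per column. A real Bloom filter can report false positives, so the actual saving would be somewhat smaller.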
bq.
bq.
bq. This addresses bug HBASE-2794.
bq. https://issues.apache.org/jira/browse/HBASE-2794
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e
bq. src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de
bq. src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef
bq. src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c
bq. src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7
bq. src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98
bq. src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e
bq. src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION
bq.
bq. Diff: https://reviews.apache.org/r/2084/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Existing unit tests. A new unit test (TestScanWithBloomError). Load
testing using HBaseTest.
bq.
bq.
bq. Thanks,
bq.
bq. Mikhail
bq.
bq.
> ROWCOL bloom filter not used if multiple columns within same family are
> requested in a Get
> ------------------------------------------------------------------------------------------
>
> Key: HBASE-2794
> URL: https://issues.apache.org/jira/browse/HBASE-2794
> Project: HBase
> Issue Type: Improvement
> Components: performance
> Reporter: Kannan Muthukkaruppan
> Fix For: 0.92.0
>
>
> Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
> {code}
> switch (bloomFilterType) {
>   case ROW:
>     key = row;
>     break;
>   case ROWCOL:
>     if (columns.size() == 1) {
>       byte[] col = columns.first();
>       key = Bytes.add(row, col);
>       break;
>     }
>     //$FALL-THROUGH$
>   default:
>     return true;
> }
> {code}
> If columns.size() > 1, we currently don't take advantage of the Bloom
> filter. We should optimize this to check the Bloom filter for each of the
> columns and, if none of the columns are present, avoid opening the file.
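
The per-column check proposed in the description could look roughly like the sketch below. This is only an illustration under stated assumptions, not the patch attached to the review: `MembershipFilter`, `concat`, `exactFilter`, and the class name are hypothetical, `concat` merely mirrors what `Bytes.add(row, col)` does in the snippet, and the demo filter is an exact set rather than a real Bloom filter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MultiColumnBloomCheck {

    /** Hypothetical stand-in for a store file's Bloom filter; not an HBase API. */
    interface MembershipFilter {
        boolean mightContain(byte[] key);
    }

    /** Concatenates row and column bytes to form a row-column key. */
    static byte[] concat(byte[] row, byte[] col) {
        byte[] key = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, key, row.length, col.length);
        return key;
    }

    /**
     * Returns false only when the filter rules out every requested column,
     * in which case the store file does not need to be opened for this row.
     */
    static boolean shouldSeek(MembershipFilter bloom, byte[] row, List<byte[]> columns) {
        if (columns.isEmpty()) {
            return true; // no explicit columns, so nothing can be ruled out
        }
        for (byte[] col : columns) {
            if (bloom.mightContain(concat(row, col))) {
                return true; // at least one column may be present
            }
        }
        return false;
    }

    /** Exact-set filter for the demo; a real Bloom filter may report false positives. */
    static MembershipFilter exactFilter(String... keys) {
        Set<String> present = new HashSet<>(Arrays.asList(keys));
        return key -> present.contains(new String(key, StandardCharsets.UTF_8));
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        MembershipFilter bloom = exactFilter("r1c2");
        byte[] row = bytes("r1");
        System.out.println(shouldSeek(bloom, row, Arrays.asList(bytes("c1"), bytes("c2"))));
        System.out.println(shouldSeek(bloom, row, Arrays.asList(bytes("c1"), bytes("c3"))));
    }
}
```

The loop returns as soon as one column may be present, so the cost of the extra checks is bounded by the number of requested columns and is paid in full only when the file can be skipped.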