[
https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130883#comment-13130883
]
[email protected] commented on HBASE-4532:
------------------------------------------------------
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2615
-----------------------------------------------------------
Hi Liyin-- find the first pass review comments inlined. Haven't reviewed the
test changes yet. Looking fwd to this optimization landing.
src/main/java/org/apache/hadoop/hbase/KeyValue.java
<https://reviews.apache.org/r/2393/#comment5908>
remove "qualifier" from the comment, since all we are passing here is row
and family (no column name).
src/main/java/org/apache/hadoop/hbase/KeyValue.java
<https://reviews.apache.org/r/2393/#comment5909>
* remove qualifier from comment here too.
* 80 char issue
src/main/java/org/apache/hadoop/hbase/KeyValue.java
<https://reviews.apache.org/r/2393/#comment5910>
remove qualifier param.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
<https://reviews.apache.org/r/2393/#comment5913>
80 char issues.
src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
<https://reviews.apache.org/r/2393/#comment5914>
can we enhance HFilePrettyPrinter to report info about the
DeleteBloomFilter as well (provided HFile is V2).
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2393/#comment5985>
even though -> even if
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2393/#comment5987>
what's the differerence between getPath() (in line 1027) and
this.writer.getPath()? Did you mean to log the general & delete Bloom filter
instead?
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2393/#comment6015>
not clear where you are using this "-1" state
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2393/#comment6017>
or this is no -> or there is no
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2393/#comment6016>
To make sure I understand this...
for "HFileV1" case or for "HFileV2 + but without this fix", I am guessing
deleteFamilyCnt will be equal to -1, and the fact that it doesn't have a
bloomFilter will cause it to return true. That look's fine. Just not obvious.
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2393/#comment6014>
space between cnt & !=
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2393/#comment6020>
did you intend to initialize bloomTypeLog here as well?
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2393/#comment6021>
bloomTypeLog is only initialized for GeneralBloomFilter case. If that's the
intent, why not move the logging near line 1382?
src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
<https://reviews.apache.org/r/2393/#comment6027>
In case there is a deleteFamily kv, there are two sub-cases here...
a) we have ROWCOL bloom (in which case there is no DeleteFamilyBloomFilter)
and we want to use the ROWCOL bloom filter itself.
b) we have a DeleteFamilyBloomFilter.
I don't see us taking advantage of (a) like we used to earlier. Isn't this
a regression for the ROWCOL bloom case? And if so, TestBlocksRead should have
caught it, no?
src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
<https://reviews.apache.org/r/2393/#comment6023>
isSeekToEmptyColumn and useBloom should be separate flags I think.
For example, if the CF had ROWCOL bloom, and the query for looking for
"row/0-length column", then with this change, we won't use the ROWCOL bloom
filter even when it exists.
Isn't it the case that we want to avoid using only the deleteFamilyBloom
filter when isSeekToEmptyColumn is true?
- Kannan
On 2011-10-18 20:38:41, Liyin Tang wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2393/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-10-18 20:38:41)
bq.
bq.
bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin,
Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry
Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is
enabled.
bq. This jira tries to avoid top row seek for all the cases by creating a
dedicated bloom filter only for delete family
bq.
bq. The only subtle use case is when we are interested in the top row with
empty column.
bq.
bq. For example,
bq. we are interested in row1/cf1:/1/put.
bq. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family
bloom filter will say there is NO delete family.
bq. Then it will avoid the top row seek and return a fake kv, which is the
last kv for this row (createLastOnRowCol).
bq. In this way, we have already missed the real kv we are interested in.
bq.
bq. The solution for the above problem is to disable this optimization if we
are trying to GET/SCAN a row with empty column.
bq.
bq. This patch is rebased on 0.89-fb. But it should be the same for
apache-trunk as well. I will submit the patch for apache-trunk later.
bq.
bq.
bq. This addresses bug HBASE-4532.
bq. https://issues.apache.org/jira/browse/HBASE-4532
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
6cf7cce
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
1f78dd4
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
3c34f86
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
2e1d23a
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
c4b60e9
bq. src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
a1d7de5
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
e4dfc2e
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
ebb360c
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
8814812
bq. src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java
fb4f2df
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
c88b23f
bq.
src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java
48e9163
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java
0eca9b8
bq.
bq. Diff: https://reviews.apache.org/r/2393/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Running all the unit tests now
bq.
bq.
bq. Thanks,
bq.
bq. Liyin
bq.
bq.
> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---------------------------------------------------------------------------
>
> Key: HBASE-4532
> URL: https://issues.apache.org/jira/browse/HBASE-4532
> Project: HBase
> Issue Type: Improvement
> Reporter: Liyin Tang
> Assignee: Liyin Tang
> Attachments: D27.1.patch, D27.1.patch
>
>
> HBASE-4469 avoids the top row seek operation if row-col bloom filter is
> enabled.
> This jira tries to avoid top row seek for all the cases by creating a
> dedicated bloom filter only for delete family
> The only subtle use case is when we are interested in the top row with empty
> column.
> For example,
> we are interested in row1/cf1:/1/put.
> So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family
> bloom filter will say there is NO delete family.
> Then it will avoid the top row seek and return a fake kv, which is the last
> kv for this row (createLastOnRowCol).
> In this way, we have already missed the real kv we are interested in.
> The solution for the above problem is to disable this optimization if we are
> trying to GET/SCAN a row with empty column.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira