[ https://issues.apache.org/jira/browse/HBASE-4532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131813#comment-13131813 ]
[email protected] commented on HBASE-4532:
------------------------------------------------------
bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java, line 792
bq. > <https://reviews.apache.org/r/2393/diff/3/?file=51375#file51375line792>
bq. >
bq. > is there _ever_ a case someone would not want this turned on? if someone was doing a ton of delete families maybe? u might not want to pay the cost of making this bloom.
Yes, we can disable this with:
conf.setBoolean(IO_STOREFILE_DELETEFAMILY_BLOOM_ENABLED, false);
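To make the toggle concrete, here is a rough, self-contained sketch of how such a flag might be consulted before the delete family Bloom filter is created. It is not the patch code: a plain Map stands in for the Hadoop Configuration object, the class and method names are hypothetical, and both the key string and the default of true are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only: a plain Map stands in for the Hadoop Configuration object. */
final class DeleteFamilyBloomConfigSketch {

    // Assumed key string behind the IO_STOREFILE_DELETEFAMILY_BLOOM_ENABLED constant.
    static final String IO_STOREFILE_DELETEFAMILY_BLOOM_ENABLED =
        "io.storefile.delete.family.bloom.enabled";

    /** Reads the flag; treating an unset flag as enabled is an assumption. */
    static boolean isDeleteFamilyBloomEnabled(Map<String, String> conf) {
        String value = conf.get(IO_STOREFILE_DELETEFAMILY_BLOOM_ENABLED);
        return value == null || Boolean.parseBoolean(value);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Flag not set: the sketch falls back to its assumed default.
        System.out.println(isDeleteFamilyBloomEnabled(conf));

        // Equivalent in spirit to conf.setBoolean(..., false) on a real Configuration.
        conf.put(IO_STOREFILE_DELETEFAMILY_BLOOM_ENABLED, "false");
        System.out.println(isDeleteFamilyBloomEnabled(conf));
    }
}
```

A writer that skips building the filter when this check returns false would avoid the cost Jonathan mentions for workloads with many delete families.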
bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, line 98
bq. > <https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line98>
bq. >
bq. > this means the null qualifier?
Yes. I have updated the comments :)
bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java, lines 677-678
bq. > <https://reviews.apache.org/r/2393/diff/3/?file=51370#file51370line677>
bq. >
bq. > is this right to return IOE and not null like if it doesn't exist in the general bloom case?
agreed :)
bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java, lines 67-68
bq. > <https://reviews.apache.org/r/2393/diff/3/?file=51374#file51374line67>
bq. >
bq. > can you describe what an empty column means? does this mean wildcard or does this mean the null column?
Yes. I have updated the comments :)
bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. > src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java, line 73
bq. > <https://reviews.apache.org/r/2393/diff/3/?file=51378#file51378line73>
bq. >
bq. > this enables the creation or the usage?
This enables the creation.
bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. > src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java, line 26
bq. > <https://reviews.apache.org/r/2393/diff/3/?file=51379#file51379line26>
bq. >
bq. > it's hard to tell what in this actually changed. i don't see much that actually went down? and should you also do some tests where you enable/disable the delete family bloom to ensure that it's working as expected both ways?
The test does not expect any number to go down :) It shows that we can avoid the top row seek even when the CF uses a ROW or NONE Bloom filter.
Previously, this unit test only enabled the ROWCOL Bloom filter, for HBASE-4469 (Avoid top row seek by looking up row_col bloomfilter).
Now TestBlocksRead checks the number of seeks for the ROWCOL, ROW and NONE Bloom filters one by one.
No matter which Bloom filter the CF is using, we always avoid the top row seek :)
bq. On 2011-10-20 04:55:44, Jonathan Gray wrote:
bq. > src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java, line 401
bq. > <https://reviews.apache.org/r/2393/diff/3/?file=51381#file51381line401>
bq. >
bq. > you seem to be setting the conf to 0.01 and then retrieving it back?
Yes. I am trying to be consistent with the other Bloom filter unit tests, so I set the same error rate as the testBloomFilter() function.
- Liyin
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2393/#review2693
-----------------------------------------------------------
On 2011-10-20 03:46:26, Liyin Tang wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/2393/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2011-10-20 03:46:26)
bq.
bq.
bq. Review request for hbase, Dhruba Borthakur, Michael Stack, Mikhail Bautin, Pritam Damania, Prakash Khemani, Amitanand Aiyer, Kannan Muthukkaruppan, Jerry Chen, Liyin Tang, Karthik Ranganathan, and Nicolas Spiegelberg.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. HBASE-4469 avoids the top row seek operation if row-col bloom filter is enabled.
bq. This jira tries to avoid top row seek for all the cases by creating a dedicated bloom filter only for delete family
bq.
bq. The only subtle use case is when we are interested in the top row with empty column.
bq.
bq. For example,
bq. we are interested in row1/cf1:/1/put.
bq. So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family bloom filter will say there is NO delete family.
bq. Then it will avoid the top row seek and return a fake kv, which is the last kv for this row (createLastOnRowCol).
bq. In this way, we have already missed the real kv we are interested in.
bq.
bq. The solution for the above problem is to disable this optimization if we are trying to GET/SCAN a row with empty column.
bq.
bq. This patch is rebased on 0.89-fb. But it should be the same for apache-trunk as well. I will submit the patch for apache-trunk later.
bq.
bq.
bq. This addresses bug HBASE-4532.
bq. https://issues.apache.org/jira/browse/HBASE-4532
bq.
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/KeyValue.java 93538bb
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java 9a79a74
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 5d9b518
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 6cf7cce
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1f78dd4
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 3c34f86
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 2e1d23a
bq. src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java c4b60e9
bq. src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 92070b3
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java e4dfc2e
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java ebb360c
bq. src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java 8814812
bq. src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java fb4f2df
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java b8bcc65
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 48e9163
bq. src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 0eca9b8
bq.
bq. Diff: https://reviews.apache.org/r/2393/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq. Passed all the unit tests
bq.
bq.
bq. Thanks,
bq.
bq. Liyin
bq.
bq.
> Avoid top row seek by dedicated bloom filter for delete family bloom filter
> ---------------------------------------------------------------------------
>
> Key: HBASE-4532
> URL: https://issues.apache.org/jira/browse/HBASE-4532
> Project: HBase
> Issue Type: Improvement
> Reporter: Liyin Tang
> Assignee: Liyin Tang
> Attachments: D27.1.patch, D27.1.patch
>
>
> HBASE-4469 avoids the top row seek operation if row-col bloom filter is
> enabled.
> This jira tries to avoid top row seek for all the cases by creating a
> dedicated bloom filter only for delete family
> The only subtle use case is when we are interested in the top row with empty
> column.
> For example,
> we are interested in row1/cf1:/1/put.
> So we seek to the top row: row1/cf1:/MAX_TS/MAXIMUM. And the delete family
> bloom filter will say there is NO delete family.
> Then it will avoid the top row seek and return a fake kv, which is the last
> kv for this row (createLastOnRowCol).
> In this way, we have already missed the real kv we are interested in.
> The solution for the above problem is to disable this optimization if we are
> trying to GET/SCAN a row with empty column.
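
The guard described in the issue can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code in the patch: the class and method names are hypothetical, and an exact HashSet stands in for the delete family Bloom filter (a real Bloom filter can also answer "maybe present" for rows that were never added, which only makes the check more conservative).

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch only; none of these names are HBase classes. */
final class DeleteFamilySeekSketch {

    /** Stand-in for the delete family Bloom filter, keyed by row. */
    static final class DeleteFamilyFilter {
        private final Set<String> rowsWithDeleteFamily = new HashSet<>();

        void add(String row) {
            rowsWithDeleteFamily.add(row);
        }

        boolean mightContain(String row) {
            return rowsWithDeleteFamily.contains(row);
        }
    }

    /**
     * Returns true when the seek to the top of the row can be skipped.
     * The optimization is disabled for an empty column, because the
     * requested KeyValue would be missed if the top row seek were skipped.
     */
    static boolean canSkipTopRowSeek(DeleteFamilyFilter filter, String row, String qualifier) {
        if (qualifier.isEmpty()) {
            return false; // GET/SCAN with empty column: always do the real seek
        }
        // Skip only when the filter reports no delete family marker for this row.
        return !filter.mightContain(row);
    }

    public static void main(String[] args) {
        DeleteFamilyFilter filter = new DeleteFamilyFilter();
        filter.add("row2"); // pretend row2 carries a delete family marker

        System.out.println(canSkipTopRowSeek(filter, "row1", "q1")); // true
        System.out.println(canSkipTopRowSeek(filter, "row1", ""));   // false
        System.out.println(canSkipTopRowSeek(filter, "row2", "q1")); // false
    }
}
```

The second call corresponds to the row1/cf1:/1/put example: the filter reports no delete family for row1, but the column is empty, so the seek is not skipped.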