[https://issues.apache.org/jira/browse/HBASE-4433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115353#comment-13115353]
Hudson commented on HBASE-4433:
-------------------------------
Integrated in HBase-TRUNK #2261 (See
[https://builds.apache.org/job/HBase-TRUNK/2261/])
HBASE-4433 avoid extra next (potentially a seek) if done with column/row
(kannan via jgray)
jgray :
Files :
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestQueryMatcher.java

> avoid extra next (potentially a seek) if done with column/row
> -------------------------------------------------------------
>
> Key: HBASE-4433
> URL: https://issues.apache.org/jira/browse/HBASE-4433
> Project: HBase
> Issue Type: Improvement
> Reporter: Kannan Muthukkaruppan
> Assignee: Kannan Muthukkaruppan
> Fix For: 0.94.0
>
>
> [Noticed this in 89, but quite likely true of trunk as well.]
> When we are done with the requested column(s), the code still does an extra
> next() call before it realizes that it is actually done. This extra next()
> call could potentially result in an unnecessary extra block load. This is
> likely to be especially bad for CFs whose KVs are large blobs, where each KV
> may occupy a block of its own, so the extra next() can often load a new,
> unrelated block unnecessarily.
> --
> For the simple case of reading, say, the top-most column in a row in a
> single file, where each column (KV) is, say, a block of its own, it seems
> that we are reading 3 blocks instead of 1 block!
> I am working on a simple patch, and with it the number of seeks is down to
> 2.
> [There is still an extra seek left. I think there were two levels of
> extra/unnecessary next() calls we were making without actually confirming
> that the next was needed. One is at the StoreScanner/ScanQueryMatcher level,
> which this diff avoids. I think the other is at hfs.next() (at the store
> file scanner level), which happens whenever an HFile scanner serves out
> data, and perhaps that's the additional seek that we need to avoid. But I
> want to tackle this optimization first, as the two issues seem unrelated.]
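
To make the block counts above concrete, here is a small Java model built on loudly stated assumptions: every KV sits in its own block, and the file-level scanner eagerly advances to the following KV (loading that KV's block) each time it hands out the current one. That look-ahead is how the reporter's second, hfs.next()-level extra read is interpreted here. `LookaheadFileScanner`, `BlockReadModel` and `readTopColumn` are hypothetical names, not HBase classes; this is a sketch of the arithmetic, not HBase code.

```java
import java.util.List;

class BlockReadModel {
    // Reads only the first (top-most) column of the row and reports blocks loaded.
    // combinedHint=false: matcher returns INCLUDE, so one more next() is needed to learn the row is done.
    // combinedHint=true: matcher returns INCLUDE_AND_SEEK_NEXT_ROW, so the scan stops right away.
    static int readTopColumn(List<String> row, boolean combinedHint) {
        LookaheadFileScanner s = new LookaheadFileScanner(row);
        s.seek();
        s.next(); // hands out the requested column
        if (!combinedHint) {
            s.next(); // extra next() issued only to discover that we are done
        }
        return s.blocksLoaded;
    }

    public static void main(String[] args) {
        List<String> row = List.of("col1", "col2", "col3", "col4");
        System.out.println("before patch: " + readTopColumn(row, false) + " blocks");
        System.out.println("after patch:  " + readTopColumn(row, true) + " blocks");
    }
}

// Hypothetical look-ahead scanner over one row; assumes one KV per block.
class LookaheadFileScanner {
    private final List<String> kvs; // KVs of one row, in sorted order
    private int pos = -1;           // index of the current (look-ahead) KV
    int blocksLoaded = 0;

    LookaheadFileScanner(List<String> kvs) {
        this.kvs = kvs;
    }

    // Positions on the first KV, which loads its block.
    void seek() {
        pos = 0;
        blocksLoaded++;
    }

    // Returns the current KV, then eagerly advances, loading the next KV's block.
    String next() {
        String ret = kvs.get(pos);
        pos++;
        if (pos < kvs.size()) {
            blocksLoaded++; // one KV per block: every advance touches a new block
        }
        return ret;
    }
}
```

Under these assumptions the old flow loads 3 blocks and the combined-hint flow loads 2. The remaining gap to the ideal of 1 block comes entirely from the scanner's look-ahead, which fits the reporter's note that the leftover seek lives at the store file scanner level.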
> --
> The basic idea of the patch I am working on/testing is as follows. The
> ExplicitColumnTracker currently returns "INCLUDE" to the ScanQueryMatcher if
> the KV needs to be included and then, if done, returns the appropriate
> SEEK_NEXT_COL or SEEK_NEXT_ROW hint only on the next call. For cases where
> ExplicitColumnTracker knows it is done with a particular column/row, the
> patch attempts to combine the INCLUDE code and the done hint into a single
> match code: INCLUDE_AND_SEEK_NEXT_COL or INCLUDE_AND_SEEK_NEXT_ROW.
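
The proposed change can be sketched roughly as follows. This is a hypothetical, heavily simplified Java model, not the actual ExplicitColumnTracker or ScanQueryMatcher code. It assumes a single row, one version per column, and requested qualifiers sorted ascending. The names `SimpleColumnTracker`, `checkColumn`, `scanRow` and the cut-down `MatchCode` enum are illustrative only. The `combineHints` flag switches between the old behavior (plain INCLUDE, with the done hint deferred to the next call) and the proposed combined codes.

```java
import java.util.ArrayList;
import java.util.List;

class TrackerDemo {
    // Feeds a row's qualifiers to the tracker until it signals the row is finished.
    // Returns one code per KV the scanner had to examine.
    static List<MatchCode> scanRow(List<String> row, List<String> wanted, boolean combineHints) {
        SimpleColumnTracker t = new SimpleColumnTracker(wanted, combineHints);
        List<MatchCode> codes = new ArrayList<>();
        for (String q : row) {
            MatchCode c = t.checkColumn(q);
            codes.add(c);
            if (c == MatchCode.SEEK_NEXT_ROW || c == MatchCode.INCLUDE_AND_SEEK_NEXT_ROW) {
                break; // tracker says this row is done
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        List<String> row = List.of("a", "b", "c", "d");
        List<String> wanted = List.of("a", "c");
        System.out.println("old: " + scanRow(row, wanted, false));
        System.out.println("new: " + scanRow(row, wanted, true));
    }
}

// Hypothetical, cut-down match codes; the two combined codes are the ones proposed in the issue.
enum MatchCode {
    INCLUDE,
    SEEK_NEXT_COL,
    SEEK_NEXT_ROW,
    INCLUDE_AND_SEEK_NEXT_COL,
    INCLUDE_AND_SEEK_NEXT_ROW
}

// Hypothetical stand-in for an explicit column tracker (single row, one version per column).
class SimpleColumnTracker {
    private final List<String> wanted;   // requested qualifiers, sorted ascending
    private final boolean combineHints;  // false = old behavior, true = proposed behavior
    private int index = 0;               // next requested column still to be found

    SimpleColumnTracker(List<String> wanted, boolean combineHints) {
        this.wanted = wanted;
        this.combineHints = combineHints;
    }

    MatchCode checkColumn(String qualifier) {
        // Requested columns sorting before this qualifier are absent from the row.
        while (index < wanted.size() && wanted.get(index).compareTo(qualifier) < 0) {
            index++;
        }
        if (index == wanted.size()) {
            return MatchCode.SEEK_NEXT_ROW; // nothing left to find in this row
        }
        if (!wanted.get(index).equals(qualifier)) {
            return MatchCode.SEEK_NEXT_COL; // not a requested column
        }
        index++;
        if (!combineHints) {
            return MatchCode.INCLUDE; // old: "done" is only reported on the next call
        }
        return index == wanted.size()
                ? MatchCode.INCLUDE_AND_SEEK_NEXT_ROW
                : MatchCode.INCLUDE_AND_SEEK_NEXT_COL;
    }
}
```

For a row with qualifiers a, b, c, d and requested columns a and c, the old mode produces four codes, the last being SEEK_NEXT_ROW on d. The combined mode stops after three, ending with INCLUDE_AND_SEEK_NEXT_ROW on c. That one fewer KV examined corresponds to the extra next() the issue describes.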