[ https://issues.apache.org/jira/browse/HBASE-6577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477379#comment-13477379 ]
Lars Hofhansl commented on HBASE-6577:
--------------------------------------
This just came up on the mailing list again:
{code}
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$EncodedScannerV2.loadBlockAndSeekToKey(HFileReaderV2.java:1027)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:461)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:493)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:242)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:167)
at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:54)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:521)
- locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:402)
- locked <0x000000059584fab8> (a org.apache.hadoop.hbase.regionserver.StoreScanner)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRow(HRegion.java:3507)
at ...
{code}
zahoor mentioned there that his KVs have very many versions (1500+).
Presumably, because of the many versions, each new column starts on a new
(HBase) block, which is why we see a lot of seeking.
I wonder whether a solution like the following would work:
In RegionScannerImpl.nextRow(...) we try the current "naive" iteration for N
KVs (let's say 100). If by then we have not reached the next row, we'll issue a
direct seek.
That way if there are few versions we avoid unnecessary seeks, but with many
versions we can seek past a lot of KVs (and thus also avoid unnecessary seeks).
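To make the idea concrete, here is a rough, self-contained sketch of the "iterate for N KVs, then seek" logic. It is not the patch and does not use any HBase classes: the class NextRowSketch, the MAX_NAIVE_STEPS threshold, the counters, and the list of row keys standing in for the sorted KVs are all made up for illustration, and the binary search merely stands in for a real seek.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scanner over sorted KVs; only the row key of
// each KV is modeled. None of these names exist in HBase.
final class NextRowSketch {
    // Assumed threshold N: KVs to step over before falling back to a seek.
    static final int MAX_NAIVE_STEPS = 100;

    int steps = 0; // KVs skipped one at a time
    int seeks = 0; // direct seeks issued

    private final List<String> rows; // row key of each KV, in sorted order
    private int pos = 0;             // index of the current KV

    NextRowSketch(List<String> sortedRowKeys) {
        this.rows = sortedRowKeys;
    }

    // Moves to the first KV of the next row and returns its index
    // (rows.size() if there is no further row).
    int nextRow() {
        if (pos >= rows.size()) {
            return pos;
        }
        String current = rows.get(pos);

        // Phase 1: naive iteration, bounded by MAX_NAIVE_STEPS.
        for (int i = 0; i < MAX_NAIVE_STEPS && pos < rows.size(); i++) {
            if (!rows.get(pos).equals(current)) {
                return pos; // reached the next row cheaply, no seek needed
            }
            pos++;
            steps++;
        }
        if (pos >= rows.size() || !rows.get(pos).equals(current)) {
            return pos;
        }

        // Phase 2: still in the same row, so seek directly past it.
        // A binary search for the first key greater than the current row
        // plays the role of the seek here.
        seeks++;
        int lo = pos;
        int hi = rows.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (rows.get(mid).compareTo(current) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        pos = lo;
        return pos;
    }
}
```

With a row of 3 KVs the sketch never seeks; with a row of 1500 KVs it steps over 100 of them and then issues a single seek past the remainder.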
I can make a patch for that.
[~jdcryans] Would you be able to recreate the issue you saw with the initial
version of this patch in production?
> RegionScannerImpl.nextRow() should seek to next row
> ---------------------------------------------------
>
> Key: HBASE-6577
> URL: https://issues.apache.org/jira/browse/HBASE-6577
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 6577-0.94.txt, 6577.txt, 6577-v2.txt
>
>
> RegionScannerImpl.nextRow() is called when a filter filters the entire row.
> In that case we should seek to the next row rather than iterating over all
> versions of all columns to get there.