[
https://issues.apache.org/jira/browse/HBASE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666329#action_12666329
]
stack commented on HBASE-1118:
------------------------------
Looking at this a little, the setup of the Scanner is taking up a good portion
of the time spent returning values. The profiler shows setup taking 30-40% of
the time when fetching 100 (small cell) rows. To verify the profiler findings,
I resorted to System.out, and that seemed to show similar figures (though maybe
it's more than 30-40%, since my System.out was measuring serverside while time
was taken on the client side after rows had been fetched and emitted on the
console).
Every time we open a scanner, it opens a Reader per covered HStoreFile.
Opening a Reader currently means opening the data file and its index, plus
reading the index into memory. The latter seemed to be taking the bulk of
the open time in the profiler.
There are a few things we can do here but probably not till tfile time.
1. We already have an open Reader for every HStoreFile. Scanners should be
able to use the already-opened Reader's index rather than each reading in its
own. This will save on startup time and on heap (indexes are private in the
current MapFile).
2. A smarter blockcache would let Scanners use already loaded blocks. Chatting
with jgray, since we can give tfile a Stream, the Stream we hand it can be
smartened up so it goes to a blockcache first and if no block, only then to
hdfs.
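
To make the second idea concrete, here is a minimal sketch of a read path that
consults a block cache first and only falls through to the backing store on a
miss. Everything in it is hypothetical and for illustration only: the class
CacheFirstBlockSource, the BlockLoader interface (a stand-in for a read from
hdfs), and the "file@offset" cache key are assumptions, not tfile or HBase
APIs, and the simple LRU eviction is just one possible policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: check an in-memory block cache before the backing store. */
public class CacheFirstBlockSource {

    /** Hypothetical stand-in for reading one block from hdfs. */
    public interface BlockLoader {
        byte[] load(String file, long offset);
    }

    private final BlockLoader backing;
    private final Map<String, byte[]> cache;
    private int backingReads = 0;

    public CacheFirstBlockSource(BlockLoader backing, final int maxBlocks) {
        this.backing = backing;
        // An access-ordered LinkedHashMap gives a simple LRU: the least
        // recently used block is evicted once maxBlocks is exceeded.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxBlocks;
            }
        };
    }

    /** Returns the block from the cache if present, otherwise loads and caches it. */
    public synchronized byte[] readBlock(String file, long offset) {
        String key = file + "@" + offset; // assumed key scheme
        byte[] block = cache.get(key);
        if (block == null) {
            block = backing.load(file, offset); // only now go to the backing store
            backingReads++;
            cache.put(key, block);
        }
        return block;
    }

    /** Number of reads that missed the cache and hit the backing store. */
    public synchronized int backingReads() {
        return backingReads;
    }

    public static void main(String[] args) {
        // Fake loader that fabricates a one-byte block instead of touching a filesystem.
        CacheFirstBlockSource src = new CacheFirstBlockSource(
            (file, offset) -> new byte[] {(byte) offset}, 2);
        src.readBlock("storefile-a", 0);
        src.readBlock("storefile-a", 0); // second read is served from the cache
        System.out.println(src.backingReads()); // prints 1
    }
}
```

The same read-through pattern, keyed per HStoreFile rather than per block,
would be one way to let Scanners share an already-loaded index as in item 1,
though that too is only an assumption about how it might be wired up.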
> Scanner setup takes too long
> ----------------------------
>
> Key: HBASE-1118
> URL: https://issues.apache.org/jira/browse/HBASE-1118
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
>
> posix4 and dj_ryan report that scanner setup takes too long. The use case is
> fetching 100 - 1000 rows at a time.