[ 
https://issues.apache.org/jira/browse/HBASE-1118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666329#action_12666329
 ] 

stack commented on HBASE-1118:
------------------------------

Looking at this a little, the setup of the Scanner is taking up a good portion 
of the time spent returning values.  The profiler shows setup taking 30-40% of 
the time fetching 100 (small cell) rows.  To verify the profiler findings, I 
resorted to System.out timing and that showed similar figures (though maybe 
it's more than 30-40%, since my System.out was measuring server side while the 
total was taken on the client side, after rows had been fetched and emitted on 
the console).
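The System.out check above can be sketched as a simple wall-clock measurement around the server-side scanner open.  A minimal sketch, assuming a placeholder openScanner action (not an actual HBase method):

```java
// Minimal timing sketch: wrap the scanner-open step in nanoTime calls,
// the way the System.out verification above does. The Runnable stands in
// for the real server-side scanner setup.
class ScannerTiming {
    // Returns the elapsed nanoseconds spent running the setup action.
    static long timeSetupNanos(Runnable openScanner) {
        long start = System.nanoTime();
        openScanner.run();
        return System.nanoTime() - start;
    }
}
```

Comparing this server-side figure against the client-observed total is what suggests setup dominates for small-cell fetches.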

Every time we open a scanner, it opens a Reader per covered HStoreFile.  
Opening a Reader currently means opening the data file and its index, plus 
reading the index into memory.  The profiler showed this last step taking the 
bulk of the open time.
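The fix suggested in item 1 below amounts to caching one loaded index per store file and handing scanners a reference instead of re-reading it.  A minimal sketch, with illustrative names (not the actual MapFile/HBase API), where long[] stands in for the in-memory index:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical shared index cache: one in-memory index per store file path,
// loaded on first access and reused by every subsequent Scanner.
class SharedIndexCache {
    // Maps a store file path to its already-loaded index (long[] is a stand-in).
    private final Map<String, long[]> indexes = new ConcurrentHashMap<>();

    // Return the cached index, loading it only on first access.
    long[] getIndex(String path) {
        return indexes.computeIfAbsent(path, SharedIndexCache::loadIndex);
    }

    // Stand-in for reading the MapFile index from HDFS.
    private static long[] loadIndex(String path) {
        return new long[] { 0L, 1024L, 2048L };
    }
}
```

With this shape, opening a scanner over N store files costs N map lookups rather than N index reads, and only one copy of each index lives on the heap.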

There are a few things we can do here but probably not till tfile time.

1. We already have an open Reader for every HStoreFile.  Scanners should be 
able to access the already-opened Readers' indices rather than read in their 
own.  This will save on startup time and on heap (indexes are private in the 
current MapFile).
2. A smarter blockcache would let Scanners use already-loaded blocks.  
Chatting with jgray: since we can hand tfile a Stream, the Stream we hand it 
can be smartened up so that it goes to a blockcache first and only on a miss 
goes to hdfs.
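Item 2 above can be sketched as a read path that consults a block cache before touching HDFS.  This is an illustrative sketch only; BlockCache wiring and the read method names are assumptions, not the tfile API:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical blockcache-first reader: the Stream handed to tfile would
// delegate block reads here, so repeated scanner opens over hot files hit
// memory instead of HDFS.
class CachingBlockReader {
    // Cache keyed by block offset within the file (illustrative scheme).
    private final Map<Long, byte[]> blockCache = new HashMap<>();

    // Read the block at the given offset, preferring the cached copy.
    byte[] readBlock(long offset) {
        byte[] block = blockCache.get(offset);
        if (block == null) {
            block = readFromHdfs(offset);   // cache miss: go to HDFS
            blockCache.put(offset, block);  // remember it for later scanners
        }
        return block;
    }

    // Stand-in for the real HDFS stream read.
    private byte[] readFromHdfs(long offset) {
        return new byte[64 * 1024];
    }
}
```

The point is that the smartening lives entirely inside the Stream handed to tfile, so tfile itself needs no knowledge of the cache.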





> Scanner setup takes too long
> ----------------------------
>
>                 Key: HBASE-1118
>                 URL: https://issues.apache.org/jira/browse/HBASE-1118
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: stack
>
> posix4 and dj_ryan report that scanner setup takes too long.  The use case 
> is a fetch of 100 - 1000 rows at a time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
