[ 
https://issues.apache.org/jira/browse/HBASE-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-2180:
-------------------------

    Attachment: 2180.patch

This patch makes gets use preads when fetching blocks and keeps the old 
seek+read for scans.
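
To illustrate the two read styles, here is a minimal sketch using only JDK 
classes. It is not the patch itself: PreadSketch, seekRead and pread are 
made-up names, and FileChannel's positional read stands in for the pread call 
on the HDFS input stream.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreadSketch {
    // Scan-style read: seek, then read, on a shared stream. The lock keeps
    // another thread from moving the file pointer between the two calls,
    // which also means only one read on this stream runs at a time.
    public static int seekRead(RandomAccessFile in, long pos, byte[] b, int off, int n)
            throws IOException {
        synchronized (in) {
            in.seek(pos);
            return in.read(b, off, n);
        }
    }

    // Get-style read: a positional read passes the offset with the call and
    // leaves the channel's own position untouched, so no lock is taken here.
    public static int pread(FileChannel ch, long pos, byte[] b, int off, int n)
            throws IOException {
        return ch.read(ByteBuffer.wrap(b, off, n), pos);
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("pread-sketch", ".bin");
        try {
            Files.write(p, "0123456789".getBytes(StandardCharsets.US_ASCII));
            byte[] viaSeek = new byte[4];
            byte[] viaPread = new byte[4];
            try (RandomAccessFile raf = new RandomAccessFile(p.toFile(), "r");
                 FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                seekRead(raf, 3, viaSeek, 0, 4);
                pread(ch, 3, viaPread, 0, 4);
            }
            System.out.println(new String(viaSeek, StandardCharsets.US_ASCII));
            System.out.println(new String(viaPread, StandardCharsets.US_ASCII));
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

Both calls return the same bytes; the difference is only in whether shared 
stream state (the file pointer) is involved.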

The patch removes the old HFile.Reader.getScanner methods and replaces both 
with a single getScanner that takes two arguments: whether to cache the blocks 
read, and whether to use pread when pulling in a block.  I got rid of the old 
getScanners to force every caller to be explicit about what it wants regarding 
caching and pread.
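
As a rough sketch of what call sites might look like with the two-argument 
form: the Reader and Scanner types below are hypothetical stand-ins, not the 
HBase classes, and the parameter names and the caching choices are 
assumptions. Only the two-flag shape comes from the description above.

```java
public class GetScannerSketch {
    /** Hypothetical stand-in for the scanner a reader hands back. */
    public static final class Scanner {
        public final boolean cacheBlocks;
        public final boolean pread;

        Scanner(boolean cacheBlocks, boolean pread) {
            this.cacheBlocks = cacheBlocks;
            this.pread = pread;
        }
    }

    /** Hypothetical stand-in for HFile.Reader with the single two-flag method. */
    public interface Reader {
        Scanner getScanner(boolean cacheBlocks, boolean pread);
    }

    // Gets ask for pread. Caching is set to true here purely as an assumption.
    public static Scanner scannerForGet(Reader r) {
        return r.getScanner(true, true);
    }

    // Scans keep the old seek+read path, so pread is false.
    public static Scanner scannerForScan(Reader r) {
        return r.getScanner(true, false);
    }

    public static void main(String[] args) {
        Reader reader = Scanner::new; // trivial reader that just records the flags
        System.out.println("get uses pread:  " + scannerForGet(reader).pread);
        System.out.println("scan uses pread: " + scannerForScan(reader).pread);
    }
}
```

With no zero-argument overload left, a caller cannot pick up a default by 
accident; both choices are visible at every call site.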

This patch does not include tests.  It's hard to test for this kind of 
performance change.

A further improvement would be to recognize short scans, i.e. scans that cover 
less than one hfile block.  In that case we'd want to pread rather than 
seek+read (especially once a one-row scan replaces get).
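
One way the short-scan check could look, as a sketch only. Nothing like this 
is in the patch; the class, the method name, and the idea that the caller 
has an estimate of the scan's size in bytes are all assumptions.

```java
public class ShortScanSketch {
    // Hypothetical heuristic: if the scan is expected to stay within a single
    // block, treat it like a get and use pread; otherwise use seek+read.
    public static boolean shouldUsePread(long expectedScanBytes, int blockSizeBytes) {
        return expectedScanBytes < blockSizeBytes;
    }

    public static void main(String[] args) {
        int blockSize = 64 * 1024; // example value only
        System.out.println(shouldUsePread(2_000, blockSize));     // short scan
        System.out.println(shouldUsePread(5_000_000, blockSize)); // long scan
    }
}
```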



> read performance from synchronizing hfile.fddatainputstream
> -----------------------------------------------------------
>
>                 Key: HBASE-2180
>                 URL: https://issues.apache.org/jira/browse/HBASE-2180
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>            Assignee: ryan rawson
>             Fix For: 0.21.0
>
>         Attachments: 2180.patch
>
>
> deep in the HFile read path, there is this code:
>     synchronized (in) {
>       in.seek(pos);
>       ret = in.read(b, off, n);
>     }
> this makes it so that only 1 read per file per thread is active. this 
> prevents the OS and hardware from being able to do IO scheduling by 
> optimizing lots of concurrent reads. 
> We need to either use a reentrant API (pread may be partially reentrant 
> according to Todd) or use multiple stream objects, 1 per scanner/thread.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
