[ https://issues.apache.org/jira/browse/HBASE-2180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839250#action_12839250 ]
stack commented on HBASE-2180:
------------------------------

bq. I wonder why so many connections are being opened so quickly that the server runs out of ports within a few minutes of starting the gets/puts?

Gets use hdfs pread. pread opens a socket per access. My guess is that the high rate of gets soon overwhelms the time each socket takes to clean up after close. What kinda rates are we talking here, Erik?

> Bad random read performance from synchronizing hfile.fddatainputstream
> ----------------------------------------------------------------------
>
>                 Key: HBASE-2180
>                 URL: https://issues.apache.org/jira/browse/HBASE-2180
>             Project: Hadoop HBase
>          Issue Type: Bug
>            Reporter: ryan rawson
>            Assignee: stack
>             Fix For: 0.20.4, 0.21.0
>
>         Attachments: 2180-v2.patch, 2180.patch
>
>
> Deep in the HFile read path there is this code:
>   synchronized (in) {
>     in.seek(pos);
>     ret = in.read(b, off, n);
>   }
> This makes it so that only 1 read per file per thread is active, which prevents the OS and hardware from doing IO scheduling by optimizing lots of concurrent reads.
> We need to either use a reentrant API (pread may be partially reentrant, according to Todd) or use multiple stream objects, 1 per scanner/thread.
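As a rough sketch of the positioned-read ("pread") alternative the description points at (a minimal illustration under assumed names, not the attached 2180 patches), the contrast is between the current seek+read, which must be serialized because both calls move the stream's shared file pointer, and FSDataInputStream's positioned read, which takes the offset as an argument and leaves the stream's pointer alone, so concurrent readers need no lock on the stream:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;

// Illustrative sketch only; method names readSeek/readPositioned are
// made up for comparison and do not come from the HBase patch.
public class PreadSketch {

  // Current pattern: seek + read mutate the stream's file position,
  // so all callers must synchronize on the single stream object.
  static int readSeek(FSDataInputStream in, long pos,
                      byte[] b, int off, int n) throws IOException {
    synchronized (in) {
      in.seek(pos);
      return in.read(b, off, n);
    }
  }

  // Positioned read (pread): the offset is passed explicitly and the
  // stream's own position is untouched, so concurrent gets can issue
  // reads against the same file without serializing on `in`.
  static int readPositioned(FSDataInputStream in, long pos,
                            byte[] b, int off, int n) throws IOException {
    return in.read(pos, b, off, n);
  }
}
{code}

Note that stack's comment above still applies to the positioned path: since pread opens a socket per access, a high enough get rate can accumulate closed sockets faster than the OS reclaims them, which is the port-exhaustion question being discussed.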