[ https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829368#action_12829368 ]

ryan rawson commented on HDFS-918:
----------------------------------

The problem was that we previously used the stateful read interface because it 
was faster in scan tests, so reads to any given HFile were serialized within a 
single RS (region server).  With multiple client handler threads asking for 
different parts of a large file, the reads are serialized, which hurts random 
get performance.

So we are moving back to pread, which means we will get more parallelism, 
depending on your table read pattern of course.  But I want to get even more 
parallelism, for example by preading multiple HFiles during a scan/get.  This 
will only add to the thread pressure on the datanode.
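
To make the stateful-versus-pread distinction concrete, here is a minimal 
sketch.  It uses the JDK's FileChannel as a stand-in and is not the HDFS client 
or HBase code; the names PreadSketch, statefulRead and pread are hypothetical.  
The stateful path has to lock around seek + read because the file position is 
shared, while the positional path passes the offset with each call and needs 
no lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical illustration only: a local file stands in for an HFile.
public class PreadSketch {

    // Stateful read: seek, then read. The channel position is shared state,
    // so concurrent callers must take turns holding the lock.
    public static byte[] statefulRead(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        synchronized (ch) {
            ch.position(offset);
            while (buf.hasRemaining() && ch.read(buf) != -1) {
                // keep reading until the buffer is full or EOF is reached
            }
        }
        return buf.array();
    }

    // Positional read (pread): the offset travels with each call and the
    // channel position is never modified, so no lock is needed here.
    public static byte[] pread(FileChannel ch, long offset, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        long pos = offset;
        while (buf.hasRemaining()) {
            int n = ch.read(buf, pos);
            if (n == -1) {
                break; // EOF
            }
            pos += n;
        }
        return buf.array();
    }

    public static void main(String[] args) throws Exception {
        Path p = Files.createTempFile("hfile-sketch", ".bin");
        try {
            byte[] data = new byte[1024];
            for (int i = 0; i < data.length; i++) {
                data[i] = (byte) i;
            }
            Files.write(p, data);
            try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
                byte[] a = statefulRead(ch, 100, 4);
                byte[] b = pread(ch, 100, 4);
                System.out.println("same bytes: " + Arrays.equals(a, b));
            }
        } finally {
            Files.delete(p);
        }
    }
}
```

With several handler threads sharing one channel, calls to statefulRead queue 
up on the lock, whereas calls to pread can be issued concurrently.  How much 
they overlap in practice depends on the platform and, in the HDFS case, on 
the datanode serving each request.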

> Use single Selector and small thread pool to replace many instances of 
> BlockSender for reads
> --------------------------------------------------------------------------------------------
>
>                 Key: HDFS-918
>                 URL: https://issues.apache.org/jira/browse/HDFS-918
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jay Booth
>             Fix For: 0.22.0
>
>         Attachments: hdfs-918-20100201.patch, hdfs-918-20100203.patch, 
> hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXceiver server allocates a new thread 
> per request, which must allocate its own buffers and leads to 
> higher-than-optimal CPU and memory usage by the sending threads.  If we had a 
> single selector and a small threadpool to multiplex request packets, we could 
> theoretically achieve higher performance while taking up fewer resources and 
> leaving more CPU on datanodes available for mapred, hbase or whatever.  This 
> can be done without changing any wire protocols.
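
As a rough illustration of the single-selector idea described in the issue, 
the sketch below uses plain java.nio and is not taken from the attached 
patches.  The class name MultiplexedSender, the pool size of 4 and the 
fixed in-memory payload (standing in for block data) are all assumptions.  
One selector thread watches every connection, and a small fixed pool sends 
one chunk at a time for whichever sockets are writable.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

// Hypothetical sketch: one selector plus a small pool instead of a thread per read.
public class MultiplexedSender implements Runnable, AutoCloseable {
    private final Selector selector = Selector.open();
    private final ServerSocketChannel server = ServerSocketChannel.open();
    private final ExecutorService pool = Executors.newFixedThreadPool(4); // size is arbitrary
    private final byte[] payload; // stand-in for block data

    public MultiplexedSender(byte[] payload) throws IOException {
        this.payload = payload;
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
    }

    public int port() {
        return server.socket().getLocalPort();
    }

    @Override
    public void run() {
        try {
            while (selector.isOpen()) {
                selector.select();
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    if (!key.isValid()) {
                        continue;
                    }
                    if (key.isAcceptable()) {
                        SocketChannel client = server.accept();
                        if (client == null) {
                            continue;
                        }
                        client.configureBlocking(false);
                        // Each connection gets its own cursor over the shared bytes.
                        client.register(selector, SelectionKey.OP_WRITE, ByteBuffer.wrap(payload));
                    } else if (key.isWritable()) {
                        key.interestOps(0); // park the key while a worker owns it
                        pool.execute(() -> sendChunk(key));
                    }
                }
            }
        } catch (IOException | IllegalStateException | RejectedExecutionException e) {
            // selector or pool was shut down: exit the loop
        }
    }

    // Runs on a pool thread: write what the socket accepts, then hand the key back.
    private void sendChunk(SelectionKey key) {
        SocketChannel ch = (SocketChannel) key.channel();
        ByteBuffer buf = (ByteBuffer) key.attachment();
        try {
            ch.write(buf); // non-blocking, may be a partial write
            if (buf.hasRemaining()) {
                key.interestOps(SelectionKey.OP_WRITE);
                selector.wakeup(); // make the selector notice the change
            } else {
                ch.close(); // all bytes sent; closing cancels the key
            }
        } catch (IOException | CancelledKeyException e) {
            try {
                ch.close();
            } catch (IOException ignored) {
                // nothing more to do for this connection
            }
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        server.close();
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        try (MultiplexedSender s = new MultiplexedSender(new byte[64 * 1024])) {
            new Thread(s).start();
            try (Socket c = new Socket("127.0.0.1", s.port())) {
                System.out.println(c.getInputStream().readAllBytes().length + " bytes received");
            }
        }
    }
}
```

The thread count stays fixed no matter how many readers connect, which is the 
resource saving the issue is after.  A real datanode would also have to parse 
the read request and stream from disk, neither of which is modelled here.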

