[
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976304#action_12976304
]
Edward Capriolo commented on HDFS-918:
--------------------------------------
Fundamentally, using selectors is more efficient than the one-thread-per-request
model. It is not only HBase's random reads that run into DataXceiver
issues. For example, Hive supports dynamic partitioning, and a single query may
output to several thousand partitions; I have had to raise the DataXceiver limit
and /etc/security/limits.conf to account for this. (Very frustrating to still be
upping ulimits in the 21st century :)
Also, with solid-state-drive technology becoming a bigger part of the
datacenter, the assumption that opening a socket per request is acceptable
(because disk reads will hit their limit long before the number of sockets on a
system does) may not hold true for much longer.
This looks to be a win for many use cases and should have no significant impact
on standard map-reduce use cases. What do we have to do to get this patch to a
+1?
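For anyone skimming, here is a rough, stripped-down sketch of the
single-selector-plus-small-pool pattern in plain NIO; the class name, port,
pool size and buffer size are made up for illustration, and this is not code
from the patch:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // One selector thread multiplexes every client socket; ready reads are
    // handed to a small fixed pool instead of spawning a thread per request.
    public class MultiplexedReadServer {
      public static void main(String[] args) throws IOException {
        final ExecutorService workers = Executors.newFixedThreadPool(4);
        Selector selector = Selector.open();

        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(50010)); // example port only
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
          selector.select();
          Iterator<SelectionKey> it = selector.selectedKeys().iterator();
          while (it.hasNext()) {
            final SelectionKey key = it.next();
            it.remove();
            if (key.isAcceptable()) {
              SocketChannel client = server.accept();
              if (client != null) {
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ);
              }
            } else if (key.isReadable()) {
              // Drop read interest while a worker owns the channel so the
              // selector does not fire again for the same data.
              key.interestOps(0);
              workers.submit(new Runnable() {
                public void run() { handleRead(key); }
              });
            }
          }
        }
      }

      private static void handleRead(SelectionKey key) {
        SocketChannel channel = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(8192);
        try {
          if (channel.read(buf) < 0) {             // client went away
            key.cancel();
            channel.close();
            return;
          }
          // ... parse the request and write the block data back ...
          key.interestOps(SelectionKey.OP_READ);   // re-arm for the next request
          key.selector().wakeup();                 // let the new interest set take effect
        } catch (IOException e) {
          key.cancel();
          try { channel.close(); } catch (IOException ignored) { }
        }
      }
    }

The point is that the thread count stays fixed no matter how many clients are
connected; only the worker that currently owns a channel touches it, and the
selector is re-armed once the worker is done.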
> Use single Selector and small thread pool to replace many instances of
> BlockSender for reads
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-918
> URL: https://issues.apache.org/jira/browse/HDFS-918
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Reporter: Jay Booth
> Assignee: Jay Booth
> Fix For: 0.22.0
>
> Attachments: hbase-hdfs-benchmarks.ods, hdfs-918-20100201.patch,
> hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-918-20100228.patch,
> hdfs-918-20100309.patch, hdfs-918-branch20-append.patch,
> hdfs-918-branch20.2.patch, hdfs-918-pool.patch, hdfs-918-TRUNK.patch,
> hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread
> per request, which must allocate its own buffers and leads to
> higher-than-optimal CPU and memory usage by the sending threads. If we had a
> single selector and a small threadpool to multiplex request packets, we could
> theoretically achieve higher performance while taking up fewer resources and
> leaving more CPU on datanodes available for mapred, hbase or whatever. This
> can be done without changing any wire protocols.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.