[
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12976304#action_12976304
]
Edward Capriolo commented on HDFS-918:
--------------------------------------
Fundamentally, using selectors is more efficient than the one-thread-per-request
model. It is not only HBase's random reads that run into DataXceiver
issues. For example, Hive supports dynamic partitioning, and a single query may
output to several thousand partitions; I have had to raise the DataXceiver limit
and /etc/security/limits.conf to account for this. (Very frustrating to still be
upping ulimits in the 21st century :)
Also, with solid-state-drive technology becoming a bigger part of the
datacenter, the assumption that opening a socket per request is acceptable
(because disk reads will hit their limit long before the number of sockets on a
system does) may not hold true for much longer.
This looks to be a win for many use cases and should have no significant impact
on standard map-reduce use cases. What do we have to do to get this patch to a
+1?
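For anyone skimming, here is a rough, stripped-down sketch of the
single-selector-plus-small-pool pattern in plain NIO; the class name, port,
pool size and buffer size are made up for illustration, and this is not code
from the patch:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // One selector thread multiplexes every client socket; ready reads are
    // handed to a small fixed pool instead of spawning a thread per request.
    public class MultiplexedReadServer {
      public static void main(String[] args) throws IOException {
        final ExecutorService workers = Executors.newFixedThreadPool(4);
        Selector selector = Selector.open();

        ServerSocketChannel server = ServerSocketChannel.open();
        server.socket().bind(new InetSocketAddress(50010)); // example port only
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        while (true) {
          selector.select();
          Iterator<SelectionKey> it = selector.selectedKeys().iterator();
          while (it.hasNext()) {
            final SelectionKey key = it.next();
            it.remove();
            if (key.isAcceptable()) {
              SocketChannel client = server.accept();
              if (client != null) {
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ);
              }
            } else if (key.isReadable()) {
              // Drop read interest while a worker owns the channel so the
              // selector does not fire again for the same data.
              key.interestOps(0);
              workers.submit(new Runnable() {
                public void run() { handleRead(key); }
              });
            }
          }
        }
      }

      private static void handleRead(SelectionKey key) {
        SocketChannel channel = (SocketChannel) key.channel();
        ByteBuffer buf = ByteBuffer.allocate(8192);
        try {
          if (channel.read(buf) < 0) {             // client went away
            key.cancel();
            channel.close();
            return;
          }
          // ... parse the request and write the block data back ...
          key.interestOps(SelectionKey.OP_READ);   // re-arm for the next request
          key.selector().wakeup();                 // let the new interest set take effect
        } catch (IOException e) {
          key.cancel();
          try { channel.close(); } catch (IOException ignored) { }
        }
      }
    }

The point is that the thread count stays fixed no matter how many clients are
connected; only the worker that currently owns a channel touches it, and the
selector is re-armed once the worker is done.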
> Use single Selector and small thread pool to replace many instances of
> BlockSender for reads
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-918
> URL: https://issues.apache.org/jira/browse/HDFS-918
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Reporter: Jay Booth
> Assignee: Jay Booth
> Fix For: 0.22.0
>
> Attachments: hbase-hdfs-benchmarks.ods, hdfs-918-20100201.patch,
> hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-918-20100228.patch,
> hdfs-918-20100309.patch, hdfs-918-branch20-append.patch,
> hdfs-918-branch20.2.patch, hdfs-918-pool.patch, hdfs-918-TRUNK.patch,
> hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXCeiver server allocates a new thread
> per request, which must allocate its own buffers and leads to
> higher-than-optimal CPU and memory usage by the sending threads. If we had a
> single selector and a small threadpool to multiplex request packets, we could
> theoretically achieve higher performance while taking up fewer resources and
> leaving more CPU on datanodes available for mapred, hbase or whatever. This
> can be done without changing any wire protocols.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.