[
https://issues.apache.org/jira/browse/HDFS-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977544#action_12977544
]
Jay Booth commented on HDFS-918:
--------------------------------
Hey all, sorry for the slow response, been swamped with the new year and all.
RE: unit tests: at one point it was passing all tests; I'm not sure whether the
tests changed or this patch did, but I can take a look at it.
RE: 0.23, I can look at forward-porting this again, but a lot of changes have
gone in since then.
@stack, were you testing the "only pooling" patch or the "with full
multiplexing" patch?
"Only pooling" would be much simpler to forward-port, although I do think that
the full multiplexing patch is pretty worthwhile. Aside from the
small-but-significant performance gain, it was IMO much better factoring to
have the DN-side logic all encapsulated in a Connection object which has
sendPacket() called repeatedly, rather than a giant procedural loop that goes
down and back up through several classes. The architecture also made keepalive
pretty straightforward: just throw the connection back into a listening pool
when done, and make the corresponding changes on the client side. But I guess
that logic has been revised since then anyway, so it'd be a significant piece
of work to bring it all back up to date.
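To make the shape of that concrete, here's a minimal sketch of the idea (all
names are illustrative, not the actual classes from the patch, and readiness
is simulated with a round-robin queue rather than a real NIO Selector): each
read request becomes a Connection whose sendPacket() pushes one packet at a
time, so a small worker pool can interleave many transfers instead of
dedicating a thread per request, and a finished connection is parked in an
idle pool for keepalive reuse.

```java
import java.util.ArrayDeque;
import java.util.Queue;

public class MultiplexSketch {

    /** One read request; sendPacket() is called repeatedly by a worker. */
    static class Connection {
        private int packetsLeft;
        private int sent = 0;
        Connection(int packets) { this.packetsLeft = packets; }
        /** Send one packet; returns true while more packets remain. */
        boolean sendPacket() {
            if (packetsLeft > 0) { packetsLeft--; sent++; }
            return packetsLeft > 0;
        }
        int packetsSent() { return sent; }
    }

    /**
     * Drive all ready connections to completion. In the real patch a
     * Selector signals which sockets are writable; here we simply
     * round-robin one packet per connection per pass, so a single worker
     * interleaves many transfers instead of blocking on one.
     */
    static void drain(Queue<Connection> ready, Queue<Connection> keepalive) {
        while (!ready.isEmpty()) {
            Connection c = ready.poll();
            if (c.sendPacket()) {
                ready.add(c);       // more packets left: requeue for next pass
            } else {
                keepalive.add(c);   // done: park in the pool for reuse
            }
        }
    }

    public static void main(String[] args) {
        Queue<Connection> ready = new ArrayDeque<>();
        Queue<Connection> idle = new ArrayDeque<>();
        ready.add(new Connection(3));
        ready.add(new Connection(5));
        drain(ready, idle);
        System.out.println("transfers complete, " + idle.size()
                + " connections pooled for keepalive");
    }
}
```

The point of the factoring is that per-transfer state lives in the Connection
object rather than on a thread's stack, which is what makes both the small
worker pool and the keepalive pool cheap to implement.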
> Use single Selector and small thread pool to replace many instances of
> BlockSender for reads
> --------------------------------------------------------------------------------------------
>
> Key: HDFS-918
> URL: https://issues.apache.org/jira/browse/HDFS-918
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: data-node
> Reporter: Jay Booth
> Assignee: Jay Booth
> Fix For: 0.22.0
>
> Attachments: hbase-hdfs-benchmarks.ods, hdfs-918-20100201.patch,
> hdfs-918-20100203.patch, hdfs-918-20100211.patch, hdfs-918-20100228.patch,
> hdfs-918-20100309.patch, hdfs-918-branch20-append.patch,
> hdfs-918-branch20.2.patch, hdfs-918-pool.patch, hdfs-918-TRUNK.patch,
> hdfs-multiplex.patch
>
>
> Currently, on read requests, the DataXceiver server allocates a new thread
> per request, which must allocate its own buffers and leads to
> higher-than-optimal CPU and memory usage by the sending threads. If we had a
> single selector and a small threadpool to multiplex request packets, we could
> theoretically achieve higher performance while taking up fewer resources and
> leaving more CPU on datanodes available for mapred, hbase or whatever. This
> can be done without changing any wire protocols.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.