[
https://issues.apache.org/jira/browse/HADOOP-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raghu Angadi updated HADOOP-3856:
---------------------------------
Attachment: MinaEchoServer.patch
{{net/MinaEchoServer.java}} in the attached patch is a demo of something
Hadoop can use initially. Here the channel is created by user code, but I/O is
handled by MINA.
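The same split can be shown with plain {{java.nio}}: user code creates the channel, and a separate selector loop performs the I/O on it. The sketch below is not the MINA API and not the code in the patch. The harness accepts a loopback connection itself, then hands the connected {{SocketChannel}} to a hypothetical {{echoUntilClosed}} method that switches it to non-blocking mode and echoes bytes back from a {{Selector}} loop. Class and method names are illustrative.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

/** Sketch (plain java.nio, hypothetical names): echo I/O on a channel created by user code. */
public class EchoOnExistingChannel {

    /** Takes an already-connected channel and echoes bytes until the peer closes. */
    static void echoUntilClosed(SocketChannel ch) throws IOException {
        try (Selector selector = Selector.open()) {
            ch.configureBlocking(false);
            SelectionKey key = ch.register(selector, SelectionKey.OP_READ);
            ByteBuffer buf = ByteBuffer.allocate(4096);
            while (true) {
                selector.select();
                selector.selectedKeys().clear();
                if (key.isReadable() && ch.read(buf) < 0) {
                    ch.close(); // peer closed: stop (any unsent bytes are dropped)
                    return;
                }
                buf.flip();
                ch.write(buf); // non-blocking: may write only part of the data
                buf.compact();
                // Ask for writability only while unsent bytes remain.
                key.interestOps(buf.position() > 0
                        ? SelectionKey.OP_READ | SelectionKey.OP_WRITE
                        : SelectionKey.OP_READ);
            }
        }
    }

    /** Sends msg over loopback and returns what comes back. */
    static String roundTrip(String msg) throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                SocketChannel accepted = server.accept(); // channel created by "user code"
                Thread io = new Thread(() -> {
                    try {
                        echoUntilClosed(accepted);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                io.start();
                byte[] bytes = msg.getBytes(StandardCharsets.UTF_8);
                client.write(ByteBuffer.wrap(bytes)); // blocking client: writes everything
                ByteBuffer reply = ByteBuffer.allocate(bytes.length);
                while (reply.hasRemaining() && client.read(reply) >= 0) {
                    // keep reading until the full echo has arrived
                }
                client.close(); // lets the echo loop see EOF and exit
                io.join();
                return new String(reply.array(), 0, reply.position(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello")); // prints "hello"
    }
}
```

A real server would run one selector for many channels rather than one per connection. The single-channel form just keeps the example short.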
The 5 files in {{net/mina}} are essentially the same as the files with the same names
under {{org.apache.mina.transport.socket.nio}}. They have very minor
modifications (search for "HADOOP" in the files).
For this hack I had to modify the files rather than extend them, since these
classes are either marked 'final' or not public. I am not sure of the policy behind
what is public and what is package-private in MINA. Some very useful things are
public and some are not.
I think MINA is more of a 'server framework' than an 'NIO framework', even though
it is so close to being both.
This is one of the features Hadoop would like. Though this is a hack, we can
use it now to cut the number of threads used while writing data to HDFS by half
(both at the client and at the datanode).
To run the patch, copy mina-core-2.0.0-M2.jar from
[mina-2.0.0-M2.tar.gz|http://mina.apache.org/dyn/closer.cgi/mina/2.0.0-M2/mina-2.0.0-M2.tar.gz]
to trunk/lib. Then apply the patch, run 'ant', and then run {{bin/hadoop
org.apache.hadoop.net.MinaEchoServer}}.
Ankur, I am still wondering about the next steps. There does not seem to be
much interest on the MINA side, so it might be better to make the changes first
and _then_ persuade them. But Hadoop might prefer to get the features into MINA
first and then start using them here. It could be a bit of a stalemate. Also, I
don't see much enthusiasm or priority for this on the Hadoop side; maybe the user
base affected by the change needs to grow further, I am not sure. Meanwhile,
changes like HADOOP-3859 keep things going a little longer.
> Asynchronous IO Handling in Hadoop and HDFS
> -------------------------------------------
>
> Key: HADOOP-3856
> URL: https://issues.apache.org/jira/browse/HADOOP-3856
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs, io
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Attachments: MinaEchoServer.patch
>
>
> I think Hadoop needs utilities or a framework to make it simpler to deal with
> generic asynchronous IO.
> Example use case:
> It has been a long-standing problem that the DataNode uses too many threads for
> data transfers. Each write operation takes up 2 threads at each of the
> datanodes, and each read operation takes one, irrespective of how much activity
> is on the sockets. The kinds of load that HDFS serves have been expanding
> quite fast, and HDFS should handle these varied loads better. If there were a
> framework for non-blocking IO, the read and write pipeline state machines could
> be implemented with async events on a fixed number of threads.
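One way to picture such a pipeline state machine is the hypothetical Java sketch below. It models only the ack-handling side of one write-pipeline stage (the job of a per-write responder thread): a packet event records a sequence number, and a downstream-ack event must match the oldest outstanding packet before an ack is "sent" upstream. Each pipeline's events run in order through a {{SerialExecutor}} (the pattern shown in the {{java.util.concurrent.Executor}} javadoc) layered on one fixed-size pool, so the thread count no longer grows with the number of open pipelines. All names, states, and the simplified ack rule are assumptions, not HDFS code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: ack handling of one pipeline stage as an event-driven state machine. */
public class PipelineStateMachine {
    enum State { RECEIVING, CLOSING, CLOSED }

    private State state = State.RECEIVING;
    private final ArrayDeque<Long> awaitingAck = new ArrayDeque<>();
    private final List<Long> ackedUpstream = new ArrayList<>();

    /** Event: a packet arrived from upstream and was forwarded downstream. */
    synchronized void onPacket(long seqno, boolean last) {
        if (state != State.RECEIVING) throw new IllegalStateException("packet after last: " + seqno);
        awaitingAck.addLast(seqno);
        if (last) state = State.CLOSING;
    }

    /** Event: the downstream node acked a packet; acks must arrive in packet order. */
    synchronized void onDownstreamAck(long seqno) {
        Long expected = awaitingAck.peekFirst();
        if (expected == null || expected != seqno) throw new IllegalStateException("unexpected ack " + seqno);
        awaitingAck.removeFirst();
        ackedUpstream.add(seqno); // stands in for sending the ack upstream
        if (state == State.CLOSING && awaitingAck.isEmpty()) state = State.CLOSED;
    }

    synchronized State state() { return state; }
    synchronized List<Long> ackedUpstream() { return new ArrayList<>(ackedUpstream); }

    /** Runs one pipeline's events in submission order on a shared pool. */
    static final class SerialExecutor implements Executor {
        private final ArrayDeque<Runnable> tasks = new ArrayDeque<>();
        private final Executor pool;
        private Runnable active;

        SerialExecutor(Executor pool) { this.pool = pool; }

        public synchronized void execute(Runnable r) {
            tasks.add(() -> { try { r.run(); } finally { scheduleNext(); } });
            if (active == null) scheduleNext();
        }

        private synchronized void scheduleNext() {
            if ((active = tasks.poll()) != null) pool.execute(active);
        }
    }

    /** Drives `pipelines` state machines, `packets` packets each, on `threads` pool threads. */
    static List<PipelineStateMachine> simulate(int pipelines, int packets, int threads)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(pipelines * packets * 2);
        List<PipelineStateMachine> machines = new ArrayList<>();
        for (int p = 0; p < pipelines; p++) {
            PipelineStateMachine m = new PipelineStateMachine();
            SerialExecutor events = new SerialExecutor(pool);
            for (long seq = 0; seq < packets; seq++) {
                final long s = seq;
                events.execute(() -> { try { m.onPacket(s, s == packets - 1); } finally { done.countDown(); } });
                events.execute(() -> { try { m.onDownstreamAck(s); } finally { done.countDown(); } });
            }
            machines.add(m);
        }
        done.await(); // every event has run, so no task will be scheduled after shutdown
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return machines;
    }

    public static void main(String[] args) throws InterruptedException {
        // 1000 pipelines served by 4 pool threads instead of 1000 responder threads.
        List<PipelineStateMachine> ms = simulate(1000, 5, 4);
        System.out.println(ms.get(0).state() + " " + ms.get(0).ackedUpstream());
    }
}
```

Running {{main}} prints {{CLOSED [0, 1, 2, 3, 4]}}. Real handlers would also do the socket I/O; the disk side does not fit a selector as directly, since {{FileChannel}} is not a {{SelectableChannel}}.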
> A generic utility is better since it could be used in other places like
> DFSClient. DFSClient currently creates 2 extra threads for each file it has
> open for writing.
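A rough count makes the cost concrete. Assume, for illustration, W concurrent block writes and R concurrent reads on a DataNode, F files open for writing on a client, and a hypothetical small, fixed pool of k event threads that takes over only the per-stream response handling:

```latex
\begin{aligned}
T_{\mathrm{DN}} &= 2W + R     &\longrightarrow\quad T'_{\mathrm{DN}}     &= W + R + k \\
T_{\mathrm{client}} &= 2F     &\longrightarrow\quad T'_{\mathrm{client}} &= F + k
\end{aligned}
```

For example, with W = 500, R = 0 and k = 4, the DataNode goes from 1000 threads to 504, roughly half. Reads are unaffected under this assumption because each already uses a single thread; making the receiving side asynchronous as well would shrink the write term to the fixed k.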
> Initially I started writing a primitive "selector", then checked whether such a
> facility already exists. [Apache MINA|http://mina.apache.org] seemed to do
> exactly this. My impression after looking at the interface and examples is
> that it does not give the kind of control we might prefer or need. The first use
> case I was thinking of implementing with MINA was replacing the "response
> handlers" in the DataNode. The response handlers are simpler since they don't
> involve disk I/O. I [asked on the MINA user list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html],
> but it looks like it cannot be done, I think mainly because the sockets are
> already created.
> Essentially what I have in mind is similar to MINA, except that reading and
> writing of the sockets are done by the event handlers. The lowest layer
> essentially runs the selectors and invokes event handlers on a single thread or
> on multiple threads. Each event handler is expected to do some non-blocking
> work. We would of course have utility handler implementations that do read,
> write, accept, etc., which are useful for simple processing.
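A minimal sketch of that lowest layer, assuming plain {{java.nio}} and hypothetical names ({{EventLoop}}, {{EventHandler}}, {{readToEof}}): the loop only waits on the {{Selector}} and dispatches readiness to the handler attached to each key, while the handler itself performs the non-blocking read. {{readToEof}} stands in for the kind of utility handler described above, and a {{java.nio.channels.Pipe}} plays the remote peer so the example needs no network.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical sketch: a selector loop that dispatches events; handlers do the I/O. */
public class EventLoop {
    /** A handler must not block; it does a bounded amount of work per call. */
    interface EventHandler { void onEvent(SelectionKey key) throws IOException; }

    private final Selector selector;

    EventLoop() throws IOException { selector = Selector.open(); }

    /** Accepts a channel created elsewhere (e.g. by existing DataNode code). */
    SelectionKey register(SelectableChannel ch, int ops, EventHandler h) throws IOException {
        ch.configureBlocking(false);
        return ch.register(selector, ops, h);
    }

    /** One iteration: wait up to timeoutMs, then run the handler of every ready key. */
    void runOnce(long timeoutMs) throws IOException {
        selector.select(timeoutMs);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isValid()) ((EventHandler) key.attachment()).onEvent(key);
        }
    }

    void close() throws IOException { selector.close(); }

    /** Utility handler: reads whatever is available; at EOF hands all bytes to a callback. */
    static EventHandler readToEof(Consumer<byte[]> onComplete) {
        ByteBuffer buf = ByteBuffer.allocate(4096);
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        return key -> {
            ReadableByteChannel ch = (ReadableByteChannel) key.channel();
            int n;
            while ((n = ch.read(buf)) > 0) { // never blocks: channel is non-blocking
                collected.write(buf.array(), 0, buf.position());
                buf.clear();
            }
            if (n < 0) { // EOF: stop watching the channel and deliver the data
                key.cancel();
                ch.close();
                onComplete.accept(collected.toByteArray());
            }
        };
    }

    /** Demo: the pipe's write end stands in for a remote peer. */
    static String demo(String msg) throws IOException {
        Pipe pipe = Pipe.open();
        EventLoop loop = new EventLoop();
        String[] result = new String[1];
        loop.register(pipe.source(), SelectionKey.OP_READ,
                readToEof(bytes -> result[0] = new String(bytes, StandardCharsets.UTF_8)));
        pipe.sink().write(ByteBuffer.wrap(msg.getBytes(StandardCharsets.UTF_8)));
        pipe.sink().close();
        for (int i = 0; i < 100 && result[0] == null; i++) loop.runOnce(100);
        loop.close();
        return result[0];
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo("block data")); // prints "block data"
    }
}
```

This version runs handlers on the selector thread. Dispatching them to multiple threads would also need to keep a key from being selected again while its handler is still running, for example by clearing its interest set before dispatch and restoring it afterwards.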
> Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more
> flexible. It is under the GPL.
> Are there other such implementations we should look at?