[ 
https://issues.apache.org/jira/browse/HADOOP-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620380#action_12620380
 ] 

rangadi edited comment on HADOOP-3856 at 8/6/08 12:00 PM:
---------------------------------------------------------------

{{net/MinaEchoServer.java}} in the attached patch is a demo of something 
Hadoop can use initially. Here the channel is created by user code, but the 
I/O is handled by MINA. 

The 5 files in net/mina are essentially the same as the files with the same 
names under {{org.apache.mina.transport.socket.nio}}. They have only very 
minor modifications (search for "HADOOP" in the files). 

For this hack I had to modify the files rather than extend them, since the 
classes are either marked 'final' or not public. I am not sure of the policy 
behind what is public and what is package-private in MINA. Some very useful 
things are public and some are not. 

I think MINA is more of a 'server framework' than an 'NIO framework', though 
it is close to being both.

The patch demos one of the features Hadoop would like. Though this is a hack, 
we can use it now to cut the number of threads used while writing data to 
HDFS by half (both at the client and at the datanode).

To run the patch, copy mina-core-2.0.0-M2.jar from 
[mina-2.0.0-M2.tar.gz|http://mina.apache.org/dyn/closer.cgi/mina/2.0.0-M2/mina-2.0.0-M2.tar.gz]
 to trunk/lib. Then apply the patch, run 'ant', and run {{bin/hadoop 
org.apache.hadoop.net.MinaEchoServer}}.

Ankur, I am still wondering about the next steps. There does not seem to be 
much interest on the MINA side, so it might be better to make the changes 
first and _then_ persuade them. But Hadoop might prefer to get the features 
into MINA first and then start using them here. It could be a bit of a 
stalemate. Also, I don't see much enthusiasm or priority for this on the 
Hadoop side... maybe the user base affected by the change needs to expand 
more, I am not sure. Meanwhile, things like HADOOP-3859 keep things going for 
a little longer.

Edit : minor


  
> Asynchronous IO Handling in Hadoop and HDFS
> -------------------------------------------
>
>                 Key: HADOOP-3856
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3856
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, io
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: MinaEchoServer.patch
>
>
> I think Hadoop needs utilities or a framework to make it simpler to deal 
> with generic asynchronous IO in Hadoop.
> Example use case:
> It's been a long-standing problem that the DataNode takes too many threads 
> for data transfers. Each write operation takes up 2 threads at each of the 
> datanodes and each read operation takes one, irrespective of how much 
> activity there is on the sockets. The kinds of load that HDFS serves have 
> been expanding quite fast and HDFS should handle these varied loads better. 
> If there is a framework for non-blocking IO, read and write pipeline state 
> machines could be implemented with async events on a fixed number of threads. 
> A generic utility is better since it could be used in other places like 
> DFSClient. DFSClient currently creates 2 extra threads for each file it has 
> open for writing.
> Initially I started writing a primitive "selector", then tried to see if 
> such a facility already exists. [Apache MINA|http://mina.apache.org] seemed 
> to do exactly this. My impression after looking at the interface and 
> examples is that it does not give the kind of control we might prefer or 
> need. The first use case I was thinking of implementing using MINA was to 
> replace the "response handlers" in DataNode. The response handlers are 
> simpler since they don't involve disk I/O. I [asked on the MINA user 
> list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html],
>  but it looks like it cannot be done, I think mainly because the sockets are 
> already created.
> Essentially what I have in mind is similar to MINA, except that the reads 
> and writes on the sockets are done by the event handlers. The lowest layer 
> essentially runs the selectors and invokes the event handlers on a single 
> thread or on multiple threads. Each event handler is expected to do some 
> non-blocking work. We would of course have utility handler implementations 
> that do read, write, accept, etc., which are useful for simple processing.
> Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more 
> flexible. It is under GPL.
> Are there other such implementations we should look at?
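
The layering described in the issue (a lowest layer that runs the selector and 
invokes event handlers, with the handlers themselves doing the non-blocking 
reads and writes) can be sketched with plain {{java.nio}}. This is only an 
illustration under stated assumptions, not the attached patch and not MINA's 
API: {{EventHandler}}, {{AcceptHandler}}, {{EchoHandler}}, {{EventLoop}} and 
{{echoOnce}} are hypothetical names invented for this example, and the echo 
handler takes a shortcut on writes that real code should avoid.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

final class EventLoopSketch {
    /** Hypothetical handler contract: do only non-blocking work per call. */
    interface EventHandler {
        void handle(SelectionKey key) throws IOException;
    }

    /** Utility handler: accepts a connection and attaches an echo handler. */
    static final class AcceptHandler implements EventHandler {
        public void handle(SelectionKey key) throws IOException {
            SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
            if (client == null) return; // nothing pending after all
            client.configureBlocking(false);
            client.register(key.selector(), SelectionKey.OP_READ, new EchoHandler());
        }
    }

    /** Utility handler: the handler itself reads and writes the socket. */
    static final class EchoHandler implements EventHandler {
        private final ByteBuffer buf = ByteBuffer.allocate(4096);
        public void handle(SelectionKey key) throws IOException {
            SocketChannel ch = (SocketChannel) key.channel();
            if (ch.read(buf) < 0) { ch.close(); return; } // peer closed
            buf.flip();
            // Simplification: spin until written. A real handler would
            // register OP_WRITE and return instead of looping.
            while (buf.hasRemaining()) ch.write(buf);
            buf.clear();
        }
    }

    /** Lowest layer: runs the selector and dispatches to attached handlers. */
    static final class EventLoop implements Runnable {
        private final Selector selector;
        private volatile boolean running = true;
        EventLoop(Selector selector) { this.selector = selector; }
        public void run() {
            try {
                while (running) {
                    selector.select();
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        if (!key.isValid()) continue;
                        try {
                            ((EventHandler) key.attachment()).handle(key);
                        } catch (IOException e) {
                            key.channel().close(); // drop only the failing connection
                        }
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        void stop() { running = false; selector.wakeup(); }
    }

    /** Demo: runs the loop on one thread, sends a message, returns the echo. */
    static String echoOnce(String message) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT, new AcceptHandler());
            EventLoop loop = new EventLoop(selector);
            Thread thread = new Thread(loop, "event-loop");
            thread.start();
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            ByteBuffer in = ByteBuffer.allocate(out.length);
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.write(ByteBuffer.wrap(out)); // blocking client for brevity
                while (in.hasRemaining() && client.read(in) >= 0) { }
            }
            loop.stop();
            thread.join();
            return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e); // demo helper: no recovery attempted
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello"));
    }
}
```

All connections here share the single "event-loop" thread, which is the 
property the issue is after: the thread count stays fixed no matter how many 
sockets are registered. Running the same loop on several threads, or handing 
disk I/O to a separate pool, is left out of the sketch.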

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
