Asynchronous IO Handling in Hadoop and HDFS
-------------------------------------------

                 Key: HADOOP-3856
                 URL: https://issues.apache.org/jira/browse/HADOOP-3856
             Project: Hadoop Core
          Issue Type: New Feature
          Components: dfs, io
            Reporter: Raghu Angadi


I think Hadoop needs utilities or a framework that make it simpler to deal 
with generic asynchronous IO.

Example use case:

It has been a long-standing problem that the DataNode uses too many threads 
for data transfers. Each write operation takes up 2 threads at each of the 
datanodes and each read operation takes one, irrespective of how much activity 
there is on the sockets. The kinds of load that HDFS serves have been expanding 
quite fast, and HDFS should handle these varied loads better. If there were a 
framework for non-blocking IO, the read and write pipeline state machines could 
be implemented with async events on a fixed number of threads.

A generic utility is better since it could be used in other places like 
DFSClient. DFSClient currently creates 2 extra threads for each file it has 
open for writing.

Initially I started writing a primitive "selector", then checked whether such 
a facility already exists. [Apache MINA|http://mina.apache.org] seemed to do 
exactly this. My impression after looking at the interface and examples is 
that it does not give the kind of control we might prefer or need. The first 
use case I was thinking of implementing with MINA was replacing the "response 
handlers" in DataNode. The response handlers are simpler since they don't 
involve disk I/O. I [asked on the MINA user 
list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html],
but it looks like it cannot be done, I think mainly because the sockets are 
already created.

Essentially what I have in mind is similar to MINA, except that reads and 
writes on the sockets are done by the event handlers. The lowest layer 
essentially invokes selectors and then invokes event handlers on a single 
thread or on multiple threads. Each event handler is expected to do some 
non-blocking work. We would of course have utility handler implementations 
that do read, write, accept, etc., and are useful for simple processing.
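
To make the layering concrete, here is a minimal sketch using only plain 
java.nio. The names EventHandler, EventLoop and ReadHandler are hypothetical 
and chosen for illustration; they are not existing Hadoop or MINA classes, and 
a real design would likely differ. The sketch assumes one selector driven by 
one thread, registers a channel that was already created elsewhere, and leaves 
the actual read to the handler. A java.nio Pipe stands in for a socket so the 
example runs without a network.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

/** Hypothetical callback: invoked when the channel behind the key is ready. */
interface EventHandler {
    void handle(SelectionKey key) throws IOException;
}

/** Hypothetical lowest layer: owns a selector and dispatches ready keys. */
class EventLoop implements AutoCloseable {
    private final Selector selector;

    EventLoop() throws IOException {
        selector = Selector.open();
    }

    /** Registers an already-open channel; the handler rides along as the key attachment. */
    SelectionKey register(SelectableChannel existing, int ops, EventHandler handler)
            throws IOException {
        existing.configureBlocking(false);
        return existing.register(selector, ops, handler);
    }

    /** Waits up to timeoutMs, then invokes the handler of every ready key once. */
    int runOnce(long timeoutMs) throws IOException {
        selector.select(timeoutMs);
        int dispatched = 0;
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid()) {
                continue;
            }
            ((EventHandler) key.attachment()).handle(key);
            dispatched++;
        }
        return dispatched;
    }

    @Override
    public void close() throws IOException {
        selector.close();
    }
}

/** Hypothetical utility handler: drains whatever is readable without blocking. */
class ReadHandler implements EventHandler {
    private final ByteBuffer buf = ByteBuffer.allocate(4096);
    final ByteArrayOutputStream received = new ByteArrayOutputStream();

    @Override
    public void handle(SelectionKey key) throws IOException {
        ReadableByteChannel ch = (ReadableByteChannel) key.channel();
        int n;
        while ((n = ch.read(buf)) > 0) {
            buf.flip();
            received.write(buf.array(), 0, buf.limit());
            buf.clear();
        }
        if (n < 0) {
            // End of stream: stop watching this channel and release it.
            key.cancel();
            ch.close();
        }
    }
}

class EventLoopDemo {
    public static void main(String[] args) throws IOException {
        Pipe pipe = Pipe.open();
        ReadHandler reader = new ReadHandler();
        try (EventLoop loop = new EventLoop()) {
            loop.register(pipe.source(), SelectionKey.OP_READ, reader);
            pipe.sink().write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
            loop.runOnce(1000);
        }
        System.out.println(new String(reader.received.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

In this sketch a handler must never block, since it runs on the selector 
thread. Running handlers on multiple threads could be approximated by giving 
each thread its own EventLoop, though that is only one of several possible 
designs.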

Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more 
flexible. It is under GPL.

Are there other such implementations we should look at?


