[ 
https://issues.apache.org/jira/browse/HADOOP-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638346#action_12638346
 ] 

Doug Cutting commented on HADOOP-3856:
--------------------------------------

I think before we can commit to an async strategy for HDFS we'll need to 
perform some experiments.  And before we commit async RPCs to trunk they ought 
to be used, to test their adequacy.  So, it could be done as two separate 
issues, but they're best developed in parallel.  For example, there may be no 
point in committing async RPC extensions if buffer-by-buffer access proves 
impractical in HDFS.  We don't need features that are not used.  So I think 
this could be done in a single Jira, or two that are closely coordinated.

> async RPC calls where the caller does not need to wait for a response

Yes, I agree.  My suggestion above was that we might model this in an interface 
by declaring methods with a particular return type.  On further thought, that 
wouldn't work, since that method could not be implemented server-side.  But it 
would sure be nice if one didn't have to use meta-programming (e.g., 
RPC.asyncCall(Class.getMethod(...))) but could instead directly invoke async 
methods.  So perhaps one could declare pairs of methods, like:

{code}
interface FooProtocol {
  Foo getFoo();
  SelectionKey getFooAsync();
}
{code}

The RPC runtime would match each method whose name ends with "Async" and whose 
return type is SelectionKey with the method that has the same name minus the 
"Async" suffix and a different return type.  The client could call 
getFooAsync() to get a SelectionKey, the server would call getFoo(), and the 
client would cast the result from the server to a Foo.  The ugly part is that 
the implementation on the server would have to provide some definition of 
getFooAsync() in order to compile, but it would never actually be called.  
Perhaps to avoid this we could add a client-specific interface:

{code}
interface FooProtocol {
  Foo getFoo();
}
interface FooClient extends FooProtocol {
  SelectionKey getFooAsync();
}
{code}
Then pass both classes to RPC#getProxy().  It would return a FooClient, but use 
FooProtocol to talk to the server.  The server would only implement 
FooProtocol.  Could that work?
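
To make the matching rule concrete, here is a rough sketch of how a runtime 
might pair the methods using plain reflection.  The class AsyncMethodMatcher, 
the method matchAsyncMethods() and the empty Foo class are hypothetical names 
made up for illustration; none of this is existing Hadoop RPC code, and it 
only covers the name/return-type matching, not the proxy or the wire protocol.

```java
import java.lang.reflect.Method;
import java.nio.channels.SelectionKey;
import java.util.HashMap;
import java.util.Map;

public class AsyncMethodMatcher {

  /** Placeholder result type, standing in for the Foo in the comment. */
  public static class Foo {}

  /** Protocol as seen by the server: the only interface it implements. */
  public interface FooProtocol {
    Foo getFoo();
  }

  /** Client-side view: adds the async variant of each call. */
  public interface FooClient extends FooProtocol {
    SelectionKey getFooAsync();
  }

  private static final String SUFFIX = "Async";

  /**
   * Maps each async method name in the client interface to the synchronous
   * protocol method that would actually be invoked on the server.
   * (Hypothetical helper, not part of any existing RPC class.)
   */
  public static Map<String, Method> matchAsyncMethods(Class<?> client,
                                                      Class<?> protocol) {
    Map<String, Method> pairs = new HashMap<>();
    for (Method m : client.getMethods()) {
      String name = m.getName();
      // Only methods named xxxAsync that return SelectionKey qualify.
      if (!name.endsWith(SUFFIX) || name.length() == SUFFIX.length()
          || m.getReturnType() != SelectionKey.class) {
        continue;
      }
      String syncName = name.substring(0, name.length() - SUFFIX.length());
      try {
        // Same name minus the suffix, same parameter types.
        Method sync = protocol.getMethod(syncName, m.getParameterTypes());
        if (sync.getReturnType() == SelectionKey.class) {
          throw new IllegalArgumentException(
              syncName + " must not itself return SelectionKey");
        }
        pairs.put(name, sync);
      } catch (NoSuchMethodException e) {
        throw new IllegalArgumentException(
            "No synchronous counterpart for " + name, e);
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    Map<String, Method> pairs =
        matchAsyncMethods(FooClient.class, FooProtocol.class);
    for (Map.Entry<String, Method> e : pairs.entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue().getName());
    }
  }
}
```

A proxy returned to the client could consult such a map in its invocation 
handler: calls to getFooAsync() would be sent over the wire as getFoo(), and 
the eventual response would be deserialized using getFoo()'s return type.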


> Asynchronous IO Handling in Hadoop and HDFS
> -------------------------------------------
>
>                 Key: HADOOP-3856
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3856
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, io
>            Reporter: Raghu Angadi
>         Attachments: GrizzlyEchoServer.patch, MinaEchoServer.patch
>
>
> I think Hadoop needs utilities or a framework to make it simpler to deal 
> with generic asynchronous IO in Hadoop.
> Example use case:
> It's been a long-standing problem that DataNode takes too many threads for 
> data transfers. Each write operation takes up 2 threads at each of the 
> datanodes and each read operation takes one, irrespective of how much 
> activity is on the sockets. The kinds of load that HDFS serves have been 
> expanding quite fast and HDFS should handle these varied loads better. If 
> there were a framework for non-blocking IO, read and write pipeline state 
> machines could be implemented with async events on a fixed number of threads. 
> A generic utility is better since it could be used in other places like 
> DFSClient. DFSClient currently creates 2 extra threads for each file it has 
> open for writing.
> Initially I started writing a primitive "selector", then tried to see if such 
> a facility already exists. [Apache MINA|http://mina.apache.org] seemed to do 
> exactly this. My impression after looking at the interface and examples is 
> that it does not give the kind of control we might prefer or need.  The first 
> use case I was thinking of implementing using MINA was to replace "response 
> handlers" in DataNode. The response handlers are simpler since they don't 
> involve disk I/O. I [asked on MINA user 
> list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html],
>  but it looks like it cannot be done, I think mainly because the sockets are 
> already created.
> Essentially what I have in mind is similar to MINA, except that read and 
> write of the sockets is done by the event handlers. The lowest layer 
> essentially invokes selectors, then invokes event handlers on a single thread 
> or on multiple threads. Each event handler is expected to do some 
> non-blocking work. We would of course have utility handler implementations 
> that do read, write, accept etc., that are useful for simple processing.
> Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more 
> flexible. It is under GPL.
> Are there other such implementations we should look at?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
