[ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15190112#comment-15190112
 ] 

Tsz Wo Nicholas Sze commented on HDFS-9924:
-------------------------------------------

> Can you quantify what the performance improvements will be here for Hive? 
> What is the performance delta of an async API versus just making vanilla 
> synchronous HDFS calls from a thread pool?

Compared with a single thread, the performance improvement is obvious.  
I expect that operations taking a few hours in a single thread can be reduced 
to a few minutes using asynchronous calls.  Of course, I don't yet have 
asynchronous calls to test with.

The main problem with making vanilla synchronous HDFS calls from a thread pool 
is not performance -- it is thread creation.  As mentioned in HADOOP-12909, the 
underlying RPC mechanism already supports asynchronous calls.  Currently, a 
synchronous call is implemented by invoking wait() in the caller thread in 
order to wait for the server response.  Now, if we use threads to make 
asynchronous calls, each thread will just be blocked in wait().  This will 
probably be taught in universities as a don't-create-threads-to-wait 
anti-pattern, I guess.  :)
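
To make the blocking concrete, here is a minimal sketch of a synchronous call 
built on wait(), and of what happens when such calls are issued from worker 
threads.  The Call class and its method names are hypothetical stand-ins, not 
the actual Hadoop RPC classes; the sketch only illustrates the pattern 
described above.

```java
public class BlockingCallSketch {
    /** Hypothetical pending-call record; not Hadoop's actual RPC class. */
    static final class Call {
        private String response;
        private boolean done;

        // Invoked by the response-handling thread when the server reply arrives.
        synchronized void setResponse(String value) {
            response = value;
            done = true;
            notifyAll();
        }

        // Invoked by the caller thread: it sits in wait() until the reply arrives.
        synchronized String awaitResponse() throws InterruptedException {
            while (!done) {
                wait();
            }
            return response;
        }
    }

    /**
     * Starts n worker threads that each make one "synchronous" call, then
     * returns how many of them were observed parked in wait().
     */
    static int countParkedThreads(int n) {
        Call[] calls = new Call[n];
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            Call call = new Call();
            calls[i] = call;
            workers[i] = new Thread(() -> {
                try {
                    call.awaitResponse();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        try {
            int parked = 0;
            for (Thread worker : workers) {
                // A thread inside Object.wait() reports the WAITING state.
                while (worker.getState() != Thread.State.WAITING) {
                    Thread.sleep(1);
                }
                parked++;
            }
            // Deliver the replies so the workers can finish.
            for (int i = 0; i < n; i++) {
                calls[i].setResponse("reply " + i);
            }
            for (Thread worker : workers) {
                worker.join();
            }
            return parked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countParkedThreads(8) + " threads parked in wait()");
    }
}
```

Every outstanding call costs one thread that does nothing but wait, so the 
thread count grows with the number of calls in flight.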

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.
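
For illustration only, the Future-returning style described above might look 
like the sketch below from the caller's side.  The AsyncFileOps interface, the 
FakeAsyncFileOps class, and the rename signature are assumptions made up for 
this example; they are not the proposed HDFS API.  A single daemon thread 
stands in for the RPC response handler.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class FutureApiSketch {
    /** Hypothetical asynchronous interface; the name and method are assumptions. */
    interface AsyncFileOps {
        Future<Boolean> rename(String src, String dst);
    }

    /** Toy implementation: one daemon thread completes every pending call. */
    static final class FakeAsyncFileOps implements AsyncFileOps {
        private final ScheduledExecutorService responder =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "fake-responder");
                t.setDaemon(true);
                return t;
            });

        @Override
        public Future<Boolean> rename(String src, String dst) {
            CompletableFuture<Boolean> result = new CompletableFuture<>();
            // Simulate a server reply arriving 20 ms later; the caller is not blocked.
            responder.schedule(() -> {
                result.complete(!src.equals(dst));
            }, 20, TimeUnit.MILLISECONDS);
            return result;
        }
    }

    /** Issues all renames from one thread, then collects results with Future.get(). */
    static int renameAll(AsyncFileOps fs, List<String> sources) {
        List<Future<Boolean>> pending = new ArrayList<>();
        for (String src : sources) {
            pending.add(fs.rename(src, src + ".bak"));  // returns immediately
        }
        int succeeded = 0;
        for (Future<Boolean> future : pending) {
            try {
                if (future.get()) {
                    succeeded++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<String> sources = Arrays.asList("/a", "/b", "/c");
        int ok = renameAll(new FakeAsyncFileOps(), sources);
        System.out.println("renamed " + ok + " of " + sources.size());
    }
}
```

All calls are in flight before the first get(), so the caller needs only one 
thread no matter how many independent calls it makes.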



