[jira] [Commented] (HDFS-9924) [umbrella] Asynchronous HDFS Access

stack (JIRA) Wed, 15 Jun 2016 14:58:39 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332670#comment-15332670
 ]


stack commented on HDFS-9924:
-----------------------------

bq. Quoting Tsz Wo Nicholas Sze's words: I understand your concern but it is a 
different problem. We should not protect NN by making the client slow. We 
should add protection in NN instead.

The above quote is magical thinking (see the response to it given by Daryn, an 
operator of one of our largest deploys). We are talking branch-2 here for this 
Future hack. The NN is not going to sprout scale all of a sudden in the 
branch-2 line to support 'thousands' of concurrent ops coming in from an 
adjacent, blame-shifting Hive metadata server. Some form of parsimony, of 
concern for NN loading, is in order.

Rereading this issue from the top down (including the design doc -- it needs 
numbers... what is a large number of calls?; why wouldn't a thread pool work 
given you need to throttle?) and seeing where we have arrived, this issue is 
not about 'Asynchronous HDFS Access' as the summary and original description 
advertise but instead is an expedient hack-for-hive, for late in branch-2 
only. The 'change' will have a short shelf-life, it seems, given it arrives in 
2.9.0+ (?) and branch-3 is looking to be a different API (see discussion on 
HADOOP-12910). The two distinct positions I discern in the discussion so far 
-- those who want a true async API on HDFS and those working on a hive fix -- 
are having trouble finding common ground. If this characterization is 
correct, I'd suggest we just call this issue a hack-for-hive explicitly and 
annotate it as such. A good few of the participants in this issue are likely 
not much interested in the latter (e.g. myself) as long as this work does not 
get in the way of our having a 'real' async API (HADOOP-12910) or confuse 
downstreamers on what the async story on HDFS is.
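
To make the thread-pool question concrete, here is a minimal sketch of what a 
client-side throttle could look like using only java.util.concurrent. It is 
not taken from any patch on this issue. The class ThrottledCalls, the method 
runThrottled, and blockingCall (a stand-in for any blocking NN operation) are 
hypothetical names, and the sleep merely simulates RPC latency.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledCalls {
    // Counters used only to observe how many calls overlap at once.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger maxInFlight = new AtomicInteger();

    // Hypothetical stand-in for a blocking NameNode call; not a real HDFS API.
    static String blockingCall(String path) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        maxInFlight.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // simulated RPC latency (assumption)
            return "status:" + path;
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Submits every call up front but lets at most maxConcurrent run at a time.
    static List<String> runThrottled(List<String> paths, int maxConcurrent)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(maxConcurrent);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String p : paths) {
                Callable<String> task = () -> blockingCall(p);
                futures.add(pool.submit(task)); // returns immediately
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // blocks until that call completes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> paths = Arrays.asList("/a", "/b", "/c", "/d", "/e", "/f");
        List<String> out = runThrottled(paths, 2);
        System.out.println(out.size() + " calls done, peak concurrency "
                + maxInFlight.get());
    }
}
```

The caller still gets a Future per call, but the pool size puts a hard cap on 
how many ops hit the NN at once, which is the parsimony argued for above. 
Whether the per-thread cost is acceptable for the Hive case is exactly the 
sort of number the design doc would need to supply.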

> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
>                 Key: HDFS-9924
>                 URL: https://issues.apache.org/jira/browse/HDFS-9924
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: fs
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Xiaobing Zhou
>         Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked 
> until the method returns.  It is very slow if a client makes a large number 
> of independent calls in a single thread since each call has to wait until the 
> previous call is finished.  It is inefficient if a client needs to create a 
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is 
> not blocked.  The methods in the new API immediately return a Java Future 
> object.  The return value can be obtained by the usual Future.get() method.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
