[
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15332670#comment-15332670
]
stack commented on HDFS-9924:
-----------------------------
bq. Quoting Tsz Wo Nicholas Sze's words: I understand your concern but it is a
different problem. We should not protect NN by making the client slow. We
should add protection in NN instead.
The above quote is magical thinking (see the response to it given by Daryn, an
operator of one of our largest deploys). We are talking branch-2 here for this
Future hack. The NN is not going to sprout scale all of a sudden in the
branch-2 line to support 'thousands' of concurrent ops coming in from an
adjacent Hive metadata server; expecting it to is blame-shifting. Some form of
parsimony, some concern for NN loading, is in order.
Rereading this issue from the top down (including the design doc -- it needs
numbers: what is a 'large number of calls'? why wouldn't a thread pool work,
given you need to throttle?) and seeing where we have arrived, this issue is
not about 'Asynchronous HDFS Access', as the summary and original description
advertise, but is instead an expedient hack-for-hive, late in branch-2 only.
The 'change' will have a short shelf-life, it seems, given it arrives in
2.9.0+ (?) and branch-3 is looking to have a different API (see discussion on
HADOOP-12910). The two distinct positions I discern in the discussion so far
-- those who want a true async API on HDFS and those working on a hive fix --
are having trouble finding common ground. If this characterization is
correct, I'd suggest we just call this issue a hack-for-hive explicitly and
annotate it as such. A good few of the participants in this issue are likely
not much interested in the latter (e.g. myself) as long as this work does not
get in the way of our having a 'real' async API (HADOOP-12910) or confuse
downstreamers on what the async story on HDFS is.
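
To make the thread-pool question concrete, here is a rough sketch of what a
throttled client could look like using only java.util.concurrent. Everything
in it is an assumption for illustration: blockingCall() is a stand-in for a
blocking NN metadata op and is not an HDFS API, and MAX_IN_FLIGHT is an
arbitrary number, not one taken from the design doc. The fixed-size pool caps
how many calls can be outstanding at once, and callers still get a Future back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ThrottledCalls {
    // Hypothetical cap on concurrent calls; not a number from this issue.
    static final int MAX_IN_FLIGHT = 4;

    // Bookkeeping so we can observe the peak number of concurrent calls.
    static final AtomicInteger inFlight = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    // Stand-in for a blocking NN metadata call; purely illustrative.
    static boolean blockingCall(int id) throws InterruptedException {
        int now = inFlight.incrementAndGet();
        peak.accumulateAndGet(now, Math::max);
        try {
            Thread.sleep(5); // simulate the round trip
            return true;
        } finally {
            inFlight.decrementAndGet();
        }
    }

    // Submits all calls at once; the pool size is the throttle.
    static int runAll(int calls) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(MAX_IN_FLIGHT);
        try {
            List<Future<Boolean>> futures = new ArrayList<>();
            for (int i = 0; i < calls; i++) {
                final int id = i;
                // The caller is not blocked here; it gets a Future back.
                futures.add(pool.submit(() -> blockingCall(id)));
            }
            int ok = 0;
            for (Future<Boolean> f : futures) {
                if (f.get()) { // the usual Future.get() to collect results
                    ok++;
                }
            }
            return ok;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int ok = runAll(100);
        System.out.println(ok + " calls ok, peak in flight " + peak.get());
    }
}
```

This does cost MAX_IN_FLIGHT client threads, which is the inefficiency the
issue description objects to; the sketch only shows that a bound on
outstanding calls falls out of the pool size for free.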
> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: fs
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked
> until the method returns. It is very slow if a client makes a large number
> of independent calls in a single thread since each call has to wait until the
> previous call is finished. It is inefficient if a client needs to create a
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is
> not blocked. The methods in the new API immediately return a Java Future
> object. The return value can be obtained by the usual Future.get() method.