[
https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15300256#comment-15300256
]
Daryn Sharp commented on HDFS-9924:
-----------------------------------
I'm late to the game due to time constraints, but this feature greatly concerns
me.
It's true the NN can handle over 100k ops/sec, but only with a read-dominated
workload. Even then, I've had to do _a lot_ of internal (hopefully soon to be
published) performance work to prevent blowing the heap under such a sustained
load. A user recently pushed a NN to 90k ops/sec for most of a weekend and
barely dented the heap, BUT it was 81% read ops. In the past that would have
been an 8-10 min GC. I digress.
More on point: the intended use case is for mass write operations. Consider
this: on multiple large clusters, offloading just a few thousand write ops/sec
for log aggregation reduced 95th percentile processing time from 4ms to <0.5ms
and queue time from 20ms to 4ms. The extremely wild variance in the metrics
also stabilized.
I've already been having performance concerns with Hive's mass
setOwner/setPermission, which I believe is single-threaded. This feature
appears intended for Hive. I'm really hesitant about a feature that makes it
trivial to destroy a NN.
> [umbrella] Asynchronous HDFS Access
> -----------------------------------
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: fs
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Xiaobing Zhou
> Attachments: AsyncHdfs20160510.pdf
>
>
> This is an umbrella JIRA for supporting Asynchronous HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked
> until the method returns. It is very slow if a client makes a large number
> of independent calls in a single thread since each call has to wait until the
> previous call is finished. It is inefficient if a client needs to create a
> large number of threads to invoke the calls.
> We propose adding a new API to support asynchronous calls, i.e. the caller is
> not blocked. The methods in the new API immediately return a Java Future
> object. The return value can be obtained by the usual Future.get() method.
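
For readers unfamiliar with the pattern the description refers to, here is a
minimal sketch of a Future-returning call in plain Java. Everything in it is
hypothetical: blockingRename and asyncRename are stand-ins invented for
illustration and are not HDFS APIs, and the small thread pool only simulates
calls being in flight (the proposal's actual mechanism is not shown in this
message).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class AsyncCallSketch {

    // Hypothetical stand-in for a blocking file system call; NOT an HDFS API.
    static boolean blockingRename(String src, String dst) {
        try {
            Thread.sleep(10); // simulate one RPC round trip
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !src.equals(dst);
    }

    // Hypothetical asynchronous variant: returns a Future immediately
    // instead of blocking the caller until the call completes.
    static Future<Boolean> asyncRename(ExecutorService pool, String src, String dst) {
        return pool.submit(() -> blockingRename(src, dst));
    }

    // Issues every call from a single caller thread, then collects the
    // results with the usual Future.get().
    static List<Boolean> renameAll(List<String> sources, String suffix)
            throws InterruptedException, ExecutionException {
        // Assumption: this pool only simulates in-flight calls for the sketch.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Boolean>> pending = new ArrayList<>();
            for (String src : sources) {
                pending.add(asyncRename(pool, src, src + suffix));
            }
            List<Boolean> results = new ArrayList<>();
            for (Future<Boolean> f : pending) {
                results.add(f.get()); // blocks only until this call finishes
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(renameAll(Arrays.asList("/a", "/b", "/c"), ".bak"));
    }
}
```

Note that the caller's loop never waits between submissions, so nothing on the
client side limits how many operations are outstanding at once.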
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)