[ https://issues.apache.org/jira/browse/HDFS-9924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16462238#comment-16462238 ]
Duo Zhang commented on HDFS-9924:
---------------------------------
HBase 2.0.0 has been released, and AsyncFSWAL (HBASE-14790) shipped with it.
We use many internal HDFS APIs to implement AsyncFSWAL, so we should expect
breakages like HBASE-20244 to happen again and again.
To make life easier, we need to move the async output code into HDFS itself.
The POC above shows that option 3 works, so I plan to create a feature branch
to implement an async DFS client. In general, I think there are 4 steps:
1. Implement an async RPC client using option 3 described above.
2. Implement the filesystem APIs that only need to talk to the NameNode (NN),
such as 'mkdirs'.
3. Implement async file read. The open question is the API. For pread, I think
a CompletableFuture is enough; the difficulty is streaming read, which we need
to discuss later.
4. Implement async file write. The API will also be a problem, but a more
important one is fan-out: if we want to support it, the current DataNode (DN)
side logic breaks the read semantics, because uncommitted data can be read very
easily. HBase solved this in HBASE-14004, but I do not think we should keep the
broken behavior in HDFS. We need to find a way to deal with it.
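To make steps 2 and 3 a bit more concrete, here is a minimal sketch of what a CompletableFuture-based client surface might look like. The interface name AsyncFileSystem, its method signatures, and the in-memory InMemoryAsyncFs stand-in are all hypothetical: they are not HDFS APIs and not the design of the planned feature branch. Streaming read and write are deliberately left out, since their API is still an open question.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncFsSketch {

  /** Hypothetical async client API (NOT an existing HDFS interface). */
  public interface AsyncFileSystem extends AutoCloseable {

    /** Step 2: a metadata operation that would only need the NameNode. */
    CompletableFuture<Boolean> mkdirs(String path);

    /**
     * Step 3: positional read (pread). Reads up to buf.remaining() bytes
     * starting at 'position'; completes with the byte count, or -1 at EOF.
     */
    CompletableFuture<Integer> pread(String path, long position, ByteBuffer buf);

    @Override
    void close();
  }

  /** In-memory stand-in so the sketch runs without a cluster. */
  public static final class InMemoryAsyncFs implements AsyncFileSystem {
    private final Set<String> dirs = ConcurrentHashMap.newKeySet();
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    /** Test helper: seeds file contents (this sketch has no async write API). */
    public void putFile(String path, byte[] data) {
      files.put(path, data.clone());
    }

    @Override
    public CompletableFuture<Boolean> mkdirs(String path) {
      // Mirrors the boolean result of Hadoop's FileSystem.mkdirs:
      // completes with true once the directory exists.
      return CompletableFuture.supplyAsync(() -> {
        dirs.add(path);
        return true;
      }, executor);
    }

    @Override
    public CompletableFuture<Integer> pread(String path, long position, ByteBuffer buf) {
      return CompletableFuture.supplyAsync(() -> {
        byte[] data = files.get(path);
        if (data == null) {
          // Reaches the caller as an exceptionally completed future.
          throw new IllegalArgumentException("No such file: " + path);
        }
        if (position >= data.length) {
          return -1; // EOF
        }
        int n = (int) Math.min(buf.remaining(), data.length - position);
        buf.put(data, (int) position, n);
        return n;
      }, executor);
    }

    @Override
    public void close() {
      executor.shutdown();
    }
  }

  public static void main(String[] args) {
    try (InMemoryAsyncFs fs = new InMemoryAsyncFs()) {
      fs.putFile("/wal/log.1", "hello async hdfs".getBytes(StandardCharsets.UTF_8));
      ByteBuffer buf = ByteBuffer.allocate(5);
      // Both calls are issued before either result is awaited.
      CompletableFuture<Boolean> created = fs.mkdirs("/wal");
      CompletableFuture<Integer> read = fs.pread("/wal/log.1", 6, buf);
      System.out.println("mkdirs: " + created.join());
      int n = read.join();
      buf.flip();
      System.out.println("pread " + n + " bytes: " + StandardCharsets.UTF_8.decode(buf));
    }
  }
}
```

Issuing mkdirs and pread before joining either lets their round trips overlap. A single future per call fits pread well, but a streaming read or a write would need something richer, which is exactly the open question in steps 3 and 4.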
Thanks.
> [umbrella] Nonblocking HDFS Access
> ----------------------------------
>
> Key: HDFS-9924
> URL: https://issues.apache.org/jira/browse/HDFS-9924
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: fs
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Duo Zhang
> Priority: Major
> Attachments: Async-HDFS-Performance-Report.pdf,
> AsyncHdfs20160510.pdf, HDFS-9924-POC.patch
>
>
> This is an umbrella JIRA for supporting Nonblocking HDFS Access.
> Currently, all the API methods are blocking calls -- the caller is blocked
> until the method returns. It is very slow if a client makes a large number
> of independent calls in a single thread since each call has to wait until the
> previous call is finished. It is inefficient if a client needs to create a
> large number of threads to invoke the calls.
> We propose adding a new API to support nonblocking calls, i.e. the caller is
> not blocked. The methods in the new API immediately return a Java Future
> object. The return value can be obtained by the usual Future.get() method.
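The difference between blocking and nonblocking calls described above can be shown with a small, self-contained simulation. Nothing here is an HDFS API: blockingCall and nonblockingCall are hypothetical stand-ins for an RPC with a fixed 50 ms latency, and the server reply is simulated by one scheduler thread completing a CompletableFuture, so the caller never needs a thread per call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class NonblockingCallsSketch {

  static final long LATENCY_MS = 50; // assumed round-trip time

  /** Hypothetical blocking RPC: the caller waits for the whole round trip. */
  static long blockingCall(int i) {
    try {
      Thread.sleep(LATENCY_MS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new RuntimeException(e);
    }
    return (long) i * i;
  }

  /** Hypothetical nonblocking RPC: returns a Future immediately. */
  static Future<Long> nonblockingCall(int i, ScheduledExecutorService responder) {
    CompletableFuture<Long> f = new CompletableFuture<>();
    // Simulated reply arriving later; no caller thread blocks meanwhile.
    responder.schedule(() -> { f.complete((long) i * i); }, LATENCY_MS, TimeUnit.MILLISECONDS);
    return f;
  }

  /** Blocking style: each call waits for the previous one, so latency adds up. */
  static long sumBlocking(int calls) {
    long sum = 0;
    for (int i = 0; i < calls; i++) {
      sum += blockingCall(i);
    }
    return sum;
  }

  /** Nonblocking style: issue every call first, then collect with Future.get(). */
  static long sumNonblocking(int calls, ScheduledExecutorService responder)
      throws InterruptedException, ExecutionException {
    List<Future<Long>> futures = new ArrayList<>();
    for (int i = 0; i < calls; i++) {
      futures.add(nonblockingCall(i, responder)); // returns immediately
    }
    long sum = 0;
    for (Future<Long> f : futures) {
      sum += f.get(); // waits only for replies that have not arrived yet
    }
    return sum;
  }

  public static void main(String[] args) throws Exception {
    ScheduledExecutorService responder = Executors.newSingleThreadScheduledExecutor();
    try {
      final int calls = 16;
      long t0 = System.nanoTime();
      long a = sumBlocking(calls);
      long t1 = System.nanoTime();
      long b = sumNonblocking(calls, responder);
      long t2 = System.nanoTime();
      System.out.printf("blocking:    sum=%d in %d ms%n", a, (t1 - t0) / 1_000_000);
      System.out.printf("nonblocking: sum=%d in %d ms%n", b, (t2 - t1) / 1_000_000);
    } finally {
      responder.shutdown();
    }
  }
}
```

With 16 calls, the blocking loop should take roughly 16 × 50 ms, while the nonblocking version should take roughly one 50 ms round trip, using only the caller's thread plus the single simulated responder thread.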
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)