Re: [DISCUSSION] Create a branch to work on non-blocking access to HDFS

Stack Tue, 15 May 2018 21:31:20 -0700

On Fri, May 4, 2018 at 5:47 AM, Anu Engineer <[email protected]>
wrote:


> Hi Stack,
>
>
>
> Why don’t we look at the design of what is being proposed?  Let us post
> the design to HDFS-9924 and then if needed, by all means let us open a new
> Jira.
>
> That will make it easy to understand the context if someone is looking at
> HDFS-9924.
>
>
>

I posted a WIP design-for-discussion up on a new issue, HDFS-13572, after
spending a bunch of time in HDFS-9924 and HADOOP-12910 (Duo had posted an
earlier version on HDFS-9924 a while back).

HDFS-9924 is stalled. It is filled with "discussion" that seems mostly to
be behind where we'd like to take off (i.e. whether hadoop2 or hadoop3
first, what is an async api, what is async programming, etc.). We hope to
'vault' HDFS-9924 by skipping to a hadoop3/jdk8/CompletableFuture basis
and by taking on contributor requests in HDFS-9924 -- e.g. a design first,
dev in a feature branch, and so on -- EXCEPTing the hadoop2 targeting.
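
To make "CompletableFuture basis" concrete, here is a minimal sketch of what
a jdk8-style non-blocking client surface might look like. It is illustrative
only and is not taken from the HDFS-13572 design: the class AsyncFsSketch,
its methods, and the in-memory maps standing in for the NameNode and
DataNodes are all hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names exist in HDFS.
public class AsyncFsSketch {
    // In-memory stand-ins for NameNode metadata and DataNode contents.
    private final Set<String> dirs = ConcurrentHashMap.newKeySet();
    private final Map<String, byte[]> files = new ConcurrentHashMap<>();

    // Test helper (assumption): seed a file so pread has something to return.
    public void put(String path, byte[] data) {
        files.put(path, data.clone());
    }

    // Metadata-only call: returns at once, completes on a pool thread.
    // The future yields true if the directory was newly created.
    public CompletableFuture<Boolean> mkdirs(String path) {
        return CompletableFuture.supplyAsync(() -> dirs.add(path));
    }

    // Positional read: a single future is enough because the result is bounded.
    // Reads past end-of-file yield a short (possibly empty) array.
    public CompletableFuture<byte[]> pread(String path, int offset, int length) {
        return CompletableFuture.supplyAsync(() -> {
            byte[] data = files.get(path);
            if (data == null) {
                throw new IllegalArgumentException("no such file: " + path);
            }
            int end = Math.min(data.length, offset + length);
            int start = Math.min(offset, end);
            return Arrays.copyOfRange(data, start, end);
        });
    }

    public static void main(String[] args) {
        AsyncFsSketch fs = new AsyncFsSketch();
        fs.put("/wal/log1", "hello world".getBytes(StandardCharsets.UTF_8));

        // Chain the calls without blocking until the final join().
        String tail = fs.mkdirs("/wal")
            .thenCompose(created -> fs.pread("/wal/log1", 6, 5))
            .thenApply(bytes -> new String(bytes, StandardCharsets.UTF_8))
            .join();
        System.out.println(tail);
    }
}
```

The caller composes operations with thenCompose/thenApply rather than
parking a thread per request. Streaming read and write do not fit a single
future this neatly, so the sketch leaves them out.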

Hence the new issue for a new undertaking (and to save folks having to wade
through reams to get to the new effort).



> I personally believe that it should be the developers of the feature who
> decide what goes in, what to call the branch, etc. But it would be nice to
> have some sort of continuity with HDFS-9924.
>
>
>

Agree with the above. I'll take care of tying HDFS-9924 over to the new
issue.

Thanks,
St.Ack



> Thanks
>
> Anu
>
>
>
> *From: *<[email protected]> on behalf of Stack <[email protected]>
> *Date: *Thursday, May 3, 2018 at 9:04 PM
> *To: *Anu Engineer <[email protected]>
> *Cc: *Wei-Chiu Chuang <[email protected]>, "[email protected]" <
> [email protected]>
> *Subject: *Re: [DISCUSSION] Create a branch to work on non-blocking
> access to HDFS
>
>
>
> Thanks for the support, Wei-Chiu and Anu.
>
>
>
> Thinking more on it, we should just open a new JIRA. HDFS-9924 is an old
> branch with commits we don't need, full of commentary that is, ahem, a mite
> off-topic.  Duo can attach his design to the new issue. We can cite
> HDFS-9924 as provenance and aggregate the discussion as a launching pad for
> the new effort in the new issue.
>
>
>
> Hopefully this is agreeable,
>
> Thanks,
>
>
>
> S
>
>
>
> On Thu, May 3, 2018 at 1:54 PM, Anu Engineer <[email protected]>
> wrote:
>
> Hi St.Ack/Wei-Chiu,
>
> It is very kind of St.Ack to bring this question to HDFS Dev. I think this
> is a good feature to have. As for the branch question, the HDFS-9924 branch
> is already open, so we could just use that, and I am +1 on adding Duo as a
> branch committer.
>
> I am not familiar with the HBase code base, so I am presuming that there
> will be some deviation from the current design doc posted in HDFS-9924.
> Would it make sense to post a new design proposal on HDFS-9924?
>
> --Anu
>
>
>
>
> On 5/3/18, 9:29 AM, "Wei-Chiu Chuang" <[email protected]> wrote:
>
>     Given that HBase 2 uses async output by default, the way that code is
>     maintained today in HBase is not sustainable. That piece of code should
>     be maintained in HDFS. I am +1 as a participant in both communities.
>
>     On Thu, May 3, 2018 at 9:14 AM, Stack <[email protected]> wrote:
>
>     > Ok with you lot if a few of us open a branch to work on a non-blocking
>     > HDFS client?
>     >
>     > Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking
>     > HDFS Access". On the foot of this umbrella JIRA is a proposal by the
>     > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS
>     > client (written by Duo) that we use making Write-Ahead Logs. We call it
>     > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
>     >
>     > Let me quote Duo from his proposal at the base of HDFS-9924:
>     >
>     > ....We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so
>     > it is expected that things like HBASE-20244
>     > <https://issues.apache.org/jira/browse/HBASE-20244>
>     > ["NoSuchMethodException when retrieving private method
>     > decryptEncryptedDataEncryptionKey from DFSClient"] will happen again
>     > and again.
>     >
>     > To make life easier, we need to move the async output related code into
>     > HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1]
>     > can work, so I would like to create a feature branch to implement the
>     > async dfs client. In general I think there are 4 steps:
>     >
>     > 1. Implement an async rpc client with option 3 [1] described above.
>     > 2. Implement the filesystem APIs which only need to connect to NN, such
>     > as 'mkdirs'.
>     > 3. Implement async file read. The problem is the API. For pread I think
>     > a CompletableFuture is enough, the problem is for the streaming read.
>     > Need to discuss later.
>     > 4. Implement async file write. The API will also be a problem, but a
>     > more important problem is that, if we want to support fan-out, the
>     > current logic at DN side will make the semantic broken as we can read
>     > uncommitted data very easily. In HBase it is solved by HBASE-14004
>     > <https://issues.apache.org/jira/browse/HBASE-14004> but I do not think
>     > we should keep the broken behavior in HDFS. We need to find a way to
>     > deal with it.
>     >
>     > Comments welcome.
>     >
>     > Intent is to make a branch named HDFS-9924 (or should we just do a new
>     > JIRA?) and to add Duo as a feature branch committer. If all goes well,
>     > we'll call for a merge VOTE.
>     >
>     > Thanks,
>     > St.Ack
>     >
>     > 1. Option 3: "Use the old protobuf rpc interface and implement a new
>     > rpc framework. The benefit is that we also do not need port unification
>     > service at server side and do not need to maintain two implementations
>     > at server side. And one more thing is that we do not need to upgrade
>     > protobuf to 3.x."
>     >
>
>
>
>     --
>     A very happy Hadoop contributor
>
>
>
