On Fri, May 4, 2018 at 5:47 AM, Anu Engineer <aengin...@hortonworks.com> wrote:
> Hi Stack,
>
> Why don’t we look at the design of what is being proposed? Let us post
> the design to HDFS-9924 and then if needed, by all means let us open a new
> Jira.
>
> That will make it easy to understand the context if someone is looking at
> HDFS-9924.
>

I posted a WIP design-for-discussion up on a new issue, HDFS-13572, after
spending a bunch of time in HDFS-9924 and HADOOP-12910 (Duo had posted an
earlier version on HDFS-9924 a while back).

HDFS-9924 is stalled. It is filled with "discussion" that seems mostly to be
behind where we'd like to take-off (i.e. whether hadoop2 or hadoop3 first,
what is an async api, what is async programming, etc.). We hope to 'vault'
HDFS-9924 by skipping to a hadoop3/jdk8/CompletableFuture basis and by
taking on contributor requests in HDFS-9924 -- e.g. a design first, dev in a
feature branch, and so on -- EXCEPTing the hadoop2 targeting. Hence the new
issue for a new undertaking (and to save folks having to wade through reams
to get to the new effort).

> I personally believe that it should be the developers of the feature that
> should decide what goes in, what to call the branch etc. But it would be
> nice to have some sort of continuity of HDFS-9924.
>

Agree with the above. I'll take care of tying HDFS-9924 over to the new
issue.

Thanks,
St.Ack

> Thanks
>
> Anu
>
> *From: *<saint....@gmail.com> on behalf of Stack <st...@duboce.net>
> *Date: *Thursday, May 3, 2018 at 9:04 PM
> *To: *Anu Engineer <aengin...@hortonworks.com>
> *Cc: *Wei-Chiu Chuang <weic...@apache.org>,
> "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>
> *Subject: *Re: [DISCUSSION] Create a branch to work on non-blocking
> access to HDFS
>
> Thanks for support Wei-Chiu and Anu.
>
> Thinking more on it, we should just open a new JIRA. HDFS-9924 is an old
> branch with commits we don't need full of commentary that is, ahem, a mite
> off-topic. Duo can attach his design to the new issue.
> We can cite HDFS-9924 as provenance and aggregate the discussion as
> launching pad for the new effort in new issue.
>
> Hopefully this is agreeable,
>
> Thanks,
>
> S
>
> On Thu, May 3, 2018 at 1:54 PM, Anu Engineer <aengin...@hortonworks.com>
> wrote:
>
> Hi St.ack/Wei-Chiu,
>
> It is very kind of St.Ack to bring this question to HDFS Dev. I think this
> is a good feature to have. As for the branch question, HDFS-9924 branch is
> already open, we could just use that and I am +1 on adding Duo as a branch
> committer.
>
> I am not familiar with HBase code base, I am presuming that there will be
> some deviation from the current design doc posted in HDFS-9924. Would it
> make sense to post a new design proposal on HDFS-9924?
>
> --Anu
>
> On 5/3/18, 9:29 AM, "Wei-Chiu Chuang" <weic...@apache.org> wrote:
>
> Given that HBase 2 uses async output by default, the way that code is
> maintained today in HBase is not sustainable. That piece of code should be
> maintained in HDFS. I am +1 as a participant in both communities.
>
> On Thu, May 3, 2018 at 9:14 AM, Stack <st...@duboce.net> wrote:
>
> > Ok with you lot if a few of us open a branch to work on a non-blocking
> > HDFS client?
> >
> > Intent is to finish up the old issue "HDFS-9924 [umbrella] Nonblocking
> > HDFS Access". On the foot of this umbrella JIRA is a proposal by the
> > heavy-lifter, Duo Zhang. Over in HBase, we have a limited async DFS
> > client (written by Duo) that we use making Write-Ahead Logs. We call it
> > AsyncFSWAL. It was shipped as the default WAL writer in hbase-2.0.0.
> >
> > Let me quote Duo from his proposal at the base of HDFS-9924:
> >
> > ....We use lots of internal APIs of HDFS to implement the AsyncFSWAL, so
> > it is expected that things like HBASE-20244
> > <https://issues.apache.org/jira/browse/HBASE-20244>
> > ["NoSuchMethodException when retrieving private method
> > decryptEncryptedDataEncryptionKey from DFSClient"] will happen again and
> > again.
> >
> > To make life easier, we need to move the async output related code into
> > HDFS. The POC [attached as patch on HDFS-9924] shows that option 3 [1]
> > can work, so I would like to create a feature branch to implement the
> > async dfs client. In general I think there are 4 steps:
> >
> > 1. Implement an async rpc client with option 3 [1] described above.
> > 2. Implement the filesystem APIs which only need to connect to NN, such
> > as 'mkdirs'.
> > 3. Implement async file read. The problem is the API. For pread I think a
> > CompletableFuture is enough, the problem is for the streaming read. Need
> > to discuss later.
> > 4. Implement async file write. The API will also be a problem, but a more
> > important problem is that, if we want to support fan-out, the current
> > logic at DN side will make the semantic broken as we can read uncommitted
> > data very easily. In HBase it is solved by HBASE-14004
> > <https://issues.apache.org/jira/browse/HBASE-14004> but I do not think we
> > should keep the broken behavior in HDFS. We need to find a way to deal
> > with it.
> >
> > Comments welcome.
> >
> > Intent is to make a branch named HDFS-9924 (or should we just do a new
> > JIRA?) and to add Duo as a feature branch committer. If all goes well,
> > we'll call for a merge VOTE.
> >
> > Thanks,
> > St.Ack
> >
> > 1.Option 3: "Use the old protobuf rpc interface and implement a new rpc
> > framework.
> > The benefit is that we also do not need port unification service at
> > server side and do not need to maintain two implementations at server
> > side. And one more thing is that we do not need to upgrade protobuf to
> > 3.x."
>
> --
> A very happy Hadoop contributor
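
For anyone reading the thread later: step 3 of Duo's quoted proposal says that for pread "a CompletableFuture is enough". The shape of such an API might look roughly like the sketch below. It is only an illustration under stated assumptions: `AsyncPreadSketch`, `AsyncReadable`, `InMemoryFile` and `readAsString` are hypothetical names and not HDFS APIs, and an in-memory byte array stands in for the actual DataNode read.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

public class AsyncPreadSketch {

    /** Hypothetical interface (not part of HDFS): a positional read that returns a future. */
    public interface AsyncReadable {
        CompletableFuture<ByteBuffer> pread(long position, int length);
    }

    /** Stand-in for a remote file; a real client would issue a non-blocking RPC instead. */
    public static final class InMemoryFile implements AsyncReadable {
        private final byte[] data;

        public InMemoryFile(byte[] data) {
            this.data = data.clone();
        }

        @Override
        public CompletableFuture<ByteBuffer> pread(long position, int length) {
            // Run the "read" off the caller's thread; the caller only holds the future.
            return CompletableFuture.supplyAsync(() -> {
                if (position < 0 || length < 0 || position > data.length) {
                    throw new IllegalArgumentException("bad range: " + position + "+" + length);
                }
                int start = (int) position;
                // Short read at end of file, as positional reads usually allow.
                int n = Math.min(length, data.length - start);
                return ByteBuffer.wrap(data, start, n).slice().asReadOnlyBuffer();
            });
        }
    }

    /** Chains a decode step onto the future, then blocks only for demonstration purposes. */
    public static String readAsString(AsyncReadable file, long position, int length) {
        return file.pread(position, length)
                .thenApply(buf -> StandardCharsets.UTF_8.decode(buf).toString())
                .join();
    }

    public static void main(String[] args) {
        AsyncReadable file =
                new InMemoryFile("hello async hdfs".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAsString(file, 6, 5)); // prints "async"
    }
}
```

Each pread is a single request with a single result, which is why one future covers it; the streaming read that the proposal flags as unresolved has no such one-shot shape, and this sketch does not attempt it.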