On Thu, Mar 1, 2018 at 10:04 AM, Jim Clampffer <james.clampf...@gmail.com> wrote:
> Chris, do you mean potentially landing this in its current state and
> handling some of the rough edges after? I could see this working just
> because there's no impact on any existing code.
Yes. Better to get this committed and released than to polish it in the
branch. -C

> With regards to your questions, Kai:
> There isn't a good doc for the internal architecture yet; I just reassigned
> HDFS-9115 to myself to handle that. Are there any specific areas you'd like
> to know about so I can prioritize those?
> Here are some header files that include a lot of comments that should help
> out for now:
> - hdfspp.h - main header for the C++ API
> - filesystem.h and filehandle.h - describe some rules about object
>   lifetimes and threading from the API point of view (most classes have
>   comments describing any restrictions on threading, locking, and
>   lifecycle)
> - rpc_engine.h and rpc_connection.h - begin getting into the async RPC
>   implementation
>
> 1) Yes, it's a reimplementation of the entire client in C++. Using libhdfs3
> as a reference helps a lot here, but it's still a lot of work.
> 2) EC isn't supported now, though that'd be great to have, and I agree that
> it's going to take a lot of effort to implement. Right now, if you tried to
> read an EC file, I think you'd get some unhelpful error out of the block
> reader, but I don't have an EC-enabled cluster set up to test. Adding an
> explicit "not supported" message would be straightforward.
> 3) libhdfs++ reuses all of the minidfscluster tests that libhdfs already
> had, so we get consistency checks on the C API. There are a few new tests
> that also run on both libhdfs and libhdfs++ and make sure the expected
> output is the same too.
> 4) I agree; I just haven't had a chance to look into the distribution build
> to see how to do it. HDFS-9465 is tracking this.
> 5) Not yet (HDFS-8765).
>
> Regards,
> James
>
> On Thu, Mar 1, 2018 at 4:28 AM, 郑锴(铁杰) <zhengkai...@alibaba-inc.com> wrote:
>>
>> The work sounds solid and great! +1 to have this.
>>
>> Is there any quick doc to take a glance at? Some quick questions to get
>> familiar with it:
>> 1. It seems the client is implemented entirely in C++ without any Java
>> code (so no JVM overhead), which means a lot of work rewriting the HDFS
>> client. Right?
>> 2. I guess the erasure coding feature isn't supported, as it'd involve
>> significant development, right? If so, what will it say when reading an
>> erasure-coded file?
>> 3. Is there any build/test mechanism to enforce consistency between the
>> C++ part and the Java part?
>> 4. I thought the public headers and lib should be exported when building
>> the distribution package; otherwise it's hard to use the new C API.
>> 5. Is short-circuit read supported?
>>
>> Thanks.
>>
>> Regards,
>> Kai
>>
>> ------------------------------------------------------------------
>> From: Chris Douglas <cdoug...@apache.org>
>> Sent: Thursday, March 1, 2018, 05:08
>> To: Jim Clampffer <james.clampf...@gmail.com>
>> Cc: Hdfs-dev <hdfs-dev@hadoop.apache.org>
>> Subject: Re: [DISCUSS] Merging HDFS-8707 (C++ HDFS client) to trunk
>>
>> +1
>>
>> Let's get this done. We've had many false starts on a native HDFS
>> client. This is a good base to build on. -C
>>
>> On Wed, Feb 28, 2018 at 9:55 AM, Jim Clampffer
>> <james.clampf...@gmail.com> wrote:
>> > Hi everyone,
>> >
>> > I'd like to start a thread to discuss merging HDFS-8707, aka libhdfs++,
>> > into trunk. I originally sent a similar email out last October, but it
>> > sounds like it was buried by discussions about other feature merges
>> > that were going on at the time.
>> >
>> > libhdfs++ is an HDFS client written in C++ designed to be used in
>> > applications that are written in non-JVM-based languages. In its
>> > current state it supports Kerberos-authenticated reads from HDFS and
>> > has been used in production clusters for over a year, so it has had a
>> > significant amount of burn-in time.
>> > The HDFS-8707 branch has been around for about 2 years now, so I'd
>> > like to know people's thoughts on what it would take to merge the
>> > current branch and handle writes and encrypted reads in a new one.
>> >
>> > Current notable features:
>> > - A libhdfs/libhdfs3-compatible C API that allows libhdfs++ to serve
>> >   as a drop-in replacement for clients that only need read support
>> >   (until libhdfs++ also supports writes).
>> > - An asynchronous C++ API with synchronous shims on top if the client
>> >   application wants to do blocking operations. Internally a single
>> >   thread (optionally more) uses select/epoll by way of boost::asio to
>> >   watch thousands of sockets without the overhead of spawning threads
>> >   to emulate async operation.
>> > - Kerberos/SASL authentication support.
>> > - HA namenode support.
>> > - A set of utility programs that mirror the HDFS CLI utilities, e.g.
>> >   "./hdfs dfs -chmod". The major benefit of these is that tool startup
>> >   time is ~3 orders of magnitude faster (<1 ms vs. hundreds of ms) and
>> >   they occupy a lot less memory since they aren't dealing with the
>> >   JVM. This makes it possible to do things like write a simple bash
>> >   script that stats a file, applies some rules to the result, and
>> >   decides whether to move it, in a way that scales to thousands of
>> >   files without being penalized with O(N) JVM startups.
>> > - Cancelable reads. This has proven to be very useful in multiuser
>> >   applications that (pre)fetch large blocks of data but need to remain
>> >   responsive for interactive users. Rather than waiting for a large
>> >   and/or slow read to finish, a canceled read returns immediately, and
>> >   the associated resources (buffer, file descriptor) become available
>> >   for the rest of the application to use.
>> >
>> > There are a couple of known issues: the doc build isn't integrated
>> > with the rest of Hadoop, and the public API headers aren't being
>> > exported when building a distribution.
>> > A short-term solution for the missing docs is to go through the
>> > libhdfs(3)-compatible API and use the libhdfs docs. Other than a few
>> > modifications to the pom files to integrate the build, the changes are
>> > isolated to a new directory, so the chance of causing any regressions
>> > in the rest of the code is minimal.
>> >
>> > Please share your thoughts, thanks!
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org