+1 (binding) -C

On Thu, Mar 8, 2018 at 9:31 AM, Jim Clampffer <james.clampf...@gmail.com> wrote:
> Hi Everyone,
> The feedback was generally positive on the discussion thread [1] so I'd
> like to start a formal vote for merging HDFS-8707 (libhdfs++) into trunk.
> The vote will be open for 7 days and end 6PM EST on 3/15/18.
> This branch includes a C++ implementation of an HDFS client for use in
> applications that don't run an in-process JVM.  Right now the branch only
> supports reads and metadata calls.
> Features (paraphrasing the list from the discussion thread):
> -Avoiding the JVM means applications that use libhdfs++ can explicitly
> control resources (memory, FDs, threads).  The driving goal for this
> project was to let C/C++ applications access HDFS while maintaining a
> single heap.
> -Includes support for Kerberos authentication.
> -Includes a libhdfs/libhdfs3-compatible C API as well as a C++ API that
> supports asynchronous operations.  Applications that only do reads may be
> able to use this as a drop-in replacement for libhdfs.
> -Asynchronous IO is built on top of boost::asio, which in turn uses
> select/epoll, so many sockets can be monitored from a single thread (or
> thread pool) rather than spawning a thread to sleep on each blocked socket.
> -Includes a set of utilities written in C++ that mirror the CLI tools (e.g.
> ./hdfs dfs -ls).  Their startup time is three orders of magnitude lower than
> the Java client's, which is useful for scripts that need to work with many
> files.
> -Support for cancelable reads that release associated resources
> immediately.  Useful for applications that need to be responsive to
> interactive users.
> Other points:
> -This is almost all new code in a new subdirectory.  No Java source for the
> rest of Hadoop was changed, so there's no risk of regressions there.  The
> only changes outside of that subdirectory were integrating the build in
> some of the pom files and adding a couple of dependencies to the Dockerfile.
> -The library has had plenty of burn-in time.  It's been used in production
> for well over a year and is indirectly being distributed as part of the
> Apache ORC project (in the form of a third party dependency).
> -There isn't much in the way of well-formatted documentation right now.
> The documentation for the libhdfs API is applicable to the libhdfs++ C API.
> Header files describe various components, including details about threading
> and lifecycle expectations for important objects.  Good places to start are
> hdfspp.h, filesystem.h, filehandle.h, rpc_connection.h and rpc_engine.h.
> I'll start with my +1 (binding).
> [1]
> http://mail-archives.apache.org/mod_mbox/hadoop-hdfs-dev/201803.mbox/browser
> (second message in thread, can't figure out how to link directly to mine)
> Thanks!

