[ https://issues.apache.org/jira/browse/HDFS-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14329333#comment-14329333 ]
Colin Patrick McCabe commented on HDFS-6994:
--------------------------------------------

bq. [~wheat9] wrote: I'm concerned about this. What are the guarantees of the APIs for the releases? Are the APIs / ABIs going to be compatible once we remove exceptions in later versions? Can the user simply do a drop-in replacement to upgrade? For the libhdfs binding part the answer might be yes, but my general impression is no, due to the complexity of SEH on Windows and various quirks in the implementation of C++ exceptions.

The current plan is to expose only the existing {{libhdfs.h}} API for now. Since this is a C API, it clearly does not include exceptions, so I do not think there will be a problem here. What we are discussing is eliminating the use of exceptions internally. Since this happens at a level that is not visible to users, it can certainly be done later if we want to. However, I would like to see it fixed sooner rather than later: it is important to stick to a consistent coding style, and we want to improve robustness.

I have proposed a C\+\+ API for libhdfs and libhdfs3 at https://issues.apache.org/jira/browse/HDFS-7207. I would welcome more discussion there. Note that my API does not use exceptions and does not require C\+\+11 (although it can make use of C\+\+11 features if they are available).

bq. [asynchronous api discussion]

If you look at a high-performance HDFS client like Impala or HAWQ, they are fine with synchronous APIs. Why? Most of the time your read performance is limited by the bandwidth of the local disks (high-performance clients always try to do local reads, and use short-circuit and mmap if possible). A local hard disk can't handle more than maybe 100 seeks a second, and the more seeks you do, the lower your bandwidth will be. There is also the CPU aspect: what are you doing with the data?
Sure, you can have 10,000 async requests going with one thread, but if that thread is actually doing anything with the data, you can cut a few zeroes off of that number. And then you're back to an amount of concurrent reads that can be comfortably done synchronously. Async APIs work best when you are doing very, very little processing on each request. So an async web server like nginx, which is written in highly optimized straight C (no \+\+), can squeeze a few more pages per second out of reducing its thread count. But in a DB it's tougher (and, as you mentioned, it also makes the code much more complex). So while we should probably consider an async client at some point, I think it is much lower priority than other things (like finishing the existing native client and merging it).

> libhdfs3 - A native C/C++ HDFS client
> -------------------------------------
>
>                 Key: HDFS-6994
>                 URL: https://issues.apache.org/jira/browse/HDFS-6994
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs-client
>            Reporter: Zhanwei Wang
>            Assignee: Zhanwei Wang
>         Attachments: HDFS-6994-rpc-8.patch, HDFS-6994.patch
>
> Hi all,
>
> I just got permission to open source libhdfs3, a native C/C++ HDFS client based on the Hadoop RPC protocol and the HDFS Data Transfer Protocol.
>
> libhdfs3 provides the libhdfs-style C interface and a C++ interface. It supports both Hadoop RPC versions 8 and 9, Namenode HA, and Kerberos authentication.
>
> libhdfs3 is currently used by HAWQ of Pivotal.
>
> I'd like to integrate libhdfs3 into the HDFS source code to benefit others.
>
> You can find the libhdfs3 code at:
> https://github.com/PivotalRD/libhdfs3
> http://pivotalrd.github.io/libhdfs3/

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)