[ 
https://issues.apache.org/jira/browse/HDFS-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14164440#comment-14164440
 ] 

Colin Patrick McCabe commented on HDFS-7207:
--------------------------------------------

I thought about this a little bit more.  I thought it would be nice to have a 
C++ API for libhdfs, but to be usable, it had to meet a few conditions:

* Not use exceptions (as [~wheat9] mentioned, many C++ programs don't use 
exceptions... including the llvm compiler, the Chrome web browser, pretty much 
any code Google develops).

* Be usable with all of {{libhdfs}}, {{libwebhdfs}}, and {{libhdfs3}}.  Since 
{{libhdfs3}} is hdfs-specific, it cannot replace the functionality of the 
original {{libhdfs}}.  Many users are using {{libhdfs}} to talk to S3.  
{{libhdfs3}} also lacks support for things like encryption (although unlike S3 
support, it will eventually get it).  So this flexibility is essential.

* Work with both C\+\+11 and earlier C\+\+ revisions.

* Not force {{libhdfs3}} to expose its guts to the world.  We want to be able 
to change {{libhdfs3}} in the future without breaking applications.  So 
exposing internal header files and data structures is out of the question.  We 
need a stable ABI.

What I came up with is {{libhdfs.hpp}}.  It is implemented entirely in a header 
file.  Because it calls functions from {{libhdfs.h}}, it will work with both 
{{libhdfs}} and {{libwebhdfs}}, as well as {{libhdfs3}}.  We can change the 
interface without worrying about breaking the ABI, since when the header file 
is compiled, the code becomes a part of the client.  The only functions that 
the libraries need to export are the original {{libhdfs}} functions.

We never pass C\+\+ types across the application / shared library boundary, so 
we don't have to worry about issues like the application using a different name 
mangling scheme or libstdc++ than the library.  The only types passed across 
the application / shared library boundary are C types, and we've done a very 
good job in the past of not breaking that ABI.  In practice, this means that 
users of {{hdfs.hpp}} will be able to upgrade libhdfs3 without recompiling.
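
To make the header-only idea concrete, here is a minimal sketch. It is not taken from the patch: the {{demo_*}} functions and the {{Connection}} class are hypothetical names, and the "library" half is defined in the same file only so that the example links on its own. The point it illustrates is that only a C string, an int, and an opaque pointer cross the boundary, while the C++ class is compiled into the client.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// ---- What the shared library exports: plain C with an opaque handle. ----
// (Hypothetical names; a stand-in for the real C API.)
extern "C" {
    struct demo_fs;                                   // layout hidden from clients
    demo_fs* demo_connect(const char* host, int port);
    int demo_disconnect(demo_fs* fs);
}

// ---- Header-only C++ layer: compiled into the application itself. ----
// No C++ type is ever handed to the library, so the library's compiler,
// name mangling, and C++ runtime do not have to match the application's.
class Connection {
public:
    Connection(const std::string& host, int port)
        : fs_(demo_connect(host.c_str(), port)) {}
    ~Connection() { if (fs_ != NULL) demo_disconnect(fs_); }
    bool connected() const { return fs_ != NULL; }
private:
    Connection(const Connection&);             // non-copyable (pre-C++11 idiom)
    Connection& operator=(const Connection&);
    demo_fs* fs_;
};

// ---- Stand-in for the library's implementation so the sketch links. ----
extern "C" {
    struct demo_fs { int port; };
    demo_fs* demo_connect(const char* host, int port) {
        if (host == NULL || host[0] == '\0') return NULL;   // simulated failure
        demo_fs* fs = new demo_fs;
        fs->port = port;
        return fs;
    }
    int demo_disconnect(demo_fs* fs) {
        assert(fs != NULL);
        delete fs;
        return 0;
    }
}
```

In this arrangement the layout of {{demo_fs}} can change inside the library without affecting clients, because they only ever hold a pointer to it.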

{{libhdfs.hpp}} gets rid of the need to check {{errno}} by introducing a 
{{Status}} class that is returned by functions that can fail.  (This is similar 
to what Haohui suggested above.)  You can call {{Status#toError}} to get a nice 
error message.
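
As a rough illustration of the {{Status}} approach (a sketch under assumptions, not the code in the patch): only the {{Status}} and {{toError}} names come from the description above. The {{ok()}} and {{code()}} members, {{fakeOpen}}, and {{openFile}} are hypothetical, and {{fakeOpen}} merely imitates a C call that returns -1 and sets {{errno}}.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical Status class: carries an errno-style code, 0 meaning success.
class Status {
public:
    Status() : code_(0) {}
    explicit Status(int err) : code_(err) {}
    bool ok() const { return code_ == 0; }
    int code() const { return code_; }
    // Human-readable message; the body is an assumption about toError.
    std::string toError() const {
        return ok() ? std::string("OK") : std::string(std::strerror(code_));
    }
private:
    int code_;
};

// Stand-in for a C function that reports failure through errno.
static int fakeOpen(const char* path) {
    if (path == NULL || path[0] == '\0') {
        errno = ENOENT;
        return -1;
    }
    return 3;  // pretend file descriptor
}

// The wrapper captures errno once, at the point of failure, so callers
// never inspect errno themselves and no exception is thrown.
inline Status openFile(const char* path, int* fdOut) {
    assert(fdOut != NULL);
    int fd = fakeOpen(path);
    if (fd < 0) {
        return Status(errno);
    }
    *fdOut = fd;
    return Status();
}
```

A caller would write something like {{Status s = openFile(path, &fd); if (!s.ok()) \{ /* log s.toError() */ \}}}, which works whether or not the client is built with exceptions enabled.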

{{libhdfs.hpp}} uses {{shared_ptr}} to allow the input and output stream to 
keep a reference to the filesystem object.  This prevents the filesystem object 
from going away as long as at least one stream referencing it is open.  Using 
{{shared_ptr}} helps to avoid the possibility of resource leaks that 
programmers needed to manually avoid in the C version.
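
The lifetime rule can be sketched as follows. This is an assumption-laden example rather than the patch's code: {{FileSystem}} and {{InputStream}} are hypothetical classes, the live-instance counter exists only to make the behavior observable, and {{std::shared_ptr}} requires C\+\+11 (an older compiler would need something like {{std::tr1::shared_ptr}} instead).

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical filesystem handle. In a real wrapper the destructor would
// call the C disconnect function; here it only updates a counter.
class FileSystem {
public:
    explicit FileSystem(int* liveCount) : live_(liveCount) { ++*live_; }
    ~FileSystem() { --*live_; }
private:
    FileSystem(const FileSystem&);             // non-copyable
    FileSystem& operator=(const FileSystem&);
    int* live_;
};

// Hypothetical stream. Holding a shared_ptr to the filesystem means the
// filesystem cannot be destroyed while this stream still exists.
class InputStream {
public:
    explicit InputStream(const std::shared_ptr<FileSystem>& fs) : fs_(fs) {
        assert(fs_.get() != NULL);
    }
private:
    std::shared_ptr<FileSystem> fs_;
};
```

If the application drops its own {{shared_ptr<FileSystem>}} while a stream is still open, the filesystem object survives until the last stream is destroyed, and only then is it released.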

I implemented *all* current functionality of libhdfs except short-circuit 
reads, which can be implemented later pretty easily.

> libhdfs3 should not expose exceptions in public C++ API
> -------------------------------------------------------
>
>                 Key: HDFS-7207
>                 URL: https://issues.apache.org/jira/browse/HDFS-7207
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Haohui Mai
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HDFS-7207.001.patch
>
>
> There are three major disadvantages of exposing exceptions in the public API:
> * Exposing exceptions in public APIs forces the downstream users to be 
> compiled with {{-fexceptions}}, which might be infeasible in many use cases.
> * It forces other bindings to properly handle all C++ exceptions, which might 
> be infeasible especially when the binding is generated by tools like SWIG.
> * It forces the downstream users to properly handle all C++ exceptions, which 
> can be cumbersome as in certain cases it will lead to undefined behavior 
> (e.g., throwing an exception in a destructor is undefined.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
