[ 
https://issues.apache.org/jira/browse/HDFS-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169882#comment-14169882
 ] 

Colin Patrick McCabe commented on HDFS-7207:
--------------------------------------------

bq. A slightly simpler (probably subjective) approach might be to wrap things 
in the opposite way. That is, putting the error message / stack traces in the 
Status object directly and let hdfsGetLastError to get the string. It avoids 
copying the error message twice, once from the implementation to the TLS and 
another from TLS to the returned Status object. What do you think?

I think we should definitely add {{hdfsGetLastError}} to the C API.  There are 
a lot of applications using the C API and a lot of them are going to continue 
to do so for all the reasons we discussed earlier.  This is an easy, 100% 
backwards-compatible way to add richer error messages to the API.

I don't think copying an error message once is worth thinking about.  Errors 
are (or should be) rare.  The overhead of throwing an exception is much larger 
than copying a C string, and libhdfs3 currently throws exceptions in error 
cases.  So this is strictly an improvement from a performance point of view.

It also simplifies maintenance because we only have to worry about setting 
error messages at one point.  And it's the only way we can add richer error 
messages in {{libhdfs}} and {{libwebhdfs}}.  No other solution even comes close 
to matching the advantages of {{hdfsGetLastError}}, in my opinion.

bq. If an Input / OutputStream is leaked than the corresponding FileSystem will 
leak. I found the paradigm in leveldb quite helpful:
{code}
DB *db = DB::Open();
Iterator *it = db->(...);
delete db; // bails out because the iterator it has leaked.
{code}
bq. That might allow the user to be more aware of the leaks. Maybe we can do 
something similar?

I might be missing something, but giving the user back a bare pointer seems 
strictly less useful than giving the user back a shared_ptr.  As [~wangzw] 
pointed out, we still have to do refcounting either way, so there's no 
performance improvement.  If the user wants shared_ptr semantics and you give 
them a bare pointer, they have to wrap it in another shared_ptr, adding 
overhead.  On the other hand, if the user doesn't want shared_ptr semantics, 
there is no disadvantage to giving back a shared_ptr.  The user can simply 
delete the streams, then delete the filesystem, and get the same result as with 
a bare pointer.

If leaks are a problem in a C++ program, there are tools like valgrind, ASAN, 
and so forth.  We use these tools a lot in Impala-- they work really well!  
Throwing an exception in a delete() method is not really a very robust way of 
detecting memory leaks.  After all, the delete method itself may never be 
called if the programmer makes a mistake.

Finally, to repeat my earlier argument, bare pointers are basically what the C 
interface gives back.  That is the C way-- manual allocation and de-allocation. 
 So if we're going to do the same here, it makes me wonder why we need a new 
interface.  I guess you could argue that it allows us to detect use-after-free, 
but this is something that valgrind and ASAN do a great job detecting already, 
and without runtime overhead in production.

> libhdfs3 should not expose exceptions in public C++ API
> -------------------------------------------------------
>
>                 Key: HDFS-7207
>                 URL: https://issues.apache.org/jira/browse/HDFS-7207
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Haohui Mai
>            Assignee: Colin Patrick McCabe
>            Priority: Blocker
>         Attachments: HDFS-7207.001.patch
>
>
> There are three major disadvantages of exposing exceptions in the public API:
> * Exposing exceptions in public APIs forces the downstream users to be 
> compiled with {{-fexceptions}}, which might be infeasible in many use cases.
> * It forces other bindings to properly handle all C++ exceptions, which might 
> be infeasible especially when the binding is generated by tools like SWIG.
> * It forces the downstream users to properly handle all C++ exceptions, which 
> can be cumbersome as in certain cases it will lead to undefined behavior 
> (e.g., throwing an exception in a destructor is undefined.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to