[
https://issues.apache.org/jira/browse/HDFS-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169882#comment-14169882
]
Colin Patrick McCabe commented on HDFS-7207:
--------------------------------------------
bq. A slightly simpler (probably subjective) approach might be to wrap things
in the opposite way. That is, putting the error message / stack traces in the
Status object directly and let hdfsGetLastError to get the string. It avoids
copying the error message twice, once from the implementation to the TLS and
another from TLS to the returned Status object. What do you think?
I think we should definitely add {{hdfsGetLastError}} to the C API. There are
a lot of applications using the C API and a lot of them are going to continue
to do so for all the reasons we discussed earlier. This is an easy, 100%
backwards-compatible way to add richer error messages to the API.
I don't think copying an error message once is worth thinking about. Errors
are (or should be) rare. The overhead of throwing an exception is much larger
than copying a C string, and libhdfs3 currently throws exceptions in error
cases. So this is strictly an improvement from a performance point of view.
It also simplifies maintenance because we only have to worry about setting
error messages at one point. And it's the only way we can add richer error
messages in {{libhdfs}} and {{libwebhdfs}}. No other solution even comes close
to matching the advantages of {{hdfsGetLastError}}, in my opinion.
bq. If an Input / OutputStream is leaked than the corresponding FileSystem will
leak. I found the paradigm in leveldb quite helpful:
{code}
DB *db = DB::Open();
Iterator *it = db->(...);
delete db; // bails out because the iterator it has leaked.
{code}
bq. That might allow the user to be more aware of the leaks. Maybe we can do
something similar?
I might be missing something, but giving the user back a bare pointer seems
strictly less useful than giving the user back a shared_ptr. As [~wangzw]
pointed out, we still have to do refcounting either way, so there's no
performance improvement. If the user wants shared_ptr semantics and you give
them a bare pointer, they have to wrap it in another shared_ptr, adding
overhead. On the other hand, if the user doesn't want shared_ptr semantics,
there is no disadvantage to giving back a shared_ptr. The user can simply
delete the streams, then delete the filesystem, and get the same result as with
a bare pointer.
If leaks are a problem in a C++ program, there are tools like valgrind, ASAN,
and so forth. We use these tools a lot in Impala-- they work really well!
Throwing an exception in a delete() method is not really a very robust way of
detecting memory leaks. After all, the delete method itself may never be
called if the programmer makes a mistake.
Finally, to repeat my earlier argument, bare pointers are basically what the C
interface gives back. That is the C way-- manual allocation and de-allocation.
So if we're going to do the same here, it makes me wonder why we need a new
interface. I guess you could argue that it allows us to detect use-after-free,
but this is something that valgrind and ASAN do a great job detecting already,
and without runtime overhead in production.
> libhdfs3 should not expose exceptions in public C++ API
> -------------------------------------------------------
>
> Key: HDFS-7207
> URL: https://issues.apache.org/jira/browse/HDFS-7207
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Haohui Mai
> Assignee: Colin Patrick McCabe
> Priority: Blocker
> Attachments: HDFS-7207.001.patch
>
>
> There are three major disadvantages of exposing exceptions in the public API:
> * Exposing exceptions in public APIs forces the downstream users to be
> compiled with {{-fexceptions}}, which might be infeasible in many use cases.
> * It forces other bindings to properly handle all C++ exceptions, which might
> be infeasible especially when the binding is generated by tools like SWIG.
> * It forces the downstream users to properly handle all C++ exceptions, which
> can be cumbersome as in certain cases it will lead to undefined behavior
> (e.g., throwing an exception in a destructor is undefined.)
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)