[
https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027288#comment-14027288
]
Colin Patrick McCabe commented on HADOOP-10389:
-----------------------------------------------
bq. What make me concerned is that the code has to bring in a lot more
dependency in plain C, which has a high cost on maintenance
Currently, the libraries we depend on are: {{libuv}}, for portability
primitives, {{protobuf-c}}, for protobuf functionality, {{expat}}, for XML
parsing, and {{liburiparser}}, for parsing URIs. None of that functionality is
provided by the C++ standard library, so your statement is false.
bq. For example, this patch at least contains implementation of linked list,
splay tress, hash tables, and rb trees. There are a lot of overheads on
implementing, reviewing and testing the code.
A lot of this code is not new. For example, we were previously using {{tree.h}}
(which implements splay trees and rb trees) in libhdfs. The maintenance
burden was not high. In fact, it was zero, because we never had to fix a bug
in {{tree.h}}. So once again, your statement is just false.
{{htable.c}} got a review because it is new code. I would hardly call
reviewing new code a "maintenance burden." And anyway, there is an existing
libc way to use hash tables: the {{hcreate_r}}, {{hsearch_r}}, and
{{hdestroy_r}} functions (GNU extensions of the POSIX {{hsearch}} family). We
would like to use them, but Windows doesn't implement these functions.
bq. For example, do you considering supporting filenames in unicode? That way I
think libicu might need to be brought into the picture.
First of all, the question of whether we should use libicu is independent of
the question of whether we should use C++. libicu has a C interface, and the
standard C++ libraries and runtime don't provide any Unicode functionality
beyond what the standard C libraries provide.
Second of all, I see no reason to use libicu. All the strings we are dealing
with are UTF-8, supplied to and from protobuf. Because UTF-8 never uses a zero
byte except to encode U+0000, these strings can be null-terminated, printed,
and handled with the existing C string functions. libicu might come into the
picture if we wanted to start normalizing Unicode strings or using
wide-character strings. But we don't need or want to do that.
bq. It looks to me that it is much more compelling to implement the code in a
more modern language, say, c++11, where much of the headache right now is taken
away by a mature standard library.
C++ first came on the scene in 1983. That is 31 years ago. C++ may be a lot
of things, but "modern" isn't one of them. I was a C++ programmer for 10
years. I know the language about as well as anyone can. I specifically chose
C for this project for a few reasons.
Firstly, the challenge of maintaining a consistent C++ coding style is very,
very large. This is true even when everyone is a professional C++ programmer
working under the same roof. For a project like Hadoop, where C/C++ is not
everyone's first language, the challenge is just unsupportable. The C++
learning curve is much steeper than C's. You have to know everything you
have to know for C, plus a lot of very tricky things that are unique to C++.
There are many contentious issues in the community: use exceptions, or
don't use exceptions? Use global constructors, or don't use global
constructors? Use boost, or don't use boost? Use C++0x / C++11 / C++14 or use
some older standard? Use runtime type information ({{dynamic_cast}},
{{typeid}}), or don't use runtime type information? Operator overloading, or
no operator overloading?
There are reasonable arguments for each of these positions. For example,
exceptions can harm performance even when nothing is ever thrown, because the
compiler has to maintain data for stack unwinding (see
http://preshing.com/20110807/the-cost-of-enabling-exception-handling/). That
cost is paid even if you don't use them; if you do use them, thrown
exceptions turn out to be a lot slower than return codes. They can also make
code difficult to follow. C++ doesn't have checked exceptions, so you can
never really know what any function will throw. For this reason, some fairly
smart people at Google have decided to ban exceptions from their coding
standard. This, in turn, means that it's difficult for libraries to throw
exceptions, since open source projects using the Google coding standard (and
there are a lot of them) can't deal with exceptions. Of course, without
exceptions, certain things in C++ are very hard to do. (By the way, I'm not
interested in having the argument for/against exceptions here, just in noting
that there is huge fragmentation here and reasonable people on both sides.)
A similar story could be told about all the other choices. The net effect is
that we would have to police a very large set of arbitrary style decisions
that simply wouldn't come up if we used C.
C++ library APIs have binary compatibility issues. A lot of them. Just take
a look at
http://techbase.kde.org/Policies/Binary_Compatibility_Issues_With_C++ (for
example, even adding a data member or a virtual function to an exported class
breaks binary compatibility). Again, how are we going to ensure that everyone
follows these rules? It's nearly impossible. Considering the number of issues
we've had maintaining API compatibility in Java, with Java's much simpler
rules, this is just a deal-breaker. With C, by contrast, the rules for
maintaining binary compatibility are very simple.
C is available on every platform out there, even AIX. C++11 is only
available on a subset of those platforms. This is another advantage of plain
old C.
But more importantly, it's easier to bind other higher-level languages to C
than to C++. For example, in Python you can use ctypes to easily wrap a C
library: https://docs.python.org/2/library/ctypes.html. Do you want to use
ctypes with C++? Then you're out of luck: ctypes only understands the C ABI,
so C++ classes, mangled names, and overloads have to be hidden behind an
{{extern "C"}} wrapper first (see
http://stackoverflow.com/questions/1615813/how-to-use-c-classes-with-ctypes). A
similar story could be told about golang, and most other high-level languages.
You have to write a lot of boilerplate to wrap C++, and almost none for C.
If we were writing a new daemon or something, then I might consider C++, even
C++11. Yes, C++11 added some good things. {{auto}} was a good idea (type
inference was already common in other languages), and move constructors are
nice. But none of it addresses the problems above, and all of it just adds
more complexity for people to master. What we are writing is just a client,
and it's not that thick. This is especially true of the YARN client, which
just makes some RPCs and that's it. And the code is nearly done.
I'm not interested in having a language flamewar here. C has some advantages,
and C++ has another set. For this particular project, the former outweigh
the latter. I'm very familiar with C++ and I don't need a lecture on its
advantages, having been a user for a decade.
If you are interested in writing a C++ interface for libhdfs or libyarn, then
by all means do that. I think this interface should be in a header file only,
to avoid the binary compatibility issues I mentioned earlier. Since the header
file would be compiled by each client, we would be free to change it whenever
we liked without worrying about binary compatibility.
> Native RPCv9 client
> -------------------
>
> Key: HADOOP-10389
> URL: https://issues.apache.org/jira/browse/HADOOP-10389
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: HADOOP-10388
> Reporter: Binglin Chang
> Assignee: Colin Patrick McCabe
> Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch,
> HADOOP-10389.004.patch, HADOOP-10389.005.patch