[
https://issues.apache.org/jira/browse/HADOOP-10389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028383#comment-14028383
]
Colin Patrick McCabe commented on HADOOP-10389:
-----------------------------------------------
bq. Just a random grep, here is a call to vsprintf but not vsnprintf:
The reason why that is a {{vsprintf}} is that we use {{vsnprintf}} above with
the same format string to calculate the length of the buffer we'll need.
bq. Additionally, the caller can easily trigger a buffer overflow by passing
"%s" as a format string. Alternatively, using stringstream will never have this
problem. Note that the code uses the c++ runtime anyway, why not just writing
the code in a type safe way and let the type system do the proof for you?
The caller can't "easily trigger a buffer overflow" in the code above. The
reason is because if the caller passes {{%s}} followed by a string, we will
compute the length of that string as {{fmt_len}}. Let's follow through the
code.
{code}
fmt_len = vsnprintf(test_buf, sizeof(test_buf), fmt, ap);
if (fmt_len < 0) {
ierr = (struct hadoop_err*)&HADOOP_VSNPRINTF_ERR;
goto done;
}
class = errno_to_class(code);
class_len = strlen(class);
msg_len = class_len + SUFFIX_LEN + fmt_len + 1;
err->msg = malloc(msg_len);
if (!err->msg) {
ierr = (struct hadoop_err*)&HADOOP_OOM_ERR;
goto done;
}
strcpy(err->msg, class);
strcpy(err->msg + class_len, SUFFIX);
vsprintf(err->msg + class_len + SUFFIX_LEN, fmt, ap2);
{code}
So you can see that {{msg_len}} will depend on {{fmt_len}}. If the length of
the string the caller supplies to {{%s}} increases, the length of the buffer we
allocate will increase.
bq. valgrind is not a panacea. Valgrind is a dynamic tool where it spots the
leak only when the code runs into the buggy path. Many leaks happen in the
error-handling path, and I'm skeptical that (1) the unit tests are able to
cover all of them, (2) you'll ever be able to run valgrind in production
considering its overheads (10x~100x). Alternatively, the issue is largely
addressed in c++ compile time if the code passes smart pointers with discipline.
C++ is not a panacea. I have worked on large C++ projects in the past. They
have memory leaks too, sometimes quite serious ones. All of them ran valgrind,
thread sanitizer, and other dynamic tools to spot the leaks. Once you get a
complex object ownership structure with {{shared_ptr}}, {{unique_ptr}}, and
{{auto_ptr}}, plus passing bare references and pointers... it's easy to make
mistakes.
bq. In short, they are best-effort approaches to address the common but
critical defects I've raised. With proper uses, a modern language like C++ is
able to provide much higher assurance (type checked proofs) in security and
reliability. Why not embracing these tools instead of putting heavy dependency
and responsibility on code review to catch these bugs?
So far, we haven't found any defects. And C++ doesn't provide "type checked
proofs" of anything, since it is an unsafe language (much like C, which it's
based on), which supports arbitrary casting, pointer aliasing, arbitrary code
execution, and so on. If you wanted provable correctness, you might try
something like {{SML}} or {{Haskell}} plus a theorem prover like {{Coq}}. But
what does this have to do with a native client?
bq. You have already made your point on your c++ cred. Lets continue the
discussion in the right tone.
Suresh, I apologize if this came across as rude. My intention was just to
point out that I have considered the merits of C versus C\+\+. I have
experience with both. It seemed like there was an assumption that I was not
familiar with C\+\+.
I feel somewhat frustrated, because I don't think JIRA is a suitable forum for
programming language advocacy. This JIRA doesn't have anything to do with C
versus C++. It is about the RPC code, which was already reviewed and
committed. There are plenty of places on the web advocating a particular
programming language-- does this have to be one of them? If I opened a JIRA
about switching Hadoop to Scala, would the discussion continue "in the right
tone"? If I pointed out a bunch of stuff that would be "fixed by scala" (that
wasn't actually buggy), what would the tone be then? (Note that I have nothing
against Scala-- it's a fine language.)
> Native RPCv9 client
> -------------------
>
> Key: HADOOP-10389
> URL: https://issues.apache.org/jira/browse/HADOOP-10389
> Project: Hadoop Common
> Issue Type: Sub-task
> Affects Versions: HADOOP-10388
> Reporter: Binglin Chang
> Assignee: Colin Patrick McCabe
> Attachments: HADOOP-10388.001.patch, HADOOP-10389.002.patch,
> HADOOP-10389.004.patch, HADOOP-10389.005.patch
>
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)