[
https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469928#comment-13469928
]
Andy Isaacson commented on HDFS-3920:
-------------------------------------
Overall, looking good. A few more improvements:
{code}
- rbuffer->content = realloc(rbuffer->content, rbuffer->offset + size *
nmemb + 1);
+ rbuffer->content = realloc(rbuffer->content,
+ rbuffer->offset + size * nmemb + 1);
if (rbuffer->content == NULL) {
{code}
At this point we've leaked the previous {{rbuffer->content}} because realloc
does not free on error.
Better to do something like
{code}
void *tmp = realloc(buf->content, ...);
if (tmp == NULL) {
...
return NULL;
}
buf->content = tmp;
{code}
{code}
+ curl = curl_easy_init();
+ if (!curl) {
+ errno = ENOMEM; // curl_easy_init does return error code,
+ // and most of its errors are caused by malloc()
{code}
Maybe the comment means to say "does *not* return error code"? It doesn't make
very much sense to me currently.
{code}
+ if(followloc == YES) {
{code}
Style nit: put a space between {{if}} and the (. (Several cases of this
problem.)
{code}
+ offset = snprintf(url, length + 1, "%s%s:%d%s%s?op=%s",
+ protocol, host, nnPort, prefix, path, op);
+ for (i = 0; i < paraNum; i++) {
+ if (!paraNames[i] || !paraValues[i]) {
+ continue;
+ }
+ offset += snprintf(url + offset, length - offset + 1,
+ "&%s=%s", paraNames[i], paraValues[i]);
{code}
After calling snprintf, you must verify that there is still room left in the
buffer and error out if not. If {{length - offset + 1}} becomes negative, you
might underflow and write past the end of the buffer.
{code}
+#define NUM_OF_PERMISSION_BITS 3
+#define NUM_OF_SHORT_VALUE_BITS 5
+#define NUM_OF_LONG_VALUE_BITS 15
{code}
Based on reading the code, I think these are intended to be the number of
characters to allocate to hold a "%d" formatting of a relevant type. The name
should say so -- NUM_PERM_CHAR, NUM_SHORT_CHAR, NUM_LONG_CHAR perhaps? And
since they are *always* used with +1 for the NUL, you may as well just include
that here, in which case calling it PERM_STR_LEN, SHORT_STR_LEN, LONG_STR_LEN
might be better. Also, add a comment explaining where the values come from.
If my understanding is correct, then the right value for LONG is 21, not 16,
since 2^64 is 20 decimal digits.
Combining all of that, this becomes
{code}
#define PERM_STR_LEN 4 // "644" + one byte for NUL
#define SHORT_STR_LEN 6 // 65535 + NUL
#define LONG_STR_LEN 21 // 2^64-1 = 18446744073709551615 + NUL
{code}
{code}
+ if(!header || header[0] == '\0' ||
+ strncmp(header, "HTTP/", strlen("HTTP/"))) {
{code}
Simplify this to {{if (!header || strncmp(header, ...}} since strncmp will
handle a zero-length-string just fine.
{code}
- if(strcmp(result,"0")) { //Content-Length should be equal
to 0
+ if(strncmp(result, "0", 1)) {
{code}
This will do the wrong thing if result is "01". I think the strcmp version is
more correct.
{code}
+ const char *responseCode = "307 TEMPORARY_REDIRECT";
+ if(!header || header[0] == '\0' || strncmp(header,"HTTP/", 5)) {
...
+ }
+ if(!(tempHeader = strstr(header,responseCode))) {
{code}
# no need to special case the zero-length-string case
# strstr is not right here, it would succeed if given {{"HTTP/1.0 foobar baz
307 TEMPORARY_REDIRECT"}} which should fail. Better to {{strspn(" \t"}} then
strncmp.
# space after comma.
{code}
+ if(header == '\0' || strncmp(header,"HTTP/", strlen("HTTP/"))) {
{code}
should be {{if ((header == NULL || strncmp...}}. Do not confuse the pointer
{{NULL}} with the char NUL {{'\0'}}.
{code}
+ if(!(strstr(header,responseCode1) && strstr(header, responseCode2))) {
{code}
More inappropriate use of strstr.
{code}
+ if(arraylen > 0) {
+ fileStat = realloc(fileStat, sizeof(hdfsFileInfo) * arraylen);
+ }
{code}
# need to check for realloc failing
# similar to above realloc comment, shouldn't reassign to fileStat without a
temporary.
{code}
+static hdfsFileInfo *json_parse_array(json_t *jobj, hdfsFileInfo *fileStat,
+ int *numEntries, const char *operation,
+ int printError)
{code}
The name {{json_parse_array}} is too generic for something that takes a
{{hdfsFileInfo}} argument. Perhaps {{parse_json_fileinfo}} or similar.
The function has a very tricky definition -- it reallocates {{fileStat}} and
returns the new buffer. That deserves a doc comment at least, or a redesign
where the argument is {{hdfsFileInfo **fileStat}}.
{code}
+ void *iter = json_object_iter(jobj);
+ while(iter) {
+ key = json_object_iter_key(iter);
+ value = json_object_iter_value(iter);
+ switch (json_typeof(value)) {
+ case JSON_INTEGER:
+ if(!strcmp(key, "accessTime")) {
+ fileStat->mLastAccess = json_integer_value(value) / 1000;
{code}
It's very confusing to have the {{strcmp(key}} inside the {{switch(type)}} code
block. This makes the decision flow very hard to follow.
Unless the units are obvious, magic numbers like {{/ 1000}} need a comment
explaining why the value is being converted (from milliseconds to seconds? or
microseconds?). Something like {{// json field contains time in milliseconds,
hdfsFileInfo is counted in microseconds}} or similar.
{code}
+ default:
+ free(fileStat);
+ fileStat = NULL;
+ }
{code}
If we hit this default, the next loop through will segfault. Better to exit
the loop explicitly with a return or a test variable or a goto. If this is
handling some quirk of {{json_object_iter_value}} then a comment is needed.
{code}
+ "file creation/appending, <%d>:%s.\n",
{code}
Some of these are missing a space after the {{:}}. I don't really care what
format you use, but please be consistent.
{code}
+ newWorkingDir = normalizePath(newWorkingDir);
{code}
This leaks the old {{newWorkingDir}}. But see below for an alternate design.
{code}
+/** Replace "//" with "/" in path */
+static char *normalizePath(char *path)
+{
+ const char *delimiter = "/";
...
+ token = strtok_r(path, delimiter, &savePtr);
{code}
This function would be much simpler if it did not generate the leading {{/}}
(which it isn't documented to do) and modified the path in-place using an
explicit bytewise loop rather than using {{strtok}}. Something like
{code}
for (i=j=sawslash=0; i<pathlen; i++) {
if (path[i] == '/') {
if (sawslash) {
continue;
} else {
sawslash = 1;
}
} else {
sawslash = 0;
path[j++] = path[i];
}
}
path[j] = '\0';
{code}
Overall, very good cleanup. Take care of the above and let's get this checked
in and continue to make progress.
There's still quite a bit of hairy code in libwebhdfs that needs careful review
and testing. I'm still concerned about how much explicit string manipulation
in C is being done; there are many places with fragile {{strncat}} and
{{snprintf}} loops with unchecked potentially underflowable computations. Keep
in mind that this code will potentially be exposed to hostile network traffic
and a single byte buffer overwrite gives the attacker code execution with high
likelihood. This is native C, not Java, and nearly every buffer offset bug is
a critical code execution bug. It would be much easier to see a path to
confidence in this code if there were less ad-hoc string manipulation.
> libwebdhfs code cleanup: string processing and using strerror consistently to
> handle all errors
> -----------------------------------------------------------------------------------------------
>
> Key: HDFS-3920
> URL: https://issues.apache.org/jira/browse/HDFS-3920
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: Jing Zhao
> Assignee: Jing Zhao
> Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch,
> HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch,
> HDFS-3920-005.patch
>
>
> 1. Clean up code for string processing;
> 2. Using strerror consistently for error handling;
> 3. Use sprintf to replace decToOctal
> 4. other issues requiring fixing.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira