[jira] [Commented] (HDFS-3920) libwebdhfs code cleanup: string processing and using strerror consistently to handle all errors

Andy Isaacson (JIRA) Thu, 04 Oct 2012 17:53:50 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-3920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469928#comment-13469928
 ]


Andy Isaacson commented on HDFS-3920:
-------------------------------------

Overall, looking good.  A few more improvements:
{code}
-        rbuffer->content = realloc(rbuffer->content, rbuffer->offset + size * 
nmemb + 1);
+        rbuffer->content = realloc(rbuffer->content,
+                                   rbuffer->offset + size * nmemb + 1);
         if (rbuffer->content == NULL) {
{code}
At this point we've leaked the previous {{rbuffer->content}} because realloc 
does not free on error.

Better to do something like
{code}
void *tmp = realloc(buf->content, ...);
if (tmp == NULL) {
   ...
   return NULL;
}
buf->content = tmp;
{code}

{code}
+    curl = curl_easy_init();
+    if (!curl) {
+        errno = ENOMEM;     // curl_easy_init does return error code,
+                            // and most of its errors are caused by malloc()
{code}
Maybe the comment means to say "does *not* return error code"?  It doesn't make 
very much sense to me currently.

{code}
+    if(followloc == YES) {
{code}
Style nit: put a space between {{if}} and the (.  (Several cases of this 
problem.)

{code}
+    offset = snprintf(url, length + 1, "%s%s:%d%s%s?op=%s",
+                      protocol, host, nnPort, prefix, path, op);
+    for (i = 0; i < paraNum; i++) {
+        if (!paraNames[i] || !paraValues[i]) {
+            continue;
+        }
+        offset += snprintf(url + offset, length - offset + 1,
+                           "&%s=%s", paraNames[i], paraValues[i]);
{code}
After calling snprintf, you must verify that there is still room left in the 
buffer and error out if not.  If {{length - offset + 1}} becomes negative, you 
might underflow and write past the end of the buffer.

{code}
+#define NUM_OF_PERMISSION_BITS 3
+#define NUM_OF_SHORT_VALUE_BITS 5
+#define NUM_OF_LONG_VALUE_BITS 15
{code}

Based on reading the code, I think these are intended to be the number of 
characters to allocate to hold a "%d" formatting of a relevant type. The name 
should say so -- NUM_PERM_CHAR, NUM_SHORT_CHAR, NUM_LONG_CHAR perhaps?  And 
since they are *always* used with +1 for the NUL, you may as well just include 
that here, in which case calling it PERM_STR_LEN, SHORT_STR_LEN, LONG_STR_LEN 
might be better.  Also, add a comment explaining where the values come from.

If my understanding is correct, then the right value for LONG is 21, not 16, 
since 2^64 is 20 decimal digits.

Combining all of that, this becomes 
{code}
#define PERM_STR_LEN 4 // "644" + one byte for NUL
#define SHORT_STR_LEN 6 // 65535 + NUL
#define LONG_STR_LEN 21 // 2^64-1 = 18446744073709551615 + NUL
{code}


{code}
+    if(!header || header[0] == '\0' ||
+                  strncmp(header, "HTTP/", strlen("HTTP/"))) {
{code}
Simplify this to {{if (!header || strncmp(header, ...}} since strncmp will 
handle a zero-length-string just fine.

{code}
-    if(strcmp(result,"0")) {                 //Content-Length should be equal 
to 0
+    if(strncmp(result, "0", 1)) {
{code}
This will do the wrong thing if result is "01".  I think the strcmp version is 
more correct.

{code}
+    const char *responseCode = "307 TEMPORARY_REDIRECT";
+    if(!header || header[0] == '\0' || strncmp(header,"HTTP/", 5)) {
...
+    }
+    if(!(tempHeader = strstr(header,responseCode))) {
{code}
# no need to special case the zero-length-string case
# strstr is not right here, it would succeed if given {{"HTTP/1.0 foobar baz 
307 TEMPORARY_REDIRECT"}} which should fail. Better to {{strspn(" \t"}} then 
strncmp.
# space after comma.

{code}
+    if(header == '\0' || strncmp(header,"HTTP/", strlen("HTTP/"))) {
{code}
should be {{if ((header == NULL || strncmp...}}.  Do not confuse the pointer 
{{NULL}} with the char NUL {{'\0'}}.

{code}
+    if(!(strstr(header,responseCode1) && strstr(header, responseCode2))) {
{code}
More inappropriate use of strstr.

{code}
+    if(arraylen > 0) {
+        fileStat = realloc(fileStat, sizeof(hdfsFileInfo) * arraylen);
+    }
{code}
# need to check for realloc failing
# similar to above realloc comment, shouldn't reassign to fileStat without a 
temporary.
{code}
+static hdfsFileInfo *json_parse_array(json_t *jobj, hdfsFileInfo *fileStat,
+                                      int *numEntries, const char *operation,
+                                      int printError)
{code}
The name {{json_parse_array}} is too generic for something that takes a 
{{hdfsFileInfo}} argument.  Perhaps {{parse_json_fileinfo}} or similar.

The function has a very tricky definition -- it reallocates {{fileStat}} and 
returns the new buffer.  That deserves a doc comment at least, or a redesign 
where the argument is {{hdfsFileInfo **fileStat}}.
{code}
+    void *iter = json_object_iter(jobj);
+    while(iter)  {
+        key = json_object_iter_key(iter);
+        value = json_object_iter_value(iter);
+        switch (json_typeof(value)) {
+            case JSON_INTEGER:
+                if(!strcmp(key, "accessTime")) {
+                    fileStat->mLastAccess = json_integer_value(value) / 1000;
{code}
It's very confusing to have the {{strcmp(key}} inside the {{switch(type)}} code 
block.  This makes the decision flow very hard to follow.

Unless the units are obvious, magic numbers like {{/ 1000}} need a comment 
explaining why the value is being converted (from milliseconds to seconds?  or 
microseconds?).  Something like {{// json field contains time in milliseconds, 
hdfsFileInfo is counted in microseconds}} or similar.
{code}
+            default:
+                free(fileStat);
+                fileStat = NULL;
+        }
{code}
If we hit this default, the next loop through will segfault.  Better to exit 
the loop explicitly with a return or a test variable or a goto. If this is 
handling some quirk of {{json_object_iter_value}} then a comment is needed.

{code}
+                "file creation/appending, <%d>:%s.\n",
{code}
Some of these are missing a space after the {{:}}. I don't really care what 
format you use, but please be consistent.
{code}
+        newWorkingDir = normalizePath(newWorkingDir);
{code}
This leaks the old {{newWorkingDir}}.  But see below for an alternate design.
{code}
+/** Replace "//" with "/" in path */
+static char *normalizePath(char *path)
+{
+    const char *delimiter = "/";
...
+    token = strtok_r(path, delimiter, &savePtr);
{code}
This function would be much simpler if it did not generate the leading {{/}} 
(which it isn't documented to do) and modified the path in-place using an 
explicit bytewise loop rather than using {{strtok}}.  Something like
{code}
for (i=j=sawslash=0; i<pathlen; i++) {
   if (path[i] == '/') {
       if (sawslash) {
           continue;
       } else {
          sawslash = 1;
       }
    } else {
       sawslash = 0;
       path[j++] = path[i];
    }
}
path[j] = '\0';
{code}

Overall, very good cleanup.  Take care of the above and let's get this checked 
in and continue to make progress.

There's still quite a bit of hairy code in libwebhdfs that needs careful review 
and testing.  I'm still concerned about how much explicit string manipulation 
in C is being done; there are many places with fragile {{strncat}} and 
{{snprintf}} loops with unchecked potentially underflowable computations.  Keep 
in mind that this code will potentially be exposed to hostile network traffic 
and a single byte buffer overwrite gives the attacker code execution with high 
likelihood.  This is native C, not Java, and nearly every buffer offset bug is 
a critical code execution bug.  It would be much easier to see a path to 
confidence in this code if there were less ad-hoc string manipulation.
                
> libwebdhfs code cleanup: string processing and using strerror consistently to 
> handle all errors
> -----------------------------------------------------------------------------------------------
>
>                 Key: HDFS-3920
>                 URL: https://issues.apache.org/jira/browse/HDFS-3920
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Jing Zhao
>            Assignee: Jing Zhao
>         Attachments: HDFS-3920-001.patch, HDFS-3920-001.patch, 
> HDFS-3920-002.patch, HDFS-3920-003.patch, HDFS-3920-004.patch, 
> HDFS-3920-005.patch
>
>
> 1. Clean up code for string processing;
> 2. Using strerror consistently for error handling;
> 3. Use sprintf to replace decToOctal
> 4. other issues requiring fixing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-3920) libwebdhfs code cleanup: string processing and using strerror consistently to handle all errors

Reply via email to