Re: Patch to be tolerant of whitespace in URLs

John Tytgat Sat, 24 May 2008 16:36:25 -0700

In message <[EMAIL PROTECTED]>
          Philip Boulain <[EMAIL PROTECTED]> wrote:


> The attached patch makes url_normalize take care of whitespace in a
> fairly useful way, consistent with other browsers:
> [...]

I can't judge for the overall correctness but a couple of suggestions:

> +     /* encode any remaining (internal) whitespace */
> +     for (i = 0; i < len; i++) {
> +             if(isspace((*result)[i])) {
> +                     /* snprintf is all too keen to write damn nulls */
> +                     char esc[4];
> +                     char space = (*result)[i];
> +                     memmove((*result) + i + 2, (*result) + i, 1 + len - i);
> +                     len += 2;
> +                     snprintf(esc, 4, "%%%02hhx", space);
> +                     strncpy((*result) + i, esc, 3);
> +             }
> +     }

You go through a great pain to use snprintf().  Why not something like (not
tested):

+       /* encode any remaining (internal) whitespace */
+       for (i = 0; i < len; i++) {
+               if(isspace((*result)[i])) {
+                       char space = (*result)[i];
+                       memmove((*result) + i + 2, (*result) + i, 1 + len - i);
+                       (*result)[  i] = '%'
+                       (*result)[++i] = digit2lowcase_hex(space >> 4);
+                       (*result)[++i] = digit2lowcase_hex(space & 0xF);
+                       len += 2;
+               }
+       }

And a suitable digit2lowcase_hex():

static char digit2lowcase_hex(char digit)
{
  return (digit <= '9') ? (digit - 0)"0123456789" : (digit - 10)"abcdef";
}

Algorithmically it would be better to start from the end of *result buffer
as that would avoid calling memmove() but that perhaps stretching things a
bit. ;-)

It might also enhance the code quality if we would use a temporary variable
holding *result and use that.

> -     for (i = 0; (unsigned)i != len; i++) {
> +     for (i = 0; (size_t)i + 2 < len; i++) {
>               if ((*result)[i] != '%')
>                       continue;
>               c = tolower((*result)[i + 1]);

Wouldn't it be better to use size_t for i from the start ?

John.
-- 
John Tytgat
[EMAIL PROTECTED]

Re: Patch to be tolerant of whitespace in URLs

Reply via email to