Re: [PATCH] Add a new log format variable "%p" that spits out the sanitized request path

Willy Tarreau Tue, 28 Apr 2015 01:02:53 -0700

Hi again Andrew,

I've spent some time on it. Good work. I'm seeing a bug in %HV, it doesn't
always work. I like your idea to convert missing versions to "HTTP/1.0",
but I think that in order to be even more accurate, we should report
"HTTP/0.9" to translate what was actually seen in the logs, what do you
think ?


One comment prior to the bug itself, I found %HR confusing because it
designates a request (and is even mapped to LOG_HTTP_REQUEST), but what
we name a request all over the code and doc is the association between
the method, the uri and the version. Since it takes only the URI, I'd
rather call it LOG_HTTP_URI and %HU. Maybe later we'll even decide to
alias %HR to %r, I don't know. I tend to be careful to always plan for
the future because we have accumulated a lot of misdesigns initially
due to the narrower scope of the project, that we're still carrying
today. Concerning "%HP" you can mention in the doc "(path)" since it's
the name we use in other places for the same entity. That way it will
be consistent.

Now concerning the bug, it's not very important but it does exist :

> +     case LOG_FMT_HTTP_VERSION: // %HV
> +       uri = txn->uri ? txn->uri : "<BADREQ>";
> +       if (tmp->options && LOG_OPT_QUOTE)
> +         LOGCHAR('"');
> +
> +       end = uri + strlen(uri);
> +       spc = uri;
> +       // look for the last whitespace character
> +       while (uri < end) {
> +         if (HTTP_IS_SPHT(*uri)) {
> +           spc = ++uri;
> +         } else {
> +           uri++;
> +         }
> +       }
> +
> +       // "Simple" HTTP request per RFC1945
> +       // Could technically be HTTP0.9 but we return HTTP/1.0 in the response
> +       // so mirror that here.
> +       if (spc == (end-1)) {
> +         chunk.str = "HTTP/1.0";
> +         chunk.len = strlen("HTTP/1.0");
> +       } else {
> +         chunk.str = spc;
> +         chunk.len = uri - spc;
> +       }

If you send "GET /" it says "HTTP/1.0", if you send "GET /foo", it says "/foo"
and if you send crap it says "<BADREQ>". Technically speaking we're not looking
for the last space but what follows the second (series of) spaces. Thus I would
simply keep the same parsing method as for the other parts and do it the same
way :

+                               // look for the first whitespace character
+                               while (uri < end && !HTTP_IS_SPHT(*uri))
+                                       uri++;
+
+                               // keep advancing past multiple spaces
+                               while (uri < end && HTTP_IS_SPHT(*uri))
+                                       uri++;
+
+                               // look for the next whitespace character
+                               while (uri < end && !HTTP_IS_SPHT(*uri))
+                                       uri++;
+
+                               // keep advancing past multiple spaces
+                               while (uri < end && HTTP_IS_SPHT(*uri))
+                                       uri++;

Then you log between uri and end and you're done.

Don't be afraid of duplicating a few lines like this, code duplication
is what tells us it's about time to create new functions. We'll probably
later come up with something like skip_word() and skip_spaces() to make
the code cleaner.

If you think you won't have the time to roll out another patch soon,
I can make the changes myself, just tell it to me. It's just that I
don't want to step on your feet.

Thanks!
Willy

Re: [PATCH] Add a new log format variable "%p" that spits out the sanitized request path

Reply via email to