Hi Willy - Apologies if this comes through multiple times; I'm having some mail difficulties.
On Tue, Apr 7, 2015 at 7:11 PM, Willy Tarreau <[email protected]> wrote: > Just a question, did you find any benefit in doing it with a new tag > compared to %[path] ? It may just be a matter of convenience, I'm just > wondering. When I attempt to use %[path] in a log-format directive, I get the following errors: parsing [/etc/haproxy/haproxy.cfg:83] : 'log-format' : sample fetch <path> may not be reliably used here because it needs 'HTTP request headers' which is not available here. Limited testing confirms that the path is not printed in the log when %[path] is included in log-format either. Perhaps I'm missing a configuration option that allows the use of that sample fetch in the logs? > Other comments on the code below : > >> +char* strip_uri_params(char *str) { >> + int spaces = 0; >> + int end = strlen(str); >> + >> + int i; >> + int path_end = end; >> + for (i = 0; i < end; i++) { >> + if (str[i] == ' ' && spaces == 0) { >> + spaces++; >> + } else if (str[i] == '?' || (str[i] == ' ' && spaces > 0)) { >> + path_end = i; >> + break; >> + } >> + } > > What's the purpose of counting spaces to stop at the second one ? You > cannot not have any in the path, so I'm a bit puzzled. It does seem counter-intuitive, but the uri variable at src/log.c#1546 is actually the full HTTP request line. Consider the following requests: GET /foo HTTP/1.1 GET /foo?bar HTTP/1.1 They should both produce an identical path (as the only difference is the query string), but they will not unless we strip the line at either the first question mark *or* the second space. Stripping at only the first question mark would produce logs like: GET /foo HTTP/1.1 GET /foo ...which is undesirable. > Here I don't understand, you seem to be doing malloc+strncpy+strlen+free > just to get an integer which is the position of the question mark in the > chain that is determined very earély in your strip function, is that it ? > If so, that's totally overkill and not even logical! > > Thanks, > Willy > You're right - this patch was not the most efficient or elegant. Chalk it up to not fully understanding the logging system, and my general lack of knowledge in C. I've included another patch that I think solves the problem much more elegantly, building on what you suggested. I've also attached an updated patch file in case my mail client messes up tabs/spaces or something. Thank you so much! - Andrew Hayworth >From 8cda7475e6b456636f61c48c2132ecf32f4c23b1 Mon Sep 17 00:00:00 2001 From: Andrew Hayworth <[email protected]> Date: Tue, 7 Apr 2015 21:42:53 +0000 Subject: [PATCH] Add a new log format variable "%p" that spits out the sanitized request path It's often undesirable to log query params - and in some cases, it can create legal compliance problems. This commit adds a new log format variable that logs the HTTP verb and the path requested sans query string (and additionally ommitting the protocol). For example, the following HTTP request line: GET /foo?bar=baz HTTP/1.1 becomes: GET /foo with this log format variable. --- include/types/log.h | 1 + src/log.c | 24 ++++++++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/include/types/log.h b/include/types/log.h index c7e47ea..3205ce6 100644 --- a/include/types/log.h +++ b/include/types/log.h @@ -90,6 +90,7 @@ enum { LOG_FMT_HDRREQUESTLIST, LOG_FMT_HDRRESPONSLIST, LOG_FMT_REQ, + LOG_FMT_PATH, LOG_FMT_HOSTNAME, LOG_FMT_UNIQUEID, LOG_FMT_SSL_CIPHER, diff --git a/src/log.c b/src/log.c index 1a5ad25..af89e00 100644 --- a/src/log.c +++ b/src/log.c @@ -108,6 +108,7 @@ static const struct logformat_type logformat_keywords[] = { { "hs", LOG_FMT_HDRRESPONS, PR_MODE_TCP, LW_RSPHDR, NULL }, /* header response */ { "hsl", LOG_FMT_HDRRESPONSLIST, PR_MODE_TCP, LW_RSPHDR, NULL }, /* header response list */ { "ms", LOG_FMT_MS, PR_MODE_TCP, LW_INIT, NULL }, /* accept date millisecond */ + { "p", LOG_FMT_PATH, PR_MODE_HTTP, LW_REQ, NULL }, /* path */ { "pid", LOG_FMT_PID, PR_MODE_TCP, LW_INIT, NULL }, /* log pid */ { "r", LOG_FMT_REQ, PR_MODE_HTTP, LW_REQ, NULL }, /* request */ { "rc", LOG_FMT_RETRIES, PR_MODE_TCP, LW_BYTES, NULL }, /* retries */ @@ -1539,6 +1540,29 @@ int build_logline(struct session *s, char *dst, size_t maxsize, struct list *lis last_isspace = 0; break; + case LOG_FMT_PATH: // %p + if (tmp->options & LOG_OPT_QUOTE) + LOGCHAR('"'); + uri = txn->uri ? txn->uri : "<BADREQ>"; + ret = encode_string(tmplog, dst + maxsize, + '#', url_encode_map, uri); + if (ret == NULL || *ret != '\0') + goto out; + + // Cut off request line at first occurrence of '?' which signals the beginning of + // request params (and end of path). If no params are present, cut off at last space + // which otherwise signals the end of the path. + uri = strchr(tmplog, '?'); + if (uri == NULL) { + uri = strrchr(tmplog, ' '); + } + tmplog = uri ? uri : ret; + + if (tmp->options & LOG_OPT_QUOTE) + LOGCHAR('"'); + last_isspace = 0; + break; + case LOG_FMT_PID: // %pid if (tmp->options & LOG_OPT_HEXA) { iret = snprintf(tmplog, dst + maxsize - tmplog, "%04X", pid); -- 2.1.3
0001-Add-a-new-log-format-variable-p-that-spits-out-the-s.patch
Description: Binary data

