Hello,
The attached patch was motivated by user complaints that “vi”,
“cat”, and even some more sophisticated log analysis tools have trouble
handling long access.log lines. Those long log lines result from long
URLs that do occur in the wild.
Squid places an 8192-character limit on the URL length, but that limit
exceeds (a) the limits of some of these tools and (b) Squid's own
access.log line buffer limit (once other fields are logged as well).
My solution was to honor the .precision setting in logformat field
specifications. You can use it with %ru or any other text field.
For example, the format code below limits logged URI size to the first
1000 characters.
    logformat xsquid ... %rm %.1000ru %un ...
The Squid access.log line buffer cannot exceed 8192 characters. If you
want to preserve fields logged after the URL, your logged URL width limit
should be smaller than 8192 to leave space for the other fields on the
log line.
There is no width limit by default.
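To illustrate the semantics, here is a minimal standalone sketch (not the Squid code itself) of how printf(3) treats a minimum width and a precision for string values, which is the behavior the patch relies on. The helper name formatField and its parameters are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper, not part of Squid: formats `value` the way
// printf(3) "%*.*s" does. minWidth pads short values; maxWidth truncates
// long ones. A negative maxWidth means "no limit", as printf(3) treats a
// negative precision argument as if the precision were omitted.
std::string formatField(const std::string &value, int minWidth, int maxWidth, bool leftAlign)
{
    assert(minWidth >= 0);

    // Large enough for the whole value plus padding and the terminating NUL.
    std::vector<char> buf(value.size() + static_cast<std::size_t>(minWidth) + 1);

    const int n = leftAlign ?
        std::snprintf(buf.data(), buf.size(), "%-*.*s", minWidth, maxWidth, value.c_str()) :
        std::snprintf(buf.data(), buf.size(), "%*.*s", minWidth, maxWidth, value.c_str());

    if (n < 0)
        return std::string(); // formatting error

    return std::string(buf.data(), static_cast<std::size_t>(n));
}
```

With this helper, formatField("http://example.com/some/long/path", 0, 18, false) yields "http://example.com", while formatField("GET", 6, -1, false) yields "   GET" because no maximum width applies and the value is padded to six characters.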
Here is a possible commit message:
---------------------------------
Support maximum field width for string access.log fields.
Some standard command-line and some log processing tools have trouble
handling URLs or other logged fields exceeding 8KB in length. Moreover,
Squid violates its own log line format and truncates the entire log line
if, for example, the URL is 8KB long. By supporting the .precision format
argument, we allow the administrator to limit the logged URL size and
avoid these problems.
Limiting logged field width has no effect on traffic on the wire.
TODO: The name comes from the printf(3) "precision" format part. It may
be a good idea to rename our "precision" into max_width or similar,
especially if we do not support floating point precision logging.
TODO: Old code used unsigned chars to store user-configured field width
and precision. That does not work for URLs, headers, and other entries
longer than 255 characters. This patch changes the storage type to
unsigned int. The code should probably be polished further to remove
unsigned->signed conversions.
---------------------
Please review.
Thank you,
Alex.
Support maximum field width for string access.log fields.
Some standard command-line and some log processing tools have trouble handling
URLs or other logged fields exceeding 8KB in length. Moreover, Squid violates
its own log line format and truncates the entire log line if, for example, the
URL is 8KB long. By supporting the .precision format argument, we allow the
administrator to limit the logged URL size and avoid these problems.
Limiting logged field width has no effect on traffic on the wire.
TODO: The name comes from the printf(3) "precision" format part. It may be a
good idea to rename our "precision" into max_width or similar, especially if
we do not support floating point precision logging.
TODO: Old code used unsigned chars to store user-configured field width and
precision. That does not work for URLs, headers, and other entries longer than
255 characters. This patch changes the storage type to unsigned int. The code
should probably be polished further to remove unsigned->signed conversions.
=== modified file 'src/cf.data.pre'
--- src/cf.data.pre 2010-01-02 04:32:46 +0000
+++ src/cf.data.pre 2010-01-20 17:15:59 +0000
@@ -2462,7 +2462,7 @@
modifiers are usually not needed, but can be specified if an explicit
output format is desired.
- % ["|[|'|#] [-] [[0]width] [{argument}] formatcode
+ % ["|[|'|#] [-] [[0]width[.precision]] [{argument}] formatcode
" output in quoted string format
[ output in squid text log format as used by log_mime_hdrs
@@ -2470,8 +2470,10 @@
' output as-is
- left aligned
- width field width. If starting with 0 the
- output is zero padded
+ width minimum field width.
+ If starting with 0 the output is zero padded
+ precision maximum field width for string values.
+ Longer string values are truncated.
{arg} argument such as header name etc
Format codes:
=== modified file 'src/log/access_log.cc'
--- src/log/access_log.cc 2009-12-22 01:12:53 +0000
+++ src/log/access_log.cc 2010-01-20 17:15:59 +0000
@@ -466,8 +466,8 @@
} header;
char *timespec;
} data;
- unsigned char width;
- unsigned char precision;
+ unsigned int width;
+ unsigned int precision;
enum log_quote quote;
unsigned int left:1;
unsigned int space:1;
@@ -1213,13 +1213,15 @@
}
}
- if (fmt->width) {
- if (fmt->left)
- mb.Printf("%-*s", (int) fmt->width, out);
- else
- mb.Printf("%*s", (int) fmt->width, out);
- } else
- mb.append(out, strlen(out));
+ // enforce width limits if configured
+ const int minWidth = fmt->width ? static_cast<int>(fmt->width) : 0;
+ const int maxWidth = (fmt->precision && !doint && !dooff) ?
+ static_cast<int>(fmt->precision) : strlen(out);
+
+ if (fmt->left)
+ mb.Printf("%-*.*s", minWidth, maxWidth, out);
+ else
+ mb.Printf("%*.*s", minWidth, maxWidth, out);
} else {
mb.append("-", 1);
}