"chapter" on logs

Joshua Slive 8 Aug 2001 04:52:39 -0000

I have written a first draft of a new document which discusses
logfiles in Apache.  I did not directly use Rich Bowen's great logging
tutorial, but I borrowed ideas from it extensively (and the same for the
older tutorial from apacheweek).


I'd appreciate feedback, including, but not limited to:

1. Is this appropriate for the docs?
2. Is it at the right level?
3. What is missing?
4. What is overdone?
5. Should I add sections for pidfile, rewritelog, scriptlog?  (I'm leaning
towards yes)
6. Should I try to add more general background stuff like "How do I tell
how many PEOPLE visited my website?"  (I'm leaning towards no)

Included inline below, and available in rendered form at
http://garibaldi.commerce.ubc.ca:8080/ap13/htdocs/manual/logs.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Log Files and Status Reporting in Apache</TITLE>
</HEAD>

<!-- Background white, links blue (unvisited), navy (visited), red (active) -->
<BODY
 BGCOLOR="#FFFFFF"
 TEXT="#000000"
 LINK="#0000FF"
 VLINK="#000080"
 ALINK="#FF0000"
>
<!--#include virtual="header.html" -->
<h1 align="center">Log Files and Status Reporting</h1>

<p>In order to effectively manage a web server, it is necessary to get
feedback about the activity and performance of the server as well as
any problems that may be occurring.  The Apache HTTP Server provides
very comprehensive and flexible logging capabilities.  This document
describes how to configure the various log files, and how to
understand what the logs contain.</p>

<ul>
<li><a href="#security">Security Warning</a></li>
<li><a href="#errorlog">Error Log</a></li>
<li><a href="#accesslog">Access Log</a>
  <ul>
    <li><a href="#common">Common Log Format</a></li>
    <li><a href="#combined">Combined Log Format</a></li>
    <li><a href="#multiple">Multiple Access Logs</a></li>
    <li><a href="#conditional">Conditional Logging</a></li>
  </ul></li>
<li><a href="#rotation">Log Rotation</a></li>
<li><a href="#piped">Piped Logs</a></li>
<li><a href="#virtualhosts">Virtual Hosts</a></li>
</ul>

<hr>

<h2><a name="security">Security Warning</a></h2>

<p>Anyone who can write to the directory where Apache is writing a
log file can almost certainly gain access to the uid that the server is
started as, which is normally root.  Do <EM>NOT</EM> give people write
access to the directory the logs are stored in without being aware of
the consequences; see the <A HREF="misc/security_tips.html">security tips</A>
document for details.</p>

<p>In addition, log files may contain information supplied directly
by the client, without escaping.  Therefore, it is possible for
malicious clients to insert control-characters in the log files, so
care must be taken in dealing with raw logs.</p>

<h2><a name="errorlog">Error Log</a></h2>

<table border="1">
<tr><td valign="top">
<strong>Related Directives</strong><br><br>

<a href="mod/core.html#errorlog">ErrorLog</a><br>
<a href="mod/core.html#loglevel">LogLevel</a>
</td></tr></table>

<p>The server error log, the location of which is set by the <a
href="mod/core.html#errorlog">ErrorLog</a> directive, is the most
important log file.  This is the place where Apache HTTPD will send
diagnostic information and record any errors that it encounters in
processing requests.  It is the first place to look when a problem
occurs with starting the server or with the operation of the server,
since it will often contain details of what went wrong and how to fix
it.</p>

<p>The error log is usually written to a file (typically
<code>error_log</code> on unix systems and <code>error.log</code> on
Windows and OS/2).  However, on unix systems it is also possible to
have the server send errors to the <code>syslog</code> or pipe them
through a program (see <a href="#piped">Piped Logs</a>
below).</p>

<p>The format of the error log is relatively free-form and
descriptive.  However, there is certain information that is contained
in most error log entries.  For example, here is a typical message.</p>

<blockquote><code>
[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server 
configuration: /export/home/live/ap/htdocs/test
</code></blockquote>

<p>The first item in the log entry is the date and time of the
message.  The second entry lists the severity of the error being
reported. The <a href="mod/core.html#loglevel">LogLevel</a> directive
is used to control the types of errors that are sent to the error log
by restricting the severity level.  The third entry gives the IP
address of the client which generated the error.  Beyond that is the
message itself, which in this case indicates that the server has been
configured to deny the client access and gives the file-system path of
the requested document.</p>
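<p>As an illustration only, the layout described above can be picked
apart with a short Python sketch.  The pattern below is an assumption
based on the single example entry; messages that do not follow this
layout (for example, raw CGI output) simply fail to match.  The names
<code>ERROR_LINE</code> and <code>parse_error_line</code> are
hypothetical and not part of Apache.</p>

```python
import re

# Pattern for the error-log layout shown above:
# [date] [severity] [client address] message
# The client field is optional because not every message concerns a request.
ERROR_LINE = re.compile(
    r"^\[([^\]]+)\] \[([^\]]+)\] (?:\[client ([^\]]+)\] )?(.*)$"
)


def parse_error_line(line):
    """Split one error-log line into its parts, or return None."""
    match = ERROR_LINE.match(line.strip())
    if match is None:
        return None
    timestamp, severity, client, message = match.groups()
    return {
        "timestamp": timestamp,
        "severity": severity,
        "client": client,
        "message": message,
    }


# The example entry from above, joined onto one line.
sample = (
    "[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] "
    "client denied by server configuration: /export/home/live/ap/htdocs/test"
)
entry = parse_error_line(sample)
print(entry["severity"], entry["client"])
```

<p>Returning <code>None</code> for unrecognized lines, rather than
raising an error, lets a caller skip free-form output without
stopping.</p>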

<p>A very wide variety of different messages can appear in the error
log.  Most look similar to the example above.  However, the error log
will also contain debugging output from CGI scripts.  Any information
written to <code>stderr</code> by a CGI script will be copied directly
to the error log.</p>

<p>It is not possible to customize the error log by adding or removing
information.  However, error log entries dealing with particular
requests have corresponding entries in the <a href="#accesslog">access
log</a>.  For example, the above example entry corresponds to an
access log entry with status code 403.  So it is often possible to
customize the access log in order to get more information about error
conditions.</p>

<p>During testing, it is often useful to continuously monitor the
error log for any problems.  On unix systems, this is easily
accomplished using:</p>
<blockquote><code>
tail -f error_log
</code></blockquote>

<p>Other operating systems may have similar commands.</p>


<h2><a name="accesslog">Access Log</a></h2>

<table border=1><tr><td valign="top">
<strong>Related Modules</strong><br><br>

<a href="mod/mod_log_config.html">mod_log_config</a><br>

</td><td valign="top">
<strong>Related Directives</strong><br><br>

<a href="mod/mod_log_config.html#customlog">CustomLog</a><br>
<a href="mod/mod_log_config.html#logformat">LogFormat</a><br>

</td></tr></table>

<p>The server access log records all requests processed by the server.
The location and content of the access log are controlled
by the <a href="mod/mod_log_config.html#customlog">CustomLog</a>
directive.  The <a
href="mod/mod_log_config.html#logformat">LogFormat</a> directive can
be used to simplify the selection of the contents of the logs.
This section describes how to configure the server to record
information in the access log.</p>

<p>Of course, storing the information in the access log is only the
start of log management.  The next step is to analyze this information
to produce useful statistics.  Log analysis in general is beyond the
scope of this document, and not really part of the job of the
webserver itself.  For more information about this topic, and for
applications which perform log analysis, check the <a
href="http://dmoz.org/Computers/Software/Internet/Site_Management/Log_analysis/"
>Open Directory</a> or <a
href="http://dir.yahoo.com/Computers_and_Internet/Software/Internet/World_Wide_Web/Servers/Log_Analysis_Tools/"
>Yahoo</a>.</p>

<p>Various versions of Apache HTTPD have used other modules and
directives to control access logging, including mod_log_referer,
mod_log_agent, and the <code>TransferLog</code> directive.  However,
the <code>CustomLog</code> directive now subsumes the functionality of
all the older directives.</p>

<p>The format of the access log is highly configurable.  The format is
specified using a <a href="mod/mod_log_config.html#format">format
string</a> that looks much like a C-style printf(1) format string.
The sections below explain some of the commonly used formats,
as examples.</p>

<h3><a name="common">Common Log Format</a></h3>

<p>A typical configuration for the access log might look
as follows.</p>

<blockquote><code>
LogFormat "%h %l %u %t \"%r\" %&gt;s %b" common<br>
CustomLog logs/access_log common
</code></blockquote>

<p>This defines the <em>nickname</em> <code>common</code> and
associates it with a particular log format string.  Notice that the
format string consists of percent-directives, each of which tells the
server to log a particular piece of information.  In addition, literal
characters may be placed in the format string.  The quote character
(<code>"</code>) must be escaped by placing a back-slash before it to
prevent it from being interpreted as the end of the format string.
The <code>CustomLog</code> directive sets up a new log file using the
defined <em>nickname</em>.</p>

<p>This configuration will write log entries in a format known as the
Common Log Format (CLF).  This standard format can be produced by many
different web servers and read by many log analysis programs.  The log
file entries produced by this configuration will look something like
this:</p>

<blockquote><code>
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 
200 2326
</code></blockquote>

<p>Each part of this log entry is described below.</p>

<dl>
<dt><code>127.0.0.1</code> (<code>%h</code>)</dt> <dd>This is the IP
address of the client (remote host) which made the request to the
server.  If <a
href="mod/core.html#hostnamelookups">HostNameLookups</a> is set to
<code>On</code>, then the server will try to determine the hostname
and log it in place of the IP address.  However, this configuration is
not recommended since it can significantly slow the server.  Instead,
it is best to use a log post-processor such as <a
href="programs/logresolve.html">logresolve</a> to determine the
hostnames.  The IP address reported here is not necessarily the
address of the machine at which the user is sitting.  If a proxy
server exists between the user and the server, this address will be
the address of the proxy, rather than the originating machine.</dd>

<dt><code>-</code> (<code>%l</code>)</dt> <dd>The "hyphen" in the
output indicates that the requested piece of information is not
available.  In this case, the information that is not available is the
"identity" of the remote user as determined by <code>identd</code> on
the client's machine.  This information is highly unreliable and should
almost never be used except on tightly controlled internal networks.
Apache HTTPD will not even attempt to determine this information
unless <a href="mod/core.html#identitycheck">IdentityCheck</a> is set
to <code>On</code>.</dd>

<dt><code>frank</code> (<code>%u</code>)</dt> <dd>This is the userid
of the person requesting the document as determined by HTTP
authentication.  This is the same value that is typically provided to
CGI scripts in the <code>REMOTE_USER</code> environment variable.  If
the document is not password protected, this entry will be
"<code>-</code>" just like the previous one.</dd>

<dt><code>[10/Oct/2000:13:55:36 -0700]</code> (<code>%t</code>)</dt>
<dd>The time that the server finished processing the request.  The
format is specified in CLF as:
<BLOCKQUOTE><CODE> date = [day/month/year:hour:minute:second zone] <BR>
day = 2*digit<BR>
month = 3*letter<BR>
year = 4*digit<BR>
hour = 2*digit<BR>
minute = 2*digit<BR>
second = 2*digit<BR>
zone = (`+' | `-') 4*digit</CODE></BLOCKQUOTE>
</dd>

<dt><code>"GET /apache_pb.gif HTTP/1.0"</code>
(<code>\"%r\"</code>)</dt> <dd>The request line from the client is
given in double quotes.  The request line itself contains a great deal
of useful information.  First, the method used by the client is
<code>GET</code>.  Second, the client requested the resource
<code>/apache_pb.gif</code>, and third, the client used the protocol
<code>HTTP/1.0</code>.</dd>

<dt><code>200</code> (<code>%&gt;s</code>)</dt> <dd>This is the status
code that the server sends back to the client.  This information is
very valuable, because it reveals whether the request resulted in a
successful response (codes beginning in 2), a redirection (codes
beginning in 3), an error caused by the client (codes beginning in 4),
or an error in the server (codes beginning in 5).  Some of the common
status codes are
<dl>
<dt>200 OK</dt>
<dd>The request has succeeded.</dd>
<dt>206 Partial Content</dt>
<dd>The client requested a part of a resource and the server
responded in kind.</dd>
<dt>301 Moved Permanently</dt>
<dd>The requested resource has been permanently relocated to
a new URI.</dd>
<dt>302 Moved Temporarily</dt>
<dd>The requested resource has been temporarily relocated to
a new URI.</dd>
<dt>304 Not Modified</dt>
<dd>The document has not been modified since the last
time it was requested by the client.</dd>
<dt>401 Unauthorized</dt>
<dd>The resource requires authentication, but the
client has not yet supplied the correct credentials.</dd>
<dt>403 Forbidden</dt>
<dd>The client is not allowed to access the requested resource.</dd>
<dt>404 Not Found</dt>
<dd>The server does not have a resource matching the requested
URI.</dd>
<dt>500 Internal Server Error</dt>
<dd>The server encountered an unexpected condition which prevented it
from fulfilling the request.</dd>
</dl>
The full list of possible status codes can be
found in the HTTP specification.</dd>

<dt><code>2326</code> (<code>%b</code>)</dt>
<dd>The last entry indicates the size of the object returned to
the client, not including the response headers.</dd>

</dl>
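<p>The field-by-field description above can be sketched in Python as
follows.  This is a minimal illustration, not a complete log analyzer:
the pattern and the names <code>CLF_LINE</code>,
<code>parse_clf</code> and <code>status_class</code> are hypothetical,
and the sketch assumes that a size of "<code>-</code>" (which the
server may log when no body is sent) should count as zero bytes.</p>

```python
import re

# One group per field of the Common Log Format, in log order:
# host, identd name, user, [time], "request line", status, bytes.
CLF_LINE = re.compile(
    r'^(\S+) (\S+) (\S+) \[([^\]]+)\] "(.*)" (\d{3}) (\d+|-)$'
)

# Status classes as described above, keyed by the first digit of the code.
STATUS_CLASSES = {
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_clf(line):
    """Return a dict of CLF fields, or None if the line does not match."""
    match = CLF_LINE.match(line.strip())
    if match is None:
        return None
    host, ident, user, when, request, status, size = match.groups()
    return {
        "host": host,
        "ident": ident,
        "user": user,
        "time": when,
        "request": request,
        "status": int(status),
        # Assumption: a size of "-" is treated as zero bytes.
        "bytes": 0 if size == "-" else int(size),
    }


def status_class(status):
    """Map a status code to the class named by its first digit."""
    return STATUS_CLASSES.get(status // 100, "other")


# The example entry from above, joined onto one line.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326'
)
entry = parse_clf(sample)
method, resource, protocol = entry["request"].split(" ")
print(entry["host"], method, resource, status_class(entry["status"]))
```

<p>Because the request line is supplied by the client, real logs can
contain request lines that do not split into exactly three parts, so
production code should check the result of the split before
unpacking it.</p>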

<h3><a name="combined">Combined Log Format</a></h3>

<p>Another commonly used format string is called the
Combined Log Format.  It can be used as follows.</p>

<blockquote><code>
LogFormat "%h %l %u %t \"%r\" %&gt;s %b \"%{Referer}i\" \"%{User-agent}i\"" 
combined<br>
CustomLog logs/access_log combined
</code></blockquote>

<p>This format is exactly the same as the Common Log Format,
with the addition of two more fields.  The access log under this
format will look like:</p>

<blockquote><code>
127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 
200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"
</code></blockquote>

<p>The additional fields are:</p>

<dl>

<dt><code>"http://www.example.com/start.html"</code>
(<code>\"%{Referer}i\"</code>)</dt> <dd>The "Referer" (sic) HTTP
request header.  This gives the site that the client reports having
been referred from.  (This should be the page that links to or includes
<code>/apache_pb.gif</code>.)</dd>

<dt><code>"Mozilla/4.08 [en] (Win98; I ;Nav)"</code>
(<code>\"%{User-agent}i\"</code>)</dt> <dd>The User-Agent HTTP request
header.  This is the identifying information that the client browser
reports about itself.</dd>

</dl>
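<p>A post-processing script can read the two additional fields from the
end of each entry.  The Python sketch below is one possible approach
under a simplifying assumption: neither header value contains a double
quote.  Since these values are supplied by the client, that assumption
will not always hold.  The names <code>TAIL</code> and
<code>referer_and_agent</code> are hypothetical.</p>

```python
import re

# The Combined Log Format ends with two quoted fields: referer, user agent.
# This pattern only looks at the tail of the line, so the Common Log Format
# part before it is skipped rather than parsed.
TAIL = re.compile(r'"([^"]*)" "([^"]*)"$')


def referer_and_agent(line):
    """Return (referer, user_agent) from a combined entry, or None."""
    match = TAIL.search(line.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


# The example entry from above, joined onto one line.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" '
    '"Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
referer, agent = referer_and_agent(sample)
print(referer)
print(agent)
```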

<h3><a name="multiple">Multiple Access Logs</a></h3>

<p>Multiple access logs can be created simply by specifying multiple
<code>CustomLog</code> directives in the configuration file.  For
example, the following directives will create three access logs.  The
first contains the basic information, while the second and third
contain referer and browser information.  The last two
<code>CustomLog</code> lines show how to mimic the effects of the
<code>RefererLog</code> and <code>AgentLog</code> directives.</p>

<blockquote><code>
LogFormat "%h %l %u %t \"%r\" %&gt;s %b" common<br>
CustomLog logs/access_log common<br>
CustomLog logs/referer_log "%{Referer}i -&gt; %U"<br>
CustomLog logs/agent_log "%{User-agent}i"
</code></blockquote>

<p>This example also shows that it is not necessary to define a
nickname with the <code>LogFormat</code> directive.  Instead, the log
format can be specified directly in the <code>CustomLog</code>
directive.</p>

<h3><a name="conditional">Conditional Logging</a></h3>

<p>There are times when it is convenient to exclude certain entries
from the access logs based on characteristics of the client request.
This is easily accomplished with the help of <a
href="env.html">environment variables</a>.  First, an environment
variable must be set to indicate that the request meets certain
conditions.  This is usually accomplished with <a
href="mod/mod_setenvif.html#setenvif">SetEnvIf</a>.  Then the
<code>env=</code> clause of the <code>CustomLog</code> directive is
used to include or exclude requests where the environment variable is
set.  Some useful examples:</p>

<blockquote><code>
# Exclude requests from the loop-back interface<br>
SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog<br>
# Exclude requests for the robots.txt file<br>
SetEnvIf Request_URI "^/robots\.txt$" dontlog<br>
# Log what remains<br>
CustomLog logs/access_log common env=!dontlog
</code></blockquote>

<p>As another example, consider logging requests from English-speakers
to one log file, and non-English speakers to a different log file.</p>

<blockquote><code>
SetEnvIf Accept-Language "en" english<br>
CustomLog logs/english_log common env=english<br>
CustomLog logs/non_english_log common env=!english
</code></blockquote>

<p>Although we have just shown that conditional logging is very
powerful and flexible, we do not recommend using it in general.  Log
files are more useful when they contain a complete record of server
activity.  It is usually best to simply post-process the log files to
remove requests that you do not want to consider.</p>
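<p>As a sketch of the post-processing alternative, the following Python
code drops the same two kinds of entries as the first
<code>SetEnvIf</code> example above, but works on a complete log after
the fact.  It assumes Common Log Format entries with an IP address in
the first field, and it compares the address exactly rather than by
pattern.  The function names and the sample entries (including the
address <code>192.0.2.7</code>) are made up for illustration.</p>

```python
def requested_path(line):
    """Return the URL path from the quoted request line, or None."""
    parts = line.split('"')
    if len(parts) == 1:
        return None  # no quoted request line at all
    words = parts[1].split()
    # A well-formed request line is "METHOD path PROTOCOL".
    return words[1] if len(words) == 3 else None


def wanted(line):
    """Return True unless the entry is loop-back or robots.txt traffic."""
    host = line.split(" ", 1)[0]
    if host == "127.0.0.1":
        return False
    path = requested_path(line)
    # Ignore any query string when comparing the path.
    if path is not None and path.split("?", 1)[0] == "/robots.txt":
        return False
    return True


def filter_log(lines):
    """Keep only the entries that should be considered in the analysis."""
    return [line for line in lines if wanted(line)]


# Hypothetical entries used only to exercise the filter.
sample_log = [
    '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] '
    '"GET /index.html HTTP/1.0" 200 2326',
    '192.0.2.7 - - [10/Oct/2000:13:55:40 -0700] '
    '"GET /robots.txt HTTP/1.0" 200 68',
    '192.0.2.7 - - [10/Oct/2000:13:55:41 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326',
]
for line in filter_log(sample_log):
    print(line)
```

<p>The original log file is left untouched, so a different filter can
be applied later without losing any record of server activity.</p>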

<h2><a name="rotation">Log Rotation</a></h2>

<p>On even a moderately busy server, the quantity of information stored
in the log files is very large.  The access log file typically grows
1 MB or more per 10,000 requests.  It will consequently be necessary
to periodically rotate the log files by moving or deleting the
existing logs.  This cannot be done while the server is running,
because Apache will continue writing to the old log file as long
as it holds the file open.  Instead, the server must be
<a href="stopping.html">restarted</a> after the log files are
moved or deleted so that it will open new log files.</p>
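<p>To get a feeling for how often rotation is needed, the growth figure
above can be turned into a rough estimate.  The Python sketch below
assumes that 1 MB means 1,000,000 bytes and uses a made-up rate of
250,000 requests per day; the real size per entry depends on the log
format and on the length of the logged values.</p>

```python
# About 100 bytes per request, derived from the figure quoted above
# (1 MB per 10,000 requests, with 1 MB taken as 1,000,000 bytes).
BYTES_PER_REQUEST = 1_000_000 / 10_000


def daily_growth_mb(requests_per_day):
    """Estimate access-log growth in MB per day for a given request rate."""
    return requests_per_day * BYTES_PER_REQUEST / 1_000_000


# Hypothetical request rate, for illustration only.
print(daily_growth_mb(250_000))
```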

<p>By using a <em>graceful</em> restart, the server can be
instructed to open new log files without losing any existing
or pending connections from clients.  However, in order to
accomplish this, the server must continue to write to
the old log files while it finishes serving old requests.
It is therefore necessary to wait for some time after the
restart before doing any processing on the log files.
A typical scenario that simply rotates the logs and
compresses the old logs to save space is:</p>

<blockquote><code>
mv access_log access_log.old<br>
mv error_log error_log.old<br>
apachectl graceful<br>
sleep 600<br>
gzip access_log.old error_log.old
</code></blockquote>
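<p>The same sequence can be sketched in Python, for cases where the
rotation is driven from a larger script.  This is an illustration under
stated assumptions, not a recommended tool: the function name
<code>rotate</code> and its parameters are hypothetical, and the
restart step is passed in as a callable so that the file handling can
be tried without a running server.</p>

```python
import gzip
import os
import shutil
import time


def rotate(log_dir, names, restart, wait_seconds=600):
    """Rename each log, restart the server, wait, then compress.

    restart is any callable that triggers a graceful restart.
    Returns the paths of the compressed files.
    """
    old_paths = []
    for name in names:
        current = os.path.join(log_dir, name)
        if os.path.exists(current):
            old = current + ".old"
            os.rename(current, old)
            old_paths.append(old)
    restart()
    # Children finishing old requests may still write to the renamed files,
    # so wait before touching them.
    time.sleep(wait_seconds)
    for old in old_paths:
        with open(old, "rb") as src, gzip.open(old + ".gz", "wb") as dst:
            shutil.copyfileobj(src, dst)
        os.remove(old)
    return [old + ".gz" for old in old_paths]
```

<p>In real use, <code>restart</code> would typically run
<code>apachectl graceful</code>, for example through Python's
<code>subprocess</code> module, and the log directory would be the one
configured for the server.</p>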

<p>Another way to perform log rotation is using <a href="#piped">piped
logs</a> as discussed in the next section.</p>

<h2><a name="piped">Piped Logs</a></h2>

<p>Apache HTTPD is capable of writing error and access log files
through a pipe to another process, rather than directly to a file.
This capability dramatically increases the flexibility of logging in
Apache, without adding code to the main server.  In order to write
logs to a pipe, simply replace the filename with the pipe character
"<code>|</code>", followed by the name of the executable which should
accept log entries on its standard input.  Apache will start the
piped-log process when the server starts, and will restart it if it
crashes while the server is running.  (This last feature is why
we can refer to this technique as "reliable piped logging".)</p>

<p>Some simple examples using piped logs:</p>

<blockquote><code>
# compressed logs<br>
CustomLog "|/usr/bin/gzip -c &gt;&gt; /var/log/access_log.gz" common<br>
# almost-real-time name resolution<br>
CustomLog "|/usr/local/apache/bin/logresolve &gt;&gt; /var/log/access_log" common
</code></blockquote>

<p>Notice that quotes are used to enclose the entire command
that will be called for the pipe.  Although these examples are
for the access log, the same technique can be used for the
error log.</p>

<p>One important use of piped logs is to allow log rotation without
having to restart the server.  A simple program called
<a href="programs/rotatelogs.html">rotatelogs</a> that
is included with the server can be used for this purpose.
For example, to rotate the logs every 24 hours, you can
use:</p>

<blockquote><code>
CustomLog "|/usr/local/apache/bin/rotatelogs /var/log/access_log 86400" common
</code></blockquote>

<p>A similar, but much more flexible log rotation program
called <a href="http://www.ford-mason.co.uk/resources/cronolog/">cronolog</a>
is available at an external site.</p>
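<p>To show what a piped-log program has to do, here is a minimal Python
sketch that appends each entry it reads to a file named after the
current day.  It is not how <code>rotatelogs</code> or
<code>cronolog</code> are implemented; the function names, the UTC
date suffix, and the choice to reopen the file for every entry are
assumptions made to keep the example short.</p>

```python
import time


def daily_name(base, now):
    """Return base plus a UTC date suffix, such as access_log.2000-10-10."""
    return base + "." + time.strftime("%Y-%m-%d", time.gmtime(now))


def copy_entries(stream, base, clock=time.time):
    """Append each entry read from stream to the log file for its day.

    Reopening the file for every entry is slow, but it means a new file
    is started automatically when the date changes.
    """
    for line in stream:
        with open(daily_name(base, clock()), "a") as out:
            out.write(line)


# A script used as a piped-log program would finish with a call such as:
#     copy_entries(sys.stdin, sys.argv[1])
# after importing sys, and would be named after the pipe character in
# the CustomLog directive.
```

<p>The <code>clock</code> parameter exists only so that the date
handling can be exercised with fixed times instead of the real
clock.</p>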

<p>As with conditional logging, piped logs are a very powerful tool,
but they should not be overused.  When it is possible to get the same
results by post-processing the logs off-line (after they are rotated),
it is usually wise to use that simpler technique.</p>


<h2><a name="virtualhosts">Virtual Hosts</a></h2>

<p>When running a server with many <a href="vhosts/">virtual
hosts</a>, there are several options for dealing with log files.
First, it is possible to use logs exactly as in a single-host server.
Simply by placing the logging directives outside the
<code>&lt;VirtualHost&gt;</code> sections in the main server context,
it is possible to log all requests in the same access log and error
log.  However, this technique does not allow for easy collection of
statistics on individual virtual hosts.</p>

<p>If, instead, <code>CustomLog</code> or <code>ErrorLog</code>
directives are placed inside a <code>&lt;VirtualHost&gt;</code>
section, all requests or errors for that virtual host can be logged to
a separate file.  Any virtual host which does not have logging
directives will have its requests sent to the main server logs.  This
technique is very useful for a small number of virtual hosts, but if
the number of hosts is very large, it can be complicated to manage.
In addition, it can often create problems with <a
href="vhosts/fd-limits.html">insufficient file descriptors</a>.</p>

<p>For the access log, there is a very good compromise.  By adding
information on the virtual host to the log format string,
it is possible to log all hosts to the same log, and later
split the log into individual files.  For example, consider the
following directives.</p>

<blockquote><code>
LogFormat "%v %l %u %t \"%r\" %&gt;s %b" commonvhost<br>
CustomLog logs/access_log commonvhost
</code></blockquote>

<p>The <code>%v</code> is used to log the name of the virtual host
that is serving the request.  Then a program like <a
href="programs/other.html">split-logfile</a> can be used to
post-process the access log in order to split it into one file per
virtual host.</p>
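<p>The idea behind such a splitting step can be sketched in Python.
This is an illustration of the technique only, not the
<code>split-logfile</code> program itself: the function name
<code>split_by_vhost</code>, the decision to lower-case host names, and
the sample entries are all assumptions.</p>

```python
from collections import defaultdict


def split_by_vhost(lines):
    """Group log lines by their first field, the %v virtual host name.

    The host name is removed from each line, leaving the remaining fields.
    """
    per_host = defaultdict(list)
    for line in lines:
        host, sep, rest = line.partition(" ")
        if not sep:
            continue  # skip lines without a host field
        # Host names are case-insensitive, so fold them to lower case.
        per_host[host.lower()].append(rest)
    return dict(per_host)


# Hypothetical entries for two virtual hosts.
sample_log = [
    'www.example.com - - [10/Oct/2000:13:55:36 -0700] '
    '"GET / HTTP/1.0" 200 2326',
    'docs.example.com - - [10/Oct/2000:13:55:40 -0700] '
    '"GET / HTTP/1.0" 200 512',
    'WWW.example.com - frank [10/Oct/2000:13:56:02 -0700] '
    '"GET /a.gif HTTP/1.0" 200 99',
]
for host, entries in sorted(split_by_vhost(sample_log).items()):
    print(host, len(entries))
```

<p>Each list in the result could then be written to its own file, one
per virtual host.</p>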

<p>Unfortunately, no similar technique is available for the error log,
so you must choose between mixing all virtual hosts in the same error
log and using one error log per virtual host.</p>

<!--#include virtual="footer.html" -->
</BODY>
</HTML>

