Re: Forensic Logging

Colm MacCarthaigh Tue, 30 Dec 2003 08:03:34 -0800

On Tue, Dec 30, 2003 at 11:49:37AM +0000, Ben Laurie wrote:
> >Could the forensic_id be tied in with mod_unique_id? It seems confusing
> >to have two different methods to generate unique IDs for requests. Also
> >with unique_id, I can see it being useful to make CGIs aware of their
> >"tracking code" via the environment variable. That way a developer can
> >use the same id to track ingress, processing and egress.
> 
> Well, it would be possible to make it use the unique ID if present. I'm 
> not in favour of requiring it, though, because it appears to add a good 
> deal of unnecessary overhead.


I realise that having the values of getpid() and time() to hand is useful
for forensic purposes, but a getpid():time():next_id++ scheme will produce
duplicates across even small clusters. It's not unusual to be dealing
with many millions of requests per day in a single logfile. From a cursory
check here: across 4 boxes, with a total of 17,000 httpd processes,
only 3,000 pids are unique. At about 80 requests/sec, that gives me a
probability of about 1/30625 of two requests going to different machines
but getting the same pid within the same second. Unless I'm reading it
wrong, the range of next_id is more or less a function of
MaxRequestsPerChild, which in my example is set to 20, so I can expect a
collision once every 612,500 requests (30,625 × 20). That's a bit of a
pain :/
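
To illustrate, here's a minimal C sketch, assuming a hypothetical struct
holding the three components of a pid:time:counter ID (it is not the
forensic module's actual code). Two nodes that happen to share a pid, a
second and a counter value produce identical IDs, and nothing in the ID
says which node logged it:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical pid:time:counter request ID, as discussed in this thread.
 * Field names are illustrative, not taken from any real module. */
typedef struct {
    long pid;          /* value of getpid() for the httpd child */
    long when;         /* value of time(), in seconds */
    unsigned counter;  /* per-child next_id++ */
} req_triple;

/* Render the ID the way it would appear in a log: "pid:time:counter". */
static int format_triple(const req_triple *id, char *buf, size_t len)
{
    return snprintf(buf, len, "%ld:%ld:%u", id->pid, id->when, id->counter);
}

/* Two IDs collide in a collated log when their rendered forms match.
 * There is no host component, so IDs from different nodes can match. */
static int triples_collide(const req_triple *a, const req_triple *b)
{
    char sa[64], sb[64];
    format_triple(a, sa, sizeof sa);
    format_triple(b, sb, sizeof sb);
    return strcmp(sa, sb) == 0;
}
```

For example, if node A and node B each have a child with pid 1234 whose
counter is at 5 in the same second, both log `1234:1072800000:5`.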

But more than that, it still seems confusing to have two different methods
of achieving the same task. If mod_unique_id is too much overhead, then
it needs to be rewritten. To my mind, both modules need to generate
reliable unique IDs for request-tracking purposes. Either there's
a good way of doing that or there's not, but having two different
methods that define two different levels of uniqueness doesn't make
sense to me. I have mod_unique_id turned on for my servers, and don't
notice much overhead. MTAs like Exim, Postfix and so on have even more
complicated means of generating unique message IDs, and they achieve
excellent throughput.

Though if mod_unique_id can be used if present that'll solve any
problems I'd have :)
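
A sketch of that fallback in C, assuming a hypothetical helper that is
handed whatever mod_unique_id exported (from a CGI that would be
`getenv("UNIQUE_ID")`, since mod_unique_id sets that environment
variable) and otherwise builds the pid:time:counter form:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: prefer mod_unique_id's value when one is present,
 * otherwise fall back to a pid:time:counter ID written into buf. */
static const char *pick_request_id(const char *unique_id, long pid,
                                   long when, unsigned counter,
                                   char *buf, size_t len)
{
    /* Reuse the cluster-safe ID if mod_unique_id supplied a non-empty one. */
    if (unique_id != NULL && unique_id[0] != '\0')
        return unique_id;

    /* Fallback: only unique per host, as discussed above. */
    snprintf(buf, len, "%ld:%ld:%u", pid, when, counter);
    return buf;
}
```

The point of the design is that the same ID then shows up in the
forensic log and in the CGI environment whenever mod_unique_id is loaded,
without making it a hard requirement.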

> >Or at least, could a host-specific part be added to the forensic id? 
> >A lot of people collate logs (myself included ;) from clusters or whatever 
> >and this would make life much easier there.
> 
> Hmmm. You should only be looking at requests that didn't complete, and 
> since it includes the whole header, the host is in there anyway.

The headers aren't host-specific in a cluster, since typically each
node is configured to answer for the same hostname. mod_unique_id
uses apr_gethostname and the IP address of the node to get around this
problem :)
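
Roughly, the idea is to prefix the per-host triple with something that
identifies the node. A minimal sketch, assuming a hypothetical text format
that puts the node's IPv4 address in front (mod_unique_id itself packs its
fields into binary and encodes them with a base64-like alphabet, so this
is not its actual encoding):

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical host-qualified ID: "a.b.c.d:pid:time:counter".
 * ipv4 is the node's address in host byte order. */
static int format_host_id(uint32_t ipv4, long pid, long when,
                          unsigned counter, char *buf, size_t len)
{
    return snprintf(buf, len, "%u.%u.%u.%u:%ld:%ld:%u",
                    (unsigned)(ipv4 >> 24) & 0xffu,
                    (unsigned)(ipv4 >> 16) & 0xffu,
                    (unsigned)(ipv4 >> 8) & 0xffu,
                    (unsigned)ipv4 & 0xffu,
                    pid, when, counter);
}
```

With the address in front, the same pid, second and counter on 10.0.0.1
and 10.0.0.2 no longer collide. It only helps while each node has a
distinct address, though.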

Actually, that reminds me: these days mod_unique_id's algorithm isn't
clever enough for some systems that use L4 switching or anycast
balancing. I have an experimental patch here somewhere that can help
fix that; I must submit it.

-- 
Colm MacCárthaigh                        Public Key: [EMAIL PROTECTED]