Re: Forensic Logging

Ben Laurie Tue, 30 Dec 2003 11:54:27 -0800

Colm MacCarthaigh wrote:

On Tue, Dec 30, 2003 at 11:49:37AM +0000, Ben Laurie wrote:
Could the forensic_id be tied in with mod_unique_id? It seems confusing
to have two different methods to generate unique id's for requests. Also
with unique_id, I can see it being useful to make CGI's aware of their
"tracking code" via the environment variable. That way a developer can
use the same id to track ingress, processing and egress.
Well, it would be possible to make it use the unique ID if present. I'm not in favour of requiring it, though, because it appears to add a good deal of unnecessary overhead.
I realise that having the value of getpid() and time() to hand is useful
for forensic purposes, but a getpid():time():next_id++ will result in
duplicates across even small clusters.

Ah, I see :-) does mod_unique_id handle that?

It's not unusual to be dealing
with many millions of requests per day in a single logfile. From a cursory
check here: across 4 boxes, with a total of 17,000 httpd processes,
only 3,000 pids are unique. With about 80 requests/sec, that gives me a
probability of about 1/30625 of a request going to two different machines
but getting the same pid within one second. Unless I'm reading it wrong,
the upper bound of next_id is more or less a function of MaxRequestsPerChild,
in my example - it's set to 20, so I can expect a mess-up once every
612,500 requests, that's a bit of a pain :/

Well, the most obvious answer is to prepend a box id, which could either be done when I generate the logs or when you collate them.
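
To illustrate the box-id suggestion, here is a minimal sketch of prefixing at collation time. The pid:time:counter layout, the hex formatting, the box names and the helper names (forensic_id, collate) are all assumptions made for illustration; they are not the module's actual log format.

```python
def forensic_id(pid, timestamp, counter):
    """Format an ID in a pid:time:counter shape (hypothetical layout)."""
    return f"{pid:x}:{timestamp:x}:{counter:x}"


def collate(logs_by_box):
    """Merge per-box ID lists, prefixing every ID with its box name."""
    merged = []
    for box, ids in sorted(logs_by_box.items()):
        merged.extend(f"{box}:{fid}" for fid in ids)
    return merged


# Two boxes that happen to use the same pid within the same second.
logs = {
    "box1": [forensic_id(4242, 1072785267, 0)],
    "box2": [forensic_id(4242, 1072785267, 0)],
}

raw = [fid for ids in logs.values() for fid in ids]
prefixed = collate(logs)

# Without the prefix the two IDs collapse into one; with it they stay distinct.
print(len(set(raw)), len(set(prefixed)))
```

The same prefix could equally be written by each box when it generates the log; doing it at collation time just keeps the per-request work unchanged.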

But more than that, it still seems confusing to have two different methods
of achieving the same task. If mod_unique_id is too much overhead, then
it needs to be rewritten. To my mind, both modules need to generate
reliable unique id's for request tracking purposes. Now either there's
a good way of doing that, or there's not - but having two different
methods and defining two different levels of uniqueness doesn't make
sense to me. I have mod_unique_id turned on for my servers, and don't
notice much overhead. MTA's like exim, postfix and so on have even more
complicated means of generating unique message id's, and they achieve
excellent throughput.

Though if mod_unique_id can be used if present that'll solve any
problems I'd have :)

I can easily do that in 2.0 - I can call a "give me a unique ID" hook, and if mod_unique_id is present, it can supply its own. I could also do it by making sure mod_unique_id is run first if present and fishing the ID out of the environment, though that's a bit tacky.
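
The "fish it out of the environment" option can be sketched as follows, for instance from the point of view of a CGI script that wants the same tracking code. mod_unique_id exports its value as the UNIQUE_ID environment variable; the request_id helper and the pid:time:counter fallback are hypothetical and only stand in for whatever the forensic module would generate itself.

```python
import itertools
import os
import time

# Per-process counter standing in for next_id (an assumption of this sketch).
_counter = itertools.count()


def request_id(environ):
    """Return UNIQUE_ID if mod_unique_id set it, else a local fallback ID."""
    unique = environ.get("UNIQUE_ID")
    if unique:
        return unique
    # Fallback: only unique within one box, as discussed in the thread.
    return f"{os.getpid()}:{int(time.time())}:{next(_counter)}"


# In a CGI script this would typically be called as request_id(os.environ).
print(request_id({"UNIQUE_ID": "example-id"}))
```

A real hook inside the server would avoid the ordering dependency that makes the environment approach "a bit tacky"; this sketch only shows the fallback logic.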

Or at least, could a host-specific part be added to the forensic id? A lot of people collate logs (myself included ;) from clusters or whatever and this would make life much easier there.
Hmmm. You should only be looking at requests that didn't complete, and since it includes the whole header, the host is in there anyway.
The headers aren't host-specific in a cluster, since typically each
node is configured to answer for the same hostname. mod_unique_id
uses apr_gethostname and the ip address of the node to get around this
problem :)

I had the wrong end of the stick.

Actually that reminds me, these days mod_unique_id's algorithm isn't
clever enough for some systems which use L4 switching or anycast
balancing, I have an experimental patch here somewhere which can help
fix that, must submit it.

I'd advocate making the unique bit configurable, that must surely fix it in all cases?

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html       http://www.thebunker.net/

"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff
