On Tue, Dec 30, 2003 at 11:49:37AM +0000, Ben Laurie wrote:
Could the forensic_id be tied in with mod_unique_id? It seems confusing to have two different methods to generate unique id's for requests. Also with unique_id, I can see it being useful to make CGI's aware of their "tracking code" via the environment variable. That way a developer can use the same id to track ingress, processing and egress.
Well, it would be possible to make it use the unique ID if present. I'm not in favour of requiring it, though, because it appears to add a good deal of unnecessary overhead.
I realise that having the value of getpid() and time() to hand is useful for forensic purposes, but a getpid():time():next_id++ will result in duplicates across even small clusters.
Ah, I see :-) does mod_unique_id handle that?
It's not unusual to be dealing with many millions of requests per day in a single logfile. From a cursory check here, across 4 boxes with a total of 17,000 httpd processes, only 3,000 pids are unique. With about 80 requests/sec, that gives me a probability of about 1/30625 of two requests going to different machines but getting the same pid within one second. Unless I'm reading it wrong, the bound of next_id is more or less a function of MaxRequestsPerChild; in my example it's set to 20, so I can expect a mess-up once every 612,500 requests, which is a bit of a pain :/
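For anyone following the arithmetic, here is a minimal sketch of the last step only. It takes the 1/30625 estimate as given rather than re-deriving it, and the conversion to hours assumes a steady 80 requests/sec, which is an assumption added here and not a figure from the thread.

```python
# Figures quoted in the message above; the 1-in-30625 estimate is taken as given.
odds_denominator = 30625          # "probability of about 1/30625"
max_requests_per_child = 20       # bounds next_id in this example
requests_per_second = 80          # assumed steady for the time estimate

# One clash per (odds_denominator * next_id range) requests.
requests_per_collision = odds_denominator * max_requests_per_child

# Assumption: traffic is a constant 80 requests/sec.
hours_between_collisions = requests_per_collision / requests_per_second / 3600

print(requests_per_collision)               # 612500
print(round(hours_between_collisions, 2))   # 2.13
```

Under those assumptions a duplicate ID would turn up roughly every couple of hours.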
Well, the most obvious answer is to prepend a box id, which could either be done when I generate the logs or when you collate them.
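A rough sketch of the box-id idea, assuming a purely illustrative `pid:time:counter` string format. The function name `forensic_id`, the `box_id` parameter and the sample values are hypothetical and are not taken from either module.

```python
def forensic_id(pid, timestamp, counter, box_id=None):
    """Build a pid:time:counter ID, optionally prefixed with a box ID.

    The string layout is an illustration only, not the module's real format.
    """
    base = f"{pid}:{timestamp}:{counter}"
    return base if box_id is None else f"{box_id}:{base}"


# Two nodes that happen to share a pid, a second and a counter value
# (arbitrary sample numbers).
plain_a = forensic_id(4321, 1072784977, 7)
plain_b = forensic_id(4321, 1072784977, 7)
print(plain_a == plain_b)        # True: the IDs clash once logs are collated

# Prefixing a per-node ID keeps them apart.
tagged_a = forensic_id(4321, 1072784977, 7, box_id="web1")
tagged_b = forensic_id(4321, 1072784977, 7, box_id="web2")
print(tagged_a == tagged_b)      # False
```

The same prefix could equally be added at collation time, as long as each node's log is still identifiable at that point.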
But more than that, it still seems confusing to have two different methods of achieving the same task. If mod_unique_id is too much overhead, then it needs to be rewritten. To my mind, both modules need to generate reliable unique id's for request tracking purposes. Now either there's a good way of doing that, or there's not - but having two different methods and defining two different levels of uniqueness doesn't make sense to me. I have mod_unique_id turned on for my servers, and don't notice much overhead. MTA's like exim, postfix and so on have even more complicated means of generating unique message id's, and they achieve excellent throughput.
Though if mod_unique_id can be used if present that'll solve any problems I'd have :)
I can easily do that in 2.0 - I can call a "give me a unique ID" hook, and if mod_unique_id is present, it can give me its own. I could also do it by making sure mod_unique_id is run first if present and fishing the ID out of the environment, though that's a bit tacky.
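To illustrate the "use it if present" behaviour from the consumer's side, here is a minimal sketch of what a CGI-style script might do. It relies on mod_unique_id exporting the `UNIQUE_ID` environment variable; the `request_id` helper and the fallback format are hypothetical and say nothing about how the module itself would implement the hook.

```python
import itertools
import os
import time

# Per-process counter standing in for next_id (illustrative only).
_counter = itertools.count()


def request_id(environ):
    """Prefer mod_unique_id's UNIQUE_ID if set; otherwise build a local ID."""
    unique_id = environ.get("UNIQUE_ID")
    if unique_id:
        return unique_id
    # Fallback: pid:time:counter, which is only unique within one box.
    return f"{os.getpid()}:{int(time.time())}:{next(_counter)}"


# With mod_unique_id loaded, the server-provided ID wins.
print(request_id({"UNIQUE_ID": "example-id"}))   # example-id

# Without it, a local ID is generated instead.
print(request_id({}))
```

In a real CGI the argument would be `os.environ`; a plain dict is used here so the sketch runs anywhere.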
Or at least, could a host-specific part be added to the forensic id? A lot of people collate logs (myself included ;) from clusters or whatever and this would make life much easier there.
Hmmm. You should only be looking at requests that didn't complete, and since it includes the whole header, the host is in there anyway.
The headers aren't host-specific in a cluster, since typically each node is configured to answer for the same hostname. mod_unique_id uses apr_gethostname and the IP address of the node to get around this problem :)
I had the wrong end of the stick.
Actually, that reminds me: these days mod_unique_id's algorithm isn't clever enough for some systems which use L4 switching or anycast balancing. I have an experimental patch here somewhere which can help fix that; I must submit it.
I'd advocate making the unique bit configurable, that must surely fix it in all cases?
Cheers,
Ben.
-- http://www.apache-ssl.org/ben.html http://www.thebunker.net/
"There is no limit to what a man can do or how far he can go if he doesn't mind who gets the credit." - Robert Woodruff
