On Tue, Dec 30, 2003 at 11:49:37AM +0000, Ben Laurie wrote:
> >Could the forensic_id be tied in with mod_unique_id? It seems confusing
> >to have two different methods to generate unique id's for requests. Also
> >with unique_id, I can see it being useful to make CGI's aware of their
> >"tracking code" via the environment variable. That way a developer can
> >use the same id to track ingress, processing and egress.
>
> Well, it would be possible to make it use the unique ID if present. I'm
> not in favour of requiring it, though, because it appears add a good
> deal of unnecessary overhead.

I realise that having the value of getpid() and time() to hand is useful
for forensic purposes, but a getpid():time():next_id++ scheme will result
in duplicates across even small clusters. It's not unusual to be dealing
with many millions of requests per day in a single logfile.

From a cursory check here: across 4 boxes, with a total of 17,000 httpd
processes, only 3,000 pids are unique. With about 80 requests/sec, that
gives me a probability of about 1/30625 of two requests going to
different machines but getting the same pid within one second. Unless
I'm reading it wrong, the bound on next_id is more or less a function of
MaxRequestsPerChild. In my example it's set to 20, so I can expect a
mess-up once every 612,500 requests, which is a bit of a pain :/

But more than that, it still seems confusing to have two different
methods of achieving the same task. If mod_unique_id is too much
overhead, then it needs to be rewritten. To my mind, both modules need
to generate reliable unique IDs for request tracking purposes. Now
either there's a good way of doing that, or there's not, but having two
different methods and defining two different levels of uniqueness
doesn't make sense to me.

I have mod_unique_id turned on for my servers, and don't notice much
overhead. MTAs like exim, postfix and so on have even more complicated
means of generating unique message IDs, and they achieve excellent
throughput.

Though if mod_unique_id can be used when present, that'll solve any
problems I'd have :)

> >Or at least, could a host-specific part be added to the forensic id?
> >A lot of people collate logs (myself included ;) from clusters or whatever
> >and this would make life much easier there.
>
> Hmmm. You should only be looking at requests that didn't complete, and
> since it includes the whole header, the host is in there anyway.

The headers aren't host-specific in a cluster, since typically each node
is configured to answer for the same hostname.

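To make the collision concrete, here is a rough sketch in Python. It is
not the actual code or ID format of either module. The function names,
the node names "web1" and "web2", the pid and the timestamp are all made
up for illustration. It only shows that a pid:time:counter ID repeats
when two nodes happen to have a child with the same pid in the same
second, and that a host-specific prefix separates them again.

```python
def forensic_style_id(pid: int, now: int, counter: int) -> str:
    """Hypothetical pid:time:counter ID (not the real module's format)."""
    return f"{pid:x}:{now:x}:{counter:x}"


def host_qualified_id(host: str, pid: int, now: int, counter: int) -> str:
    """The same ID with an assumed host-specific part prepended."""
    return f"{host}:{forensic_style_id(pid, now, counter)}"


# Two cluster nodes (hypothetical names) whose children share a pid.
pid = 4242
now = 1072785000  # arbitrary timestamp, the same second on both nodes
counter = 0       # first request handled by each child

plain_a = forensic_style_id(pid, now, counter)
plain_b = forensic_style_id(pid, now, counter)
qualified_a = host_qualified_id("web1", pid, now, counter)
qualified_b = host_qualified_id("web2", pid, now, counter)

collision_without_host = plain_a == plain_b
collision_with_host = qualified_a == qualified_b

print("without host part:", plain_a, plain_b, collision_without_host)
print("with host part:   ", qualified_a, qualified_b, collision_with_host)
```

In collated logs the two plain IDs cannot be told apart, while the
host-qualified ones can, provided the host part really is unique per
node.
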
mod_unique_id uses apr_gethostname and the IP address of the node to get
around this problem :)

Actually, that reminds me: these days mod_unique_id's algorithm isn't
clever enough for some systems which use L4 switching or anycast
balancing. I have an experimental patch here somewhere which can help
fix that; I must submit it.

-- 
Colm MacCárthaigh                        Public Key: [EMAIL PROTECTED]
