Re: How does http_find_header() work?

Roy Smith Thu, 31 Mar 2011 05:10:33 -0700

My intent was just to have a unique string that could be searched for in the 
logs.  Building it by smashing together the hostid, pid, timestamp, etc, was 
just a fast hack to get something unique.  I made one attempt to compact the 
string by running it through md5, but then I realized that the more bells and 
whistles I hung on this, the less portable it would be (i.e. not everybody 
might have the same md5 API I was using).
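
For illustration, here is a minimal Python sketch of the "smash the fields together" idea, using the scheme described in the quoted message below (a leading part, plus a sequence number that counts to 1000 before the leading part is rebuilt). It is not the code from the patch. The class name, the CRC32-of-hostname host id, and the split of the timestamp into seconds and microseconds are all assumptions made for this sketch.

```python
import os
import socket
import time
import zlib


class UniqueIdGenerator:
    """Illustrative only: HOST.PID.SEC.USEC prefix plus a sequence number."""

    MAX_SEQ = 1000  # ids issued per prefix before the prefix is rebuilt

    def __init__(self, host_id=None, pid=None, clock_ns=time.time_ns):
        # Assumption: a CRC32 of the hostname stands in for the host id.
        if host_id is None:
            host_id = zlib.crc32(socket.gethostname().encode())
        self.host_id = host_id
        self.pid = os.getpid() if pid is None else pid
        self.clock_ns = clock_ns
        self._new_prefix()

    def _new_prefix(self):
        # Split the clock into whole seconds and microseconds so that a
        # prefix rebuilt within the same second still differs.
        sec, nsec = divmod(self.clock_ns(), 1_000_000_000)
        self.prefix = "%08X.%04X.%08X.%05X" % (
            self.host_id, self.pid, sec, nsec // 1000)
        self.seq = 0

    def next_id(self):
        # Once MAX_SEQ ids have used this prefix, build a fresh one.
        if self.seq >= self.MAX_SEQ:
            self._new_prefix()
        self.seq += 1
        return "%s.%X" % (self.prefix, self.seq)
```

Rebuilding the prefix only every 1000 ids keeps the per-request cost down to an increment and a short format call.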


For my purpose, all I need is something that's unique.  If anything, rather 
than making it human readable, I think a better way to approach this would be 
to make it more compact, by doing some kind of message digest, and perhaps even 
printing it in some encoding more compact than hex (say, base64).
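
A rough sketch of that compaction in Python, assuming MD5 as the digest and URL-safe base64 with the padding stripped (both are choices made here for illustration; `compact_id` is a hypothetical name):

```python
import base64
import hashlib


def compact_id(raw_id: str) -> str:
    """Digest the raw id and print it in base64 instead of hex."""
    digest = hashlib.md5(raw_id.encode("ascii")).digest()  # 16 raw bytes
    # 16 bytes encode to 24 base64 characters, 2 of them "=" padding.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

This gives 22 characters where the hex form of the same digest would take 32. The result is only as unique as the digest is collision-free, and the original fields can no longer be read back out of it, which fits the "opaque id" idea.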

I didn't really write a specification, but I think a critical part of the spec 
would be that the only guarantee about the id string is that it's unique, and 
that the specific format is subject to change without warning.  That would 
discourage people from trying to use it to embed whatever information seems 
useful at the moment.  The right way to recover additional information about 
the request is to use the id to correlate across logs.  For example, when we 
first discussed this in January, it was suggested (IIRC) that we might want to 
embed the IP address where the request came from.   While I can see how that 
might be useful, that information is already available.  If you see something 
in a downstream log that interests you and you want to know what IP it came 
from, use the unique id to find the corresponding entry in the front-end 
haproxy log, and the IP address will be there.
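
As a sketch of that correlation step, the following Python function collects every line that mentions a given id from a set of logs. The function name and the shape of the `logs` argument (a dict of log name to an iterable of lines, such as open file objects) are assumptions for illustration, and no particular log format is assumed.

```python
import re


def correlate(unique_id, logs):
    """Map each log name to the lines that mention unique_id."""
    # Reject matches that continue into another word character, so that
    # an id ending in ".10" does not also match one ending in ".100".
    pattern = re.compile(r"(?<!\w)" + re.escape(unique_id) + r"(?!\w)")
    matches = {}
    for name, lines in logs.items():
        hits = [line.rstrip("\n") for line in lines if pattern.search(line)]
        if hits:
            matches[name] = hits
    return matches
```

Given the front-end haproxy log and a downstream log, the front-end entry returned for an id is where the client IP would be read from.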


On Mar 31, 2011, at 4:43 AM, Bart van der Schans wrote:

> Hi,
> 
> Thx Roy, this would be very useful to have. I'm just wondering about
> the id format. If all the "fields" correspond to something meaningful,
> like host_id, pid, timestamp, etcetera, would it make sense to have
> them in a more human readable format?
> 
> Regards,
> Bart
> 
> On Thu, Mar 31, 2011 at 4:30 AM, Roy Smith <[email protected]> wrote:
>> Willy,
>> 
>> This turned out to be surprisingly straightforward.  Patch attached 
>> (against the 1.4.11 sources).
>> 
>> To enable generation of the X-Unique-Id headers, you add "unique-id" to a 
>> listen stanza in the config file.  This doesn't make any sense unless you're 
>> in http mode (although my code doesn't check for that, which could 
>> reasonably be considered a bug).  What this does is add a header that looks 
>> like:
>> 
>> X-Unique-Id: CB0A6819.4B7D.4D93DFDB.C69B.10
>> 
>> to each incoming request.  This gets done before the header capture 
>> processing happens, so you can use the existing "capture request header" to 
>> log the newly added headers.  There's nothing magic about the format of the 
>> Id code.  In the current version, it's just a mashup of the hostid, haproxy 
>> pid, a timestamp, and a sequence number.  The sequence numbers count up to 
>> 1000, and then the leading part is regenerated.  I'm sure there are better 
>> schemes that could be used.
>> 
>> Here's a sample config stanza:
>> 
>> listen test-nodes 0.0.0.0:19199
>>       mode http
>>       option httplog
>>       balance leastconn
>>       capture request header X-Unique-Id len 64
>>       unique-id
>>       server localhost localhost:9199 maxconn 8 weight 10 check inter 60s fastinter 60s rise 2
>> 
>> If there is already an X-Unique-Id header on the incoming request, it is left 
>> untouched.
>> 
>> A little documentation:
>> 
>> We've got a (probably very typical) web application which consists of many 
>> moving parts mashed together.  In our case, it's an haproxy front end, an 
>> nginx layer (which does SSL conversion and some static file serving), 
>> Apache/PHP for the main application logic, and a number of ancillary 
>> processes which the PHP code talks to over HTTP (possibly with more 
>> haproxies in the middle).  Plus mongodb.  Each of these moving parts 
>> generates a log file, but it's near impossible to correlate entries across 
>> the various logs.
>> 
>> To fix the problem, we're going to use haproxy to assign every incoming 
>> request a unique id.  All the various bits and pieces will log that id in 
>> their own log files, and pass it along in the HTTP requests they make to 
>> other services, which in turn will log it.  We're not yet sure how to deal 
>> with mongodb, but even if we can't get it to log our ids, we'll still have a 
>> very powerful tool for looking at overall performance through the entire 
>> application suite.
>> 
>> Thanks so much for the assistance you provided, not to mention making 
>> haproxy available in the first place.  Is there any possibility you could 
>> pick this up and integrate it into a future version of haproxy?  Right now, 
>> we're maintaining this in a private fork, but I'd prefer not to have to do 
>> that.  I suspect this may also be useful for other people.  If there are any 
>> modifications I could make which would help you, please let me know.
>> 
>> 
>> 


--
Roy Smith
[email protected]
