Hi Beni,

A few things to digest here.
What was leading me down this path was a bit of elementary (and probably naïve) whitelisting of the contents of the Host header and the URI/URL supplied by the user. Tools like Fiddler make request manipulation trivial, so filtering out 'obvious' manipulation attempts would be a good idea. With this in mind, my thinking (if it can be called that) was:

(1) User request is for http://www.example.com/whatever
(2) Host header is www.example.com
(3) All is good! Pass the request on to the server.

Alternatively:

(1) User request is for http://www.example.com/whatever
(2) Host header is www.whatever.com
(3) All is NOT good! Flick the request somewhere harmless.

I'm not sure whether your solution supports this, and if your interpretation is correct, maybe HAProxy doesn't support it either. I'll do some more experimenting, and I hope I don't lock myself out ;-)

Cheers
Andrew

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Benedikt Fraunhofer
Sent: Wednesday, 28 April 2010 7:42 PM
To: Andrew Commons
Cc: [email protected]
Subject: Re: Matching URLs at layer 7

Hi Andrew,

2010/4/28 Andrew Commons <[email protected]>:
>     url_beg <string>
>       Returns true when the URL begins with one of the strings. This can be used to
>       check whether a URL begins with a slash or with a protocol scheme.
>
> So I'm assuming that "protocol scheme" means http:// or ftp:// or whatever....

I would assume that, too... but :) reading the other matching options, it looks like those only affect the "anchoring" of the match. Like:

>     url_ip <ip_address>
>       Applies to the IP address specified in the absolute URI in an HTTP request.
>       It can be used to prevent access to certain resources such as local network.
>       It is useful with option "http_proxy".

Yep, but watch this: "http_proxy".

>     url_port <integer>
>       "http_proxy". Note that if the port is not specified in the request, port 80
>       is assumed.

Same here.
This enables plain proxy mode, where the client issues requests like:

    GET http://www.example.com/importantFile.txt HTTP/1.0

> This seems to be reinforced (I think!) by:
>
>     url_dom <string>
>       Returns true when one of the strings is found isolated or delimited with dots
>       in the URL. This is used to perform domain name matching without the risk of
>       wrong match due to colliding prefixes. See also "url_sub".

I personally don't think so. I guess this is just another variant of "anchoring", here "\.$STRING\.".

> If I'm suffering from a bit of 'brain fade' here just set me on the right
> road :-) If the url_ criteria have different interpretations in terms of what
> the 'url' is then let's find out what these are!

I currently can't give it a try, as I finally managed to lock myself out, but http://haproxy.1wt.eu/download/1.4/doc/configuration.txt has an example that looks like exactly what you need:

-------------------
To select a different backend for requests to static contents on the "www" site
and to every request on the "img", "video", "download" and "ftp" hosts :

    acl url_static  path_beg         /static /images /img /css
    acl url_static  path_end         .gif .png .jpg .css .js
    acl host_www    hdr_beg(host) -i www
    acl host_static hdr_beg(host) -i img. video. download. ftp.

    # now use backend "static" for all static-only hosts, and for static urls
    # of host "www". Use backend "www" for the rest.
    use_backend static if host_static or host_www url_static
    use_backend www    if host_www
-------------------

And since "begin" really means anchoring with "^" as in a regex, this would mean that there's no host in the URL; anything else would redefine the meaning of "begin", which should not be done :)

So you should be fine with:

    acl xxx_host hdr(Host) -i xxx.example.com
    acl xxx_url  url_beg /    # there's already a predefined acl doing this.
    use_backend xxx if xxx_host xxx_url

if I recall your example correctly. But you should really put something after the "/" in url_beg for it to be of any use :)

Just my 2 cents,
Beni.
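For Andrew's whitelist, a minimal sketch along the lines of Beni's suggestion might look like the following. It assumes HAProxy 1.4 ACL syntax, a single site named www.example.com, and two hypothetical backends, "app" and "sinkhole", defined elsewhere; none of these names come from the thread. Since url_beg matches the URL as the client wrote it on the request line, an ordinary browser request starts with "/", while a proxy-style request like the GET above starts with the scheme. The sketch therefore accepts either form, but only when the Host header also names the site.

```
# Sketch only: assumes HAProxy 1.4 syntax. "www.example.com", "app"
# and "sinkhole" are hypothetical placeholders; both backends are
# assumed to be defined elsewhere in the configuration.
frontend web
    mode http
    bind :80
    # Host header must name the one site we serve (exact, case-insensitive;
    # note that "www.example.com:80" would NOT match this pattern)
    acl host_ok  hdr(host) -i www.example.com
    # ordinary request line, e.g. "GET /whatever HTTP/1.1"
    acl url_rel  url_beg /
    # absolute URI on the request line that names the same host
    acl url_abs  url_beg -i http://www.example.com/
    # ACLs on one line are ANDed; separate use_backend lines act as OR
    use_backend app if host_ok url_rel
    use_backend app if host_ok url_abs
    # everything else (e.g. "Host: www.whatever.com") goes somewhere harmless
    default_backend sinkhole
```

With this in place, Andrew's first case (request and Host header both naming www.example.com) reaches "app", while his second case (Host header www.whatever.com) fails host_ok and falls through to "sinkhole". A site with several names or explicit ports would need additional patterns on both ACLs.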

