Hi Dmitry,
On Wed, Oct 24, 2012 at 07:19:18PM +0400, Dmitry Sivachenko wrote:
> Well, at least from Wikipedia:
> http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_the_percent_character
Well, first, why not use the authoritative source instead of a second-hand
pseudo-standard invented by some random junkie who wanted to contribute
to Wikipedia? URIs are defined by RFC3986; it is the only source you
should use when you want to check compliance.
> Because the percent ("%") character serves as the indicator for
> percent-encoded octets, it must be percent-encoded as "%25" for that octet
> to be used as data within a URI.
Exactly.
> When haproxy encounters, say, unencoded whitespace character, it returns
> HTTP 400. Why '%' should be an exception?
It's not an exception. The space is the special case here, because it belongs
to the HTTP syntax (RFC2616), not to the URI syntax (RFC3986). In the request
line it is the delimiter which says:
- what is the method
- what is the URI
- what is the HTTP version
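
To make the role of the space concrete, here is a small Python sketch of how a request line splits into its three fields. It is a simplification for illustration only, not haproxy's parser, and the function name split_request_line is made up:

```python
def split_request_line(line: str):
    """Split an HTTP/1.x request line into (method, uri, version).

    Simplified illustration: the three fields are separated by single
    spaces, so a raw space inside the URI produces a fourth token and
    the line is rejected. The URI itself is not inspected at all.
    """
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError("malformed request line: %r" % line)
    method, uri, version = parts
    return method, uri, version
```

With this sketch, "GET /a%zz HTTP/1.1" is accepted (the URI content is never examined), while "GET /a b HTTP/1.1" is rejected because the extra space breaks the three-field structure.
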
Haproxy does validate the HTTP format, but not the URI. One reason is that
validating the URI would require adding more states to the parser for
something it does not need to do its job. The more important reason is that
some clients and servers do not comply with the spec, and even a little bit
of filtering causes a lot of breakage here. WAF authors are well aware
of this.
For instance, the percent-encoding is clearly defined like this :
    pct-encoded = "%" HEXDIG HEXDIG
Could you tell me why MSIE likes to send "%u1234" to IIS servers then ?
So since haproxy has no business in the game of decoding URIs, it simply
ignores them. However if you want to add a bit of control there, you can
easily do it with some regex :
    acl bad-pct uri_reg -i %[^0-9a-f] %[0-9a-f][^0-9a-f]
    http-request deny if bad-pct
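
If you want to see what such an ACL would and would not catch before deploying it, the two patterns can be tried offline. The following is a Python sketch, not haproxy code: it assumes Python's re module behaves closely enough to the regex library haproxy is built with for these simple patterns, it writes the hex range as lowercase a-f, and the name has_bad_pct is made up:

```python
import re

# The two patterns the ACL is meant to express, matched
# case-insensitively like the "-i" flag.
BAD_PCT = [
    re.compile(r"%[^0-9a-f]", re.IGNORECASE),          # "%" + non-hex char
    re.compile(r"%[0-9a-f][^0-9a-f]", re.IGNORECASE),  # "%" + hex + non-hex
]

def has_bad_pct(uri: str) -> bool:
    """Return True if any pattern matches somewhere in the URI."""
    return any(p.search(uri) for p in BAD_PCT)

if __name__ == "__main__":
    for uri in ("/a%20b", "/100%25", "/%u1234", "/%2G", "/a%", "/a%2"):
        print(uri, has_bad_pct(uri))
```

Note that in this sketch a "%" at the very end of the URI, or a "%" followed by a single hex digit at the end, is not matched, since both patterns need a following character to reject. Whether that matters depends on what you are trying to filter.
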
But I still think it's not the best place to do this and maybe you need a
WAF instead (which could happily be load balanced by haproxy since it will
not mangle the requests).
Regards,
Willy