Hi Tim, On Thu, Jun 25, 2020 at 04:30:37PM +0200, Tim Düsterhus wrote: (...) > Willy: Please correct me if I misrepresented your arguments or left out > something important.
I think it's well summarized. There are other more painful points not mentioned here: - RFC3986's path normalization algorithm is bogus when it sees multiple slashes such as in "/static//images/12.jpg". This happens very frequently in URLs built by concatenation. The problem is that when it meets a "../" it suggests to remove only one level of slash and will end up in a different directory than the one a server that does simplistic normalization would do (or a cache which would first merge consecutive slashes to increase cache hit ratio). So: /static//images/../../css/main.css would become: /static/css/main.css according to RFC3986 /css/main.css according to most file systems or simplifications - some operating systems also support the backslash "\" as a directory delimiter. So you try to normalize your path correctly and leave "\..\admin/" and you're screwed again. - "+" and "%20" are equivalent in the query string, but given that in many simple applications these ones will only appear in a single form, such applications might not even check for the other one. So if you replace "%20" with "+" you will break some of them and if you replace "+" with "%20" you will break others. I've seen quite a number of implementations in the past perform the decoding just like haproxy used to until recently, which is: feed the whole URL string to the percent-decoder and decode the "+" as a space in the path part. And by normalizing that we'd also break some of them. - some servers support a non-standard UTF-16 encoding (the same ones as those using case-insensitive matching). For this they use "%u" followed by 4 digits. So your "A" could be encoded "%61", "%41", "%u0061", "%u0041", "%U0041" or "%U0061" there and will also match. But this is not standard and must not be decoded as such, at the risk of breaking yet other applications which do not expect that "%u" is transcoded. And it's even possible that in some of these servers' configurations there are rules matching "%UFEDC" but not "%FE%DC". - and I wouldn't even be surprised if some servers using some internal normalization functions would also resolve unicode homoglyphs to valid characters! Just check if your server accepts "/%ef%bd%81dmin/", that would be fun! - actually, even browsers DO NOT normalize URLs, in order to preserve them as much as possible and not to fail on broken servers! This should be heard as a strong warning! Try it by yourself, just direct your browser to /a%%b%31c%xyz/?brightness=10% and see it send: GET /a%%b%31c%xyz/?brightness=10% HTTP/1.1 you'll note that even "%31" wasn't turned into a "1". - there are other aspects (some mentioned in RFC3986). The authority par of the URI can have various forms. The best known one is the split of the net and the host in the IPv4 address representation, by which "127.0.0.1" is also "127.0.1", "127.1", "2130706433" or even "0x7f000001" (with X or x and F or f). You can even add leading zeroes. And you can use octal encoding: 017700000001. You can try to ping all of them, they'll likely work on your OS. At least my browser rewrites them in the URL bar before sending the request. This might be normalized... or not, for the same reasons of not breaking the next step in the chain, which possibly expect to have a different behavior when dealing with "16bit.16bit" representation, since host names made of digits only are permitted if there's a domain behind :-/ The port number can accept leading zeroes, so ":80" and ":0000080" are aliases. - last, those running haproxy in front of a WAF certainly don't want haproxy to wipe these precious information before the WAF has a chance to raise its awareness on this request! The problem with normalization is that it would work if everyone was doing it, but RFC3986 was specified long after 99% of the internet was already deployed and used, so at best it can be used as a guideline and a reference of traps to avoid. And things are getting worse with IoT. You can just try to run an HTTP server on an ESP8266 and you'll see that most of the time, percent-decoding is not the web server's problem at all and it will pass it unmodified. So you can definitely expect that your light bulb's web server will take requests like "GET /dim?brightness=10%" and expect that to work out of the box. Just install a normalizing load balancer in front of a gateway managing tens of thousands of heating controllers based on such devices and suddenly nobody can adjust the temperature in their homes anymore (or worse, the '%' gets dropped and becomes degrees C so when you ask for 50% heating you get 50 degrees C). The real problem is that initial implementations of HTTP never said that it was illegal to send non-canonical characters and that they ought to be rejected. It's just that HTTP was designed at an era where a server would run fine in 30kB of RAM. Nowadays with 1000 times more, many wouldn't just load. So it was less easy to insist that certain things were strictly forbidden by then. In our discussion you invoked the principle of least surprise, which dictates that haproxy ought to do "the right thing" by default. And I totally agree with it. It just turns our that with so many different behaviors around, when you're in the middle of the chain you have to be discrete and not start to rearrange the stuff that's not your business and claim it's better once tidied up, otherwise you're certain to cause bad surprises. I'm pretty sure there are far less users of a "deny" rule applied to a path than there are users of applications that would simply break by normalizing. Mind you that I've found "/" in header names and even percent-encoding, so you can easily imagine the inventivity you can have in a path that gets rewritten along a long chain using regex... Thus for me the principle of least surprise is *NOT* to normalize in the middle of the chain. > Concluding: > - The documentation should be updated to warn administrators that > http-request deny must be used very carefully when combining it with a path. That was my initial point and I was even a bit disappointed not to find such a mention of percent-decoding in the doc as it used to be well-known at the time "reqrep" and friends were the norm. We even still have examples of this in examples/acl-content-sw.cfg with the forbidden URIs. But that's how projects evolve, people change and assumptions as well, and certain design decisions need to be clearly documented. > - HAProxy should gain the ability to correctly normalize URLs (i.e. not > using the url_dec version). How exactly that would look is not yet clear. > - It could be a `http-request normalize-path percent,dotdot,Y` action. > - It could be a `normalize(percent)` converter > - The `path` fetch could be extended to gain an argument, specifying > the normalization requested. > - An option within the `defaults` section could enable normalization > for everything. It cannot be an option in a defaults section because it means you'd want to disable it for some frontends, and this becomes incompatible with the need for filtering before and after. The main problem is to be able to do that : http-request accept/redirect/set-var/etc ... if blah ... http-request normalize-uri [ possible options ] http-request other-rules ... if blah In short, apply security processing before normalization, and simple decisions after. Those who don't want these security processing but still want to apply deny rules could just start with normalize-uri. We could also imagine having a global option to state that everything that enters the H1/H2 muxes gets normalized before processing, but I strongly doubt anyone would use this because it would definitely break applications and would still not allow them to write safe rules. Converters can be useful to only check without modifying. Another easier option is to have a sample-fetch or converter that works the other way around and tells you whether there is something non-canonical in your request. It wouldn't trip on "10%" or "%%" or such things, it would catch valid encodings of characters that should not be encoded in the first place. Because these are the dangerous and suspicious ones. And this allows to also spot the "%u0061". It could also check for "../". By doing so you would be able to write : http-request deny if { req.url,is_suspicious } http-request deny if { req.path /admin/ } I tend to prefer this one because it doesn't modify the request and will not break applications. And that's more or less the way I've used to proceed in certain environments with regex matching stuff like "%(2[D-F]|3[0-9]|[46][0-0A-F]|[57][0-9A])" which is already very efficient at blocking most of these patterns. A normalize action could however also correctly recode non-printable characters that some browsers follow in links. But various options are usable simultaneously, such as : http-request normalize-uri if { req.url,is_suspicious } > If you have anything to add after reading this mail: Please do! Sure, I always have something to add :-) As I mentioned in our exchanges, due to all the variations above, URL-based access denial *never works*. YOU CANNOT IMPLEMENT SECURITY BY DENYING ONLY A SUBSET OF IDENTIFIABLE PATTERNS AMONG A LARGE AND UNCLEAR LIST OF ALIASES. Some application servers might want to see /images/../admin as something valid under a symlink while others will instead resolve it to /admin. Some will also consider that /admin/login.php.html is the same as /admin/login.php, and that this is also the same as /admin/.%5Clogin.php. You could also face the case of internal rerouting of URLs like "/admin/debug?path=/login.php" where it's the application server itself which routes its request inside the application. So even with normalization, you'd be left with a huge doubt and your application would remain totally insecure. This is why some responders said that such filtering ought to be made on the server itself. And you rightfully pointed that .htaccess is placed in the proper directory for a reason. It's also why we've implemented certificate passing in headers. It might not be the easiest thing to deploy as you said, but having the application protect itself is the only way to make it secure by DESIGN and not by applying pads on top of identified wounds. However, putting a protection layer in front of the application *IS* a good way to protect it against zero-day attacks. Blocking "../", "%00", double extensions, invalid arguments and so on will definitely save the application from being exposed to a lot of dangerous dirt, starting with vulnerability scanners which eat your CPU and fill your logs. That's why I still have a preference for implementing a nice and configurable converter to detect non-canonical forms and probably another one which would provide a good level of security in a single rule. Cheers, Willy