Fri, 26 Jun 2020 at 11:00, Willy Tarreau <w...@1wt.eu>:

> Hi Tim,
>
> On Thu, Jun 25, 2020 at 04:30:37PM +0200, Tim Düsterhus wrote:
> (...)
> > Willy: Please correct me if I misrepresented your arguments or left out
> > something important.
>
> I think it's well summarized. There are other more painful points not
> mentioned here:
>

Tim, can we schedule this for 2.3? It seems to be "too much" for 2.2.

As for normalization, I'd like to compare our approach against nginx's normalization rules.

(I recall that only "merge_slashes off;" was ever an issue, and rarely; the
rest of the normalization rules seem to be just fine.)


>
>   - RFC3986's path normalization algorithm is bogus when it sees
>     multiple slashes such as in "/static//images/12.jpg", which happen
>     very frequently in URLs built by concatenation. The problem is that
>     when it meets a "../" it removes only one path segment, and the
>     empty segment between two consecutive slashes counts as one. It thus
>     ends up in a different directory than a server doing simplistic
>     normalization would (or a cache which first merges consecutive
>     slashes to increase its hit ratio). So:
>         /static//images/../../css/main.css would become:
>         /static/css/main.css according to RFC3986
>         /css/main.css according to most file systems or simplifications
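>
>     A quick Python sketch shows the divergence (urljoin serving here as
>     an approximation of RFC3986 reference resolution):
>
>         import posixpath
>         from urllib.parse import urljoin
>
>         raw = "/static//images/../../css/main.css"
>
>         # RFC3986 remove_dot_segments: each "../" pops exactly one
>         # segment, and the empty segment between "//" counts as one
>         print(urljoin("http://example/", raw))
>         # -> http://example/static/css/main.css
>
>         # filesystem-style normalization merges the duplicate slashes
>         # first, so the same "../" climbs one directory higher
>         print(posixpath.normpath(raw))
>         # -> /css/main.css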
>
>   - some operating systems also support the backslash "\" as a directory
>     delimiter. So you carefully normalize your path, leave "\..\admin/"
>     untouched, and you're screwed again.
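>
>     Python's two path flavors illustrate the discrepancy (a sketch, not
>     any particular server's behavior):
>
>         import posixpath, ntpath
>
>         raw = "/static/\\..\\admin/"   # i.e. /static/\..\admin/
>
>         # posixpath treats "\" as an ordinary character: no traversal
>         print(posixpath.normpath(raw))   # /static/\..\admin
>
>         # ntpath treats "\" as a separator: ".." climbs one directory
>         print(ntpath.normpath(raw))      # \admin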
>
>   - "+" and "%20" are equivalent in the query string, but since in
>     many simple applications these will only ever appear in one form,
>     such applications might not even check for the other one. So if you
>     replace "%20" with "+" you will break some of them, and if you
>     replace "+" with "%20" you will break others. I've seen quite a
>     number of implementations in the past perform the decoding just like
>     haproxy used to until recently, which is: feed the whole URL string
>     to the percent-decoder and decode the "+" as a space even in the
>     path part. Normalizing that would also break some of them.
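>
>     Python's standard library happens to expose exactly these two
>     incompatible decoders, which makes the trap easy to reproduce:
>
>         from urllib.parse import unquote, unquote_plus
>
>         # leaves "+" alone (the correct rule for the path part)
>         print(unquote("/search?q=a+b%20c"))       # /search?q=a+b c
>
>         # turns "+" into a space (form-encoding rule)
>         print(unquote_plus("/search?q=a+b%20c"))  # /search?q=a b c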
>
>   - some servers support a non-standard UTF-16 encoding (the same
>     servers that use case-insensitive matching). For this they use "%u"
>     followed by 4 hex digits. So your "A" could be encoded "%61", "%41",
>     "%u0061", "%u0041", "%U0041" or "%U0061" there and will match all the
>     same. But this is not standard and must not be decoded as such, at
>     the risk of breaking yet other applications which do not expect "%u"
>     to be transcoded. And it's even possible that in some of these
>     servers' configurations there are rules matching "%UFEDC" but not
>     "%FE%DC".
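>
>     A minimal Python sketch of what such a decoder does (the helper
>     name is made up, no real server's code is implied):
>
>         import re
>
>         def decode_percent_u(s):
>             # decode the non-standard "%uXXXX" form on top of (not
>             # instead of) regular percent-decoding
>             return re.sub(r'%[uU]([0-9a-fA-F]{4})',
>                           lambda m: chr(int(m.group(1), 16)), s)
>
>         print(decode_percent_u("/%u0041dmin"))  # /Admin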
>
>   - and I wouldn't even be surprised if some servers' internal
>     normalization functions also resolved Unicode homoglyphs to plain
>     characters! Just check whether your server accepts "/%ef%bd%81dmin/",
>     that would be fun!
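>
>     A few lines of Python show what that sequence hides:
>
>         import unicodedata
>         from urllib.parse import unquote
>
>         # %ef%bd%81 is the UTF-8 encoding of U+FF41,
>         # FULLWIDTH LATIN SMALL LETTER A
>         s = unquote("/%ef%bd%81dmin/")
>         print(unicodedata.normalize("NFKC", s))  # /admin/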
>
>   - actually, even browsers DO NOT normalize URLs, in order to preserve
>     them as much as possible and not to fail on broken servers! This should
>     be heard as a strong warning! Try it by yourself, just direct your
>     browser to /a%%b%31c%xyz/?brightness=10% and see it send:
>
>        GET /a%%b%31c%xyz/?brightness=10% HTTP/1.1
>
>     you'll note that even "%31" wasn't turned into a "1".
>
>   - there are other aspects (some mentioned in RFC3986). The authority
>     part of the URI can have various forms. The best known one is the
>     split between the network and the host in the IPv4 address
>     representation, by which "127.0.0.1" is also "127.0.1", "127.1",
>     "2130706433" or even "0x7f000001" (with X or x and F or f). You can
>     even add leading zeroes. And you can use octal encoding:
>     017700000001. You can try to ping all of them, they'll likely work
>     on your OS. At least my browser rewrites them in the URL bar before
>     sending the request. This might be normalized... or not, for the
>     same reason of not breaking the next step in the chain, which may
>     expect a different behavior when dealing with the "16bit.16bit"
>     representation, since host names made only of digits are permitted
>     when there's a domain behind :-/  The port number can accept leading
>     zeroes, so ":80" and ":0000080" are aliases.
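>
>     On most Unix-like systems the C library's inet_aton() accepts all
>     of these address forms, and Python exposes it directly:
>
>         import socket
>
>         # dotted, partial-dotted, decimal, hex and octal notations
>         for alias in ("127.0.0.1", "127.0.1", "127.1",
>                       "2130706433", "0x7f000001", "017700000001"):
>             print(alias, "->", socket.inet_ntoa(socket.inet_aton(alias)))
>
>         # every line prints: <alias> -> 127.0.0.1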
>
>   - last, those running haproxy in front of a WAF certainly don't want
>     haproxy to wipe this precious information before the WAF has had a
>     chance to inspect the request!
>
> The problem with normalization is that it would work if everyone were
> doing it, but RFC3986 was specified long after 99% of the internet was
> already deployed and in use, so at best it can serve as a guideline and
> a reference of traps to avoid. And things are getting worse with IoT.
> Just try to run an HTTP server on an ESP8266 and you'll see that most
> of the time, percent-decoding is not the web server's concern at all:
> it will pass the URL along unmodified. So you can definitely expect
> your light bulb's web server to take requests like
> "GET /dim?brightness=10%" and expect that to work out of the box. Just
> install a normalizing load balancer in front of a gateway managing tens
> of thousands of heating controllers based on such devices and suddenly
> nobody can adjust the temperature in their homes anymore (or worse, the
> '%' gets dropped, the value is taken as degrees Celsius, and when you
> ask for 50% heating you get 50 degrees C).
>
> The real problem is that initial implementations of HTTP never said
> that it was illegal to send non-canonical characters, nor that they
> ought to be rejected. HTTP was simply designed in an era when a server
> would run fine in 30kB of RAM; nowadays, with 1000 times more memory,
> many servers wouldn't even load. It was much less easy back then to
> insist that certain things be strictly forbidden.
>
> In our discussion you invoked the principle of least surprise, which
> dictates that haproxy ought to do "the right thing" by default. And I
> totally agree with it. It just turns out that with so many different
> behaviors around, when you're in the middle of the chain you have to
> be discreet and not start rearranging stuff that's not your business
> while claiming it's better once tidied up, otherwise you're certain
> to cause bad surprises. I'm pretty sure there are far fewer users of
> a "deny" rule applied to a path than there are users of applications
> that would simply break under normalization. Mind you, I've found "/"
> in header names and even percent-encoding there, so you can easily
> imagine the inventiveness you can find in a path that gets rewritten
> along a long chain using regexes... Thus for me the principle of least
> surprise is *NOT* to normalize in the middle of the chain.
>
> > Concluding:
> > - The documentation should be updated to warn administrators that
> > http-request deny must be used very carefully when combining it with
> > a path.
>
> That was my initial point, and I was even a bit disappointed not to find
> any mention of percent-decoding in the doc, as it used to be well known
> at the time "reqrep" and friends were the norm. We even still have examples
> of this in examples/acl-content-sw.cfg with the forbidden URIs. But that's
> how projects evolve, people change and assumptions as well, and certain
> design decisions need to be clearly documented.
>
> > - HAProxy should gain the ability to correctly normalize URLs (i.e. not
> > using the url_dec version). How exactly that would look is not yet clear.
> >   - It could be a `http-request normalize-path percent,dotdot,Y` action.
> >   - It could be a `normalize(percent)` converter
> >   - The `path` fetch could be extended to gain an argument, specifying
> > the normalization requested.
> >   - An option within the `defaults` section could enable normalization
> > for everything.
>
> It cannot be an option in a defaults section, because you'd inevitably
> want to disable it for some frontends, and that becomes incompatible
> with the need for filtering both before and after normalization. The
> main problem is being able to do this:
>
>      http-request accept/redirect/set-var/etc ... if blah
>      ...
>      http-request normalize-uri [ possible options ]
>      http-request other-rules ... if blah
>
> In short: apply security processing before normalization, and simple
> decisions after. Those who don't want this security processing but
> still want to apply deny rules could simply start with normalize-uri.
> We could also imagine having a global option to state that everything
> that enters the H1/H2 muxes gets normalized before processing, but I
> strongly doubt anyone would use this because it would definitely break
> applications and would still not allow them to write safe rules.
>
> Converters can be useful to merely check without modifying anything.
>
> Another, easier option is to have a sample fetch or converter that
> works the other way around and tells you whether there is something
> non-canonical in your request. It wouldn't trip on "10%" or "%%" or
> such things; it would catch valid encodings of characters that should
> not be encoded in the first place, because these are the dangerous
> and suspicious ones. This also allows spotting the "%u0061" case, and
> it could check for "../" as well. By doing so you would be able to write:
>
>     http-request deny if { req.url,is_suspicious }
>     http-request deny if { req.path /admin/ }
>
> I tend to prefer this one because it doesn't modify the request and
> will not break applications. And that's more or less how I used to
> proceed in certain environments, with regexes matching stuff like
> "%(2[D-F]|3[0-9]|[46][1-9A-F]|[57][0-9A])", which is already very
> efficient at blocking most of these patterns.
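>
> For reference, a Python transcription of that kind of check (the name
> "SUSPICIOUS" is made up for the illustration):
>
>      import re
>
>      # %2D-%2F, %30-%39, %41-%5A and %61-%7A decode to '-', '.', '/',
>      # digits and letters, which never need to be percent-encoded
>      SUSPICIOUS = re.compile(r"%(2[D-F]|3[0-9]|[46][1-9A-F]|[57][0-9A])",
>                              re.I)
>
>      print(bool(SUSPICIOUS.search("/%61dmin/login")))       # True
>      print(bool(SUSPICIOUS.search("/dim?brightness=10%")))  # False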
>
> A normalize action could however also correctly recode non-printable
> characters that some browsers will happily follow in links. And the
> various options can be used simultaneously, such as:
>
>     http-request normalize-uri if { req.url,is_suspicious }
>
> > If you have anything to add after reading this mail: Please do!
>
> Sure, I always have something to add :-)
>
> As I mentioned in our exchanges, due to all the variations above, URL-based
> access denial *never works*. YOU CANNOT IMPLEMENT SECURITY BY DENYING ONLY
> A SUBSET OF IDENTIFIABLE PATTERNS AMONG A LARGE AND UNCLEAR LIST OF
> ALIASES.
>
> Some application servers might want to see /images/../admin as something
> valid under a symlink while others will instead resolve it to /admin.
> Some will also consider /admin/login.php.html to be the same as
> /admin/login.php, and that in turn the same as /admin/.%5Clogin.php. You
> could also face internal rerouting of URLs like
> "/admin/debug?path=/login.php", where the application server itself
> routes the request inside the application.
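>
> A filesystem-style resolver, for instance, happily collapses the
> traversal form into the sensitive path (Python for illustration):
>
>      import posixpath
>
>      print(posixpath.normpath("/images/../admin"))  # /admin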
>
> So even with normalization, you'd be left with huge doubts and your
> application would remain totally insecure. This is why some responders
> said that such filtering ought to be done on the server itself. And
> you rightfully pointed out that .htaccess is placed in the proper
> directory for a reason. It's also why we've implemented certificate
> passing in headers. It might not be the easiest thing to deploy, as you
> said, but having the application protect itself is the only way to make
> it secure by DESIGN and not by applying pads on top of identified
> wounds.
>
> However, putting a protection layer in front of the application *IS* a
> good way to protect it against zero-day attacks. Blocking "../", "%00",
> double extensions, invalid arguments and so on will definitely save the
> application from being exposed to a lot of dangerous dirt, starting with
> vulnerability scanners which eat your CPU and fill your logs.
>
> That's why I still have a preference for implementing a nice and
> configurable converter to detect non-canonical forms, and probably
> another one which would provide a good level of security in a single
> rule.
>
> Cheers,
> Willy
>
