Hi Tim,

On Thu, Jun 25, 2020 at 04:30:37PM +0200, Tim Düsterhus wrote:
> Willy: Please correct me if I misrepresented your arguments or left out
> something important.

I think it's well summarized. There are other more painful points not
mentioned here:

  - RFC3986's path normalization algorithm is bogus when it sees
    multiple slashes such as in "/static//images/12.jpg". This happens
    very frequently in URLs built by concatenation. The problem is that
    when it meets a "../" it suggests to remove only one level of slash
    and will end up in a different directory than the one a server that
    does simplistic normalization would do (or a cache which would first
    merge consecutive slashes to increase cache hit ratio). So:
        /static//images/../../css/main.css would become:
        /static/css/main.css according to RFC3986
        /css/main.css according to most file systems or simplifications

  - some operating systems also support the backslash "\" as a directory
    delimiter. So you try to normalize your path correctly and leave
    "\..\admin/" and you're screwed again.

  - "+" and "%20" are equivalent in the query string, but given that in
    many simple applications these ones will only appear in a single form,
    such applications might not even check for the other one. So if you
    replace "%20" with "+" you will break some of them and if you replace
    "+" with "%20" you will break others. I've seen quite a number of
    implementations in the past perform the decoding just like haproxy
    used to until recently, which is: feed the whole URL string to the
    percent-decoder and decode the "+" as a space in the path part. And
    by normalizing that we'd also break some of them.

  - some servers support a non-standard UTF-16 encoding (the same ones as
    those using case-insensitive matching). For this they use "%u" followed
    by 4 digits. So your "A" could be encoded "%61", "%41", "%u0061",
    "%u0041", "%U0041" or "%U0061" there and will also match. But this is
    not standard and must not be decoded as such, at the risk of breaking
    yet other applications which do not expect that "%u" is transcoded. And
    it's even possible that in some of these servers' configurations there
    are rules matching "%UFEDC" but not "%FE%DC".

  - and I wouldn't even be surprised if some servers using some internal
    normalization functions would also resolve unicode homoglyphs to valid
    characters! Just check if your server accepts "/%ef%bd%81dmin/", that
    would be fun!

  - actually, even browsers DO NOT normalize URLs, in order to preserve
    them as much as possible and not to fail on broken servers! This should
    be heard as a strong warning! Try it by yourself, just direct your
    browser to /a%%b%31c%xyz/?brightness=10% and see it send:

       GET /a%%b%31c%xyz/?brightness=10% HTTP/1.1

    you'll note that even "%31" wasn't turned into a "1".

  - there are other aspects (some mentioned in RFC3986). The authority
    par of the URI can have various forms. The best known one is the
    split of the net and the host in the IPv4 address representation, by
    which "" is also "127.0.1", "127.1", "2130706433" or even
    "0x7f000001" (with X or x and F or f). You can even add leading
    zeroes. And you can use octal encoding: 017700000001. You can try to
    ping all of them, they'll likely work on your OS. At least my browser
    rewrites them in the URL bar before sending the request. This might be
    normalized... or not, for the same reasons of not breaking the next
    step in the chain, which possibly expect to have a different behavior
    when dealing with "16bit.16bit" representation, since host names made
    of digits only are permitted if there's a domain behind :-/  The port
    number can accept leading zeroes, so ":80" and ":0000080" are aliases.

  - last, those running haproxy in front of a WAF certainly don't want
    haproxy to wipe these precious information before the WAF has a chance
    to raise its awareness on this request!

The problem with normalization is that it would work if everyone was doing
it, but RFC3986 was specified long after 99% of the internet was already
deployed and used, so at best it can be used as a guideline and a reference
of traps to avoid. And things are getting worse with IoT. You can just try
to run an HTTP server on an ESP8266 and you'll see that most of the time,
percent-decoding is not the web server's problem at all and it will pass
it unmodified. So you can definitely expect that your light bulb's web
server will take requests like "GET /dim?brightness=10%" and expect that
to work out of the box. Just install a normalizing load balancer in front
of a gateway managing tens of thousands of heating controllers based on
such devices and suddenly nobody can adjust the temperature in their homes
anymore (or worse, the '%' gets dropped and becomes degrees C so when you
ask for 50% heating you get 50 degrees C).

The real problem is that initial implementations of HTTP never said that
it was illegal to send non-canonical characters and that they ought to be
rejected. It's just that HTTP was designed at an era where a server would
run fine in 30kB of RAM. Nowadays with 1000 times more, many wouldn't just
load. So it was less easy to insist that certain things were strictly
forbidden by then.

In our discussion you invoked the principle of least surprise, which
dictates that haproxy ought to do "the right thing" by default. And I
totally agree with it. It just turns our that with so many different
behaviors around, when you're in the middle of the chain you have to
be discrete and not start to rearrange the stuff that's not your
business and claim it's better once tidied up, otherwise you're certain
to cause bad surprises. I'm pretty sure there are far less users of
a "deny" rule applied to a path than there are users of applications
that would simply break by normalizing. Mind you that I've found "/"
in header names and even percent-encoding, so you can easily imagine
the inventivity you can have in a path that gets rewritten along a
long chain using regex... Thus for me the principle of least surprise
is *NOT* to normalize in the middle of the chain.

> Concluding:
> - The documentation should be updated to warn administrators that
> http-request deny must be used very carefully when combining it with a path.

That was my initial point and I was even a bit disappointed not to find
such a mention of percent-decoding in the doc as it used to be well-known
at the time "reqrep" and friends were the norm. We even still have examples
of this in examples/acl-content-sw.cfg with the forbidden URIs. But that's
how projects evolve, people change and assumptions as well, and certain
design decisions need to be clearly documented.

> - HAProxy should gain the ability to correctly normalize URLs (i.e. not
> using the url_dec version). How exactly that would look is not yet clear.
>   - It could be a `http-request normalize-path percent,dotdot,Y` action.
>   - It could be a `normalize(percent)` converter
>   - The `path` fetch could be extended to gain an argument, specifying
> the normalization requested.
>   - An option within the `defaults` section could enable normalization
> for everything.

It cannot be an option in a defaults section because it means you'd want
to disable it for some frontends, and this becomes incompatible with the
need for filtering before and after. The main problem is to be able to
do that :

     http-request accept/redirect/set-var/etc ... if blah
     http-request normalize-uri [ possible options ]
     http-request other-rules ... if blah

In short, apply security processing before normalization, and simple
decisions after. Those who don't want these security processing but
still want to apply deny rules could just start with normalize-uri.
We could also imagine having a global option to state that everything
that enters the H1/H2 muxes gets normalized before processing, but I
strongly doubt anyone would use this because it would definitely break
applications and would still not allow them to write safe rules.

Converters can be useful to only check without modifying.

Another easier option is to have a sample-fetch or converter that
works the other way around and tells you whether there is something
non-canonical in your request. It wouldn't trip on "10%" or "%%" or
such things, it would catch valid encodings of characters that should
not be encoded in the first place. Because these are the dangerous
and suspicious ones. And this allows to also spot the "%u0061". It
could also check for "../". By doing so you would be able to write :

    http-request deny if { req.url,is_suspicious }
    http-request deny if { req.path /admin/ }

I tend to prefer this one because it doesn't modify the request and
will not break applications. And that's more or less the way I've
used to proceed in certain environments with regex matching stuff
like "%(2[D-F]|3[0-9]|[46][0-0A-F]|[57][0-9A])" which is already
very efficient at blocking most of these patterns.

A normalize action could however also correctly recode non-printable
characters that some browsers follow in links. But various options
are usable simultaneously, such as :

    http-request normalize-uri if { req.url,is_suspicious }

> If you have anything to add after reading this mail: Please do!

Sure, I always have something to add :-)

As I mentioned in our exchanges, due to all the variations above, URL-based

Some application servers might want to see /images/../admin as something
valid under a symlink while others will instead resolve it to /admin. Some
will also consider that /admin/login.php.html is the same as /admin/login.php,
and that this is also the same as /admin/.%5Clogin.php. You could also face
the case of internal rerouting of URLs like "/admin/debug?path=/login.php"
where it's the application server itself which routes its request inside
the application.

So even with normalization, you'd be left with a huge doubt and your
application would remain totally insecure. This is why some responders
said that such filtering ought to be made on the server itself. And
you rightfully pointed that .htaccess is placed in the proper directory
for a reason. It's also why we've implemented certificate passing in
headers. It might not be the easiest thing to deploy as you said, but
having the application protect itself is the only way to make it secure
by DESIGN and not by applying pads on top of identified wounds.

However, putting a protection layer in front of the application *IS* a
good way to protect it against zero-day attacks. Blocking "../", "%00",
double extensions, invalid arguments and so on will definitely save the
application from being exposed to a lot of dangerous dirt, starting with
vulnerability scanners which eat your CPU and fill your logs.

That's why I still have a preference for implementing a nice and
configurable converter to detect non-canonical forms and probably
another one which would provide a good level of security in a single


Reply via email to