Hi Lucas,

On Fri, Dec 29, 2017 at 06:06:49AM +0000, Lucas Rolff wrote:
> As much as I agree that specs should be followed, I realized that even
> if there are people who want to follow the spec 100%, there will always be
> implementations used at large scale that won't follow the spec 100% -

Absolutely, and the general principle always applies: "be strict in what you
send, be liberal in what you accept".

> the reasoning behind this can be multiple - one I could imagine is the fact
> that when browsers or servers start implementing a new protocol (h2 is a good
> example) before the spec is actually 100% finalized - when it's then
> finalized, the vendor might end up with an implementation that slightly
> violates the actual spec, but might not fix it, either because the
> violations are minor or because the violations are not in any way "breaking"
> when you compare it to the other implementations out there.

Oh I know this pretty well, and you forget one of the most important aspects,
which is that browsers first implemented SPDY and then modified it to create
H2. So some of the H2 rules were not closely followed initially because they
were not possible to implement with the initial SPDY code. Also, implementing
early, before the protocol was 100% finalized, is responsible for some
of the dirty things that were promised to be fixed before the RFC but were
rejected by some implementers who had already deployed (HPACK header ordering,
HPACK encoding).

> In this case, if I understand you correctly, the errors are related to the
> fact that certain clients didn't implement the spec correctly in the first place.

Yes, but here we were speaking about a very recent client, which doesn't make
much sense, especially for such an important header.

> I was very curious why e.g. the Connection header (even if it isn't sent by
> Firefox or Safari/Webkit even though their webdev tools say it is), would
> work in nginx, and Apache for that matter, so I asked on their mailing list
> why they were violating the spec.
> 
> Valentin gave a rather interesting answer as to why they actually decided
> to sometimes violate specific parts in their software; it all boiled down
> to client support, because they also realized that many browsers (that
> might be EOL and never get updated) might have implementations that would
> not work with http2 in that case.
> 
> http://mailman.nginx.org/pipermail/nginx/2017-December/055356.html

Oh I totally agree with all the points Valentin made there. We've all been
forced to swallow a lot of ugly stuff to be compatible with existing
deployments. Look at "option accept-invalid-http-requests/responses"
to get an idea. Such ugly things even ended up as relaxed rules in the
updated HTTP spec (RFC 723x), like support for multiple Content-Length
headers, which some clients or servers send and which we check for consistency.

> I know that it's different software, and that how others decide to design
> their software is completely up to them.

It's important to keep in mind that haproxy's H2 support came very late. I
participated a lot in the spec work, and even authored a proposal 6 years ago.
But despite this I didn't have enough time to start implementing it in haproxy
back then. Most implementers already had a working SPDY implementation to start
from, allowing them to progress much faster. Nginx was one of them, and as
such they had to face all the early implementations and their issues. It's
totally normal that they proceeded like this.

> Violating specs on purpose is generally bad, no doubt about that - but if
> it's a requirement to be able to get good coverage in regards to clients
> (both new and old browsers that are actually in use), then I understand why
> one would go to such lengths as having to "hack" a bit to make sure generally
> used browsers can use the protocol.

We're always doing the same. Violating the specs on stuff you send is a big
interoperability problem because you force all other implementations to in
turn violate the spec to support you. But violating the specs on stuff you
accept is less of a problem as long as it doesn't cause vulnerabilities.
Silently dropping the connection-specific headers is no big deal, and clearly
something we may have to do in the future if trouble with existing clients is
reported. Similarly, when you run h2spec you'll see that sometimes we prefer
to close using RST_STREAM rather than GOAWAY, because the latter would require
us to keep the stream around long enough to remember about it, and we're not
going to keep millions of streams in memory just for the sake of not breaking
a connection once in a while.

> > So at least my analysis for now is that for a reason still to be
> > determined, this version of firefox didn't correctly interoperate with
> > haproxy in a given environment
> 
> Downgrading Firefox to earlier versions (such as 55, which is "pre"-quantum)
> reveals the same issue with bad requests.

Then I don't understand, as it seems that different people are seeing different
things. And even for me, 45.7, 52.0.2 and 57alpha never caused this to happen.
I'm wondering if some browser plugin could be responsible for this header being
emitted. This should clearly be asked on some Firefox support list to confirm
when this happens and why.

> Hopefully you'll not have to violate the http2 spec in any way - but I do see
> a valid point explained by Valentin - the fact that you cannot guarantee all
> clients to be 100% compliant with the spec, and there might be a bunch of
> (used) EOL devices around.

I have no problem with this, and I agree with him that most of the rules
result from overengineering, and from a wish to let the protocol fail very
quickly on bad implementations to help all of them converge quickly.
It's just that I want to have a *clear* and *reliable* report of the issue
before starting to open holes in the spec and dealing with complaints from
people reporting that we're not compliant. And so far we don't know why
this version of FF sometimes works and sometimes fails.

> I used to work at a place where haproxy was used extensively, so seeing
> http2 support getting better and better is a really awesome thing, because it
> would actually mean that http2 could be implemented in that specific
> environment. I do hope that in a few releases http2 in haproxy gets to a
> point where we could rate it as "production ready", with no real visible bugs
> from a customer perspective; at that point I think it would be good to
> deploy it in a large-scale environment (for a percentage of the requests) to
> see how much traffic might actually get dropped when the spec is followed -
> to see from some real-world workload how many clients actually violate the
> spec.

Yep. For what it's worth, it's been enabled for about one month on haproxy.org
and so far we haven't gotten any bad report, which is pretty encouraging.

> For now, I'll personally leave http2 support disabled - since it's breaking
> my applications for a big percentage of my users, and I'll have to find an
> intermediate solution until at least the bug regarding Firefox losing
> connections (this thing) is fixed:
> 
> Dec 28 21:22:35 localhost haproxy[1534]: 80.61.160.xxx:64921 [28/Dec/2017:21:22:12.309] https_frontend~ https_frontend/<NOSRV> -1/-1/-1/-1/22978 400 0 - - CR-- 1/1/0/0/0 0/0 "<BADREQ>"
> Dec 28 21:22:40 localhost haproxy[1534]: 80.61.160.xxx:64972 [28/Dec/2017:21:22:35.329] https_frontend~ cdn-backend/mycdn 0/0/1/0/5001 200 995 - - ---- 1/1/0/1/0 0/0 "GET /js/app.js?v=1 HTTP/1.1"

If this happens in production, it definitely is a problem that we have to
address. Could you please try the attached patch to see if it fixes the
issue for you? If so, I would at least like us to keep some statistics
on it, and maybe even make it conditional.

> I never expect software to be bug-free - but at this point, this specific
> issue causes too much visible "trouble" for end-users for me to be able to
> keep it enabled

I'm all for working around visible issues. We're among the rare pieces of
software at which the finger is always pointed: because we sit in the middle
of a chain, when something changes it's haproxy's fault (while sometimes it
only reveals a long-standing problem), forcing us to adapt. So if the attached
patch fixes the problem for you, we'll have to adapt. I still would like to
figure out when the browsers send this and why, because I would hate to see
the patch reverted later because someone tries version 60 and says "look, the
header is not sent anymore" while it's not under the same conditions.

> I'll figure out if I can replicate the same issue in more browsers (without
> connection: keep-alive header), maybe that would give us more insight.

That could indeed help.

Cheers,
Willy
diff --git a/src/h2.c b/src/h2.c
index 43ed7f3..e3ab14c 100644
--- a/src/h2.c
+++ b/src/h2.c
@@ -191,7 +191,7 @@ int h2_make_h1_request(struct http_hdr *list, char *out, int osize)
                    isteq(list[idx].n, ist("keep-alive")) ||
                    isteq(list[idx].n, ist("upgrade")) ||
                    isteq(list[idx].n, ist("transfer-encoding")))
-                       goto fail;
+                       continue;
 
                if (isteq(list[idx].n, ist("te")) && !isteq(list[idx].v, ist("trailers")))
                        goto fail;