Hi Lucas,

On Fri, Dec 29, 2017 at 06:06:49AM +0000, Lucas Rolff wrote:
> As much as I agree about that specs should be followed, I realized that
> even if there's people that want to follow the spec 100%, there will
> always be implementations used in large scale that won't be following
> the spec 100% -
Absolutely, and the general principle always applies: "be strict in what
you send, be liberal in what you accept".

> the reasoning behind this can be multiple - one I could imagine is the
> fact when browsers or servers start implementing a new protocol (h2 is
> a good example) before the spec is actually 100% finalized - when it's
> then finalized, the vendor might end up with an implementation that is
> slightly violating the actual spec, but either won't fix it because the
> violations are minor or because the violations are not in any way
> "breaking" when you also compare it to the other implementations done.

Oh I know this pretty well, and you forget one of the most important
aspects, which is that browsers first implemented SPDY and then modified
it to create H2. So some of the H2 rules were not closely followed
initially because they were not possible to implement with the initial
SPDY code. This early implementation before the protocol was 100%
finalized is also responsible for some of the dirty things that were
promised to be fixed before the RFC but were rejected by some
implementers who had already deployed (hpack header ordering, hpack
encoding).

> In this case if I understand you correctly, the errors are related to
> the fact that certain clients didn't implement the spec correctly in
> the first place.

Yes, but here we were speaking about a very recent client, which doesn't
make much sense, especially with such an important header.

> I was very curious why e.g. the Connection header (even if it isn't
> sent by Firefox or Safari/Webkit, even though their webdev tools say it
> is) would work in nginx, and Apache for that matter, so I asked on
> their mailing list why they were violating the spec.
> Valentin gave a rather interesting answer why they in their software
> actually decided to sometimes violate specific parts; it all boiled
> down to client support, because they also realized the fact that many
> browsers (that might be EOL and never get updated) might have
> implementations that would not work with http2 in that case.
>
> http://mailman.nginx.org/pipermail/nginx/2017-December/055356.html

Oh I totally agree with all the points Valentin made there. We've all
been forced to swallow a lot of ugly stuff to be compatible with
existing deployments. Look at "option
accept-invalid-http-requests/responses" to get an idea. Such ugly things
even ended up as relaxed rules in the updated HTTP spec (723x), like
support for the multiple content-length headers that some clients or
servers send and that we check for correctness.

> I know that it's different software, and that how others decide to
> design their software is completely up to them.

It's important to keep in mind that haproxy's H2 support came very late.
I participated a lot in the spec, and even authored a proposal 6 years
ago, but despite this I didn't have enough time to start implementing it
in haproxy by then. Most implementers already had a working SPDY
implementation to start from, allowing them to progress much faster.
Nginx was one of them, and as such they had to face all the early
implementations and their issues. It's totally normal that they had to
proceed like this.

> Violating specs on purpose is generally bad, no doubt about that - but
> if it's a requirement to be able to get good coverage in regards to
> clients (both new and old browsers that are actually in use), then I
> understand why one would go to such lengths as having to "hack" a bit
> to make sure generally used browsers can use the protocol.

We're always doing the same. Violating the specs on stuff you send is a
big interoperability problem, because you force all the others to
violate the spec in turn to support you.
But violating the specs on stuff you accept is much less of a problem,
as long as it doesn't cause vulnerabilities. Silently dropping the
connection-specific headers is no big deal, and clearly something we may
have to do in the future if trouble with existing clients is reported.
Similarly, when you run h2spec you'll see that sometimes we prefer to
close using RST_STREAM rather than GOAWAY, because the latter would
require us to keep the stream around "long enough" to remember about it,
and we're not going to keep millions of streams in memory just for the
sake of not breaking a connection once in a while.

> > So at least my analysis for now is that for a reason still to be
> > determined, this version of firefox didn't correctly interoperate
> > with haproxy in a given environment
>
> Downgrading Firefox to earlier versions (such as 55, which is
> "pre"-quantum) reveals the same issue with bad requests.

Then I don't understand, as it seems that different people are seeing
different things. And even for me, 45.7, 52.0.2 and 57alpha never caused
this to happen. I'm wondering if any browser plugin could be responsible
for this header being emitted. This should clearly be asked on some
Firefox support list to confirm when this happens and why.

> Hopefully you'll not have to violate the http2 spec in any way - but I
> do see a valid point explained by Valentin - the fact that you cannot
> guarantee all clients to be 100% compliant with the spec, and there
> might be a bunch of (used) EOL devices around.

I have no problem with this, and I agree with him about the fact that
most of the rules result from overengineering, and from a wish to let
the protocol fail very quickly on bad implementations to help all of
them converge quickly. It's just that I want to have a *clear* and
*reliable* report of the issue before starting to open holes in the spec
and dealing with complaints from people reporting that we're not
compliant.
And till now we don't know why this version of FF sometimes works and
sometimes fails.

> I used to work at a place where haproxy was used extensively, so seeing
> http2 support getting better and better is a really awesome thing,
> because it would actually mean that http2 could be implemented in that
> specific environment - I do hope in a few releases that http2 in
> haproxy gets to a point where we could rate it as "production ready",
> with no real visible bugs from a customer perspective. At that point I
> think it would be good to implement in a large scale environment (for a
> percentage of the requests) to see how much traffic might actually get
> dropped in case the spec is followed - to see from some real world
> workload how many clients actually violate the spec.

Yep. For what it's worth, it's been enabled for about one month on
haproxy.org, and till now we didn't get any bad report, which is pretty
encouraging.

> For now, I'll personally leave http2 support disabled - since it's
> breaking my applications for a big percentage of my users, and I'll
> have to find an intermediate solution until at least the bug in regards
> to Firefox losing connections (this thing):
>
> Dec 28 21:22:35 localhost haproxy[1534]: 80.61.160.xxx:64921
> [28/Dec/2017:21:22:12.309] https_frontend~ https_frontend/<NOSRV>
> -1/-1/-1/-1/22978 400 0 - - CR-- 1/1/0/0/0 0/0 "<BADREQ>"
> Dec 28 21:22:40 localhost haproxy[1534]: 80.61.160.xxx:64972
> [28/Dec/2017:21:22:35.329] https_frontend~ cdn-backend/mycdn
> 0/0/1/0/5001 200 995 - - ---- 1/1/0/1/0 0/0 "GET /js/app.js?v=1
> HTTP/1.1"

If this is met in production, it definitely is a problem that we have to
address. Could you please try the attached patch to see if it fixes the
issue for you? If so, I would at least like us to keep some statistics
on it, and maybe even make it conditional.
> I never expect software to be bug free - but at this given point, this
> specific issue that happens causes too much visible "trouble" for
> end-users for me to be able to keep it enabled

I'm all for working around visible issues. We're among the rare pieces
of software at which the finger is always pointed, because we're
inserted in the middle of a chain and when something changes it's
haproxy's fault (while sometimes it only reveals a long-standing
problem), forcing us to adapt. So if the attached patch fixes the
problem for you, we'll have to adapt. I still would like to figure out
when the browsers send this header and why, because I would hate for the
patch to be reverted later because someone tries version 60 and says
"look, the header is not sent anymore" while it's not under the same
conditions.

> I'll figure out if I can replicate the same issue in more browsers
> (without the connection: keep-alive header), maybe that would give us
> more insight.

That could indeed help.

Cheers,
Willy
diff --git a/src/h2.c b/src/h2.c
index 43ed7f3..e3ab14c 100644
--- a/src/h2.c
+++ b/src/h2.c
@@ -191,7 +191,7 @@ int h2_make_h1_request(struct http_hdr *list, char *out, int osize)
 		    isteq(list[idx].n, ist("keep-alive")) ||
 		    isteq(list[idx].n, ist("upgrade")) ||
 		    isteq(list[idx].n, ist("transfer-encoding")))
-			goto fail;
+			continue;
 
 		if (isteq(list[idx].n, ist("te")) && !isteq(list[idx].v, ist("trailers")))
 			goto fail;