Hi guys,

On Mon, Dec 16, 2019 at 11:01:19PM +0100, Cyril Bonté wrote:
> Hi Willy,
> 
> On 16/12/2019 at 22:06, Artur wrote:
> > > > [...]
> > > > URLs like https://q.d/PPDSlide/testfile are correctly rewritten to
> > > > https://q.d/p3/PPDSlide/testfile and forwarded to the backend.
> > > > 
> > > > Once I switched to 2.1.1, haproxy no longer rewrites the URI and the
> > > > URIs remain unchanged when forwarded to the backend. I had to
> > > > downgrade to have the usual behaviour.
> > > > 
> > > > Is it a bug or something changed in normal haproxy behaviour with 2.1
> > > > release ?
> > > 
> > > I can confirm the issue.
> > > 
> > > It seems to happen with h2 requests only, since commit #30ee1efe67.
> > > haproxy normalizes the URI but replace-uri doesn't take into account
> > > this information. The fix should be easy for replace-uri (If someone
> > > wants to work on it, I won't have time this week).
> > > 
> > > http://git.haproxy.org/?p=haproxy-2.1.git;a=commit;h=30ee1efe67
> 
> I'm not 100% sure it's the right approach to fix the issue. Can you check if
> this patch may fix the issue in all conditions ?
> 
> From what I've observed, it seems we can rely on the HTX_SL_F_NORMALIZED_URI
> flag to detect if the URI was normalized by haproxy and in that case, we
> should start at the path instead of the URI.

In fact it's not a bug, it's the intended behavior. The rule in Artur's
config only matches origin-form URIs, not absolute-form ones, and the
absolute form is what most H2 clients use.

In HTTP/1, there are two ways to send a request URI:
  origin form:    GET /path/to/resource HTTP/1.1
  absolute form:  GET http://server.domain.tld/path/to/resource HTTP/1.1

Clients mostly use the origin form with servers, and mostly use the
absolute form with proxies, but in both cases that holds 99.99% of the
time, not 100%. It's worth noting that even in HTTP/1, Artur's rule
does not work with the absolute form, it only handles the origin form.
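
To illustrate, here is a sketch of a rule of that kind. The exact rule
from Artur's config is not reproduced in this thread, so the regex below
is only an assumption built from the URLs quoted above:

    # hypothetical rule, anchored on the leading slash of the path
    http-request replace-uri ^/PPDSlide/(.*) /p3/PPDSlide/\1

    # origin form:   /PPDSlide/testfile            -> matches, rewritten
    # absolute form: https://q.d/PPDSlide/testfile -> starts with "https:",
    #                so "^/" cannot match and the URI is left unchanged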

In H2 you have the same distinction, except that clients were initially
encouraged to use the absolute form only. The first non-HTX implementation
used to implement only a subset of H2 and to map H2 to H1 using the Host
header field, transforming incoming absolute URIs to origin URIs. This
broke gRPC and end-to-end H2 because the scheme was lost. This also
prevented such requests from being passed to proxies. So starting with
2.1 we now respect the format used by the client.

It turns out that there seem to be a number of configs in the field which
are already broken with H1 but which work by pure luck as they only handle
the origin form. Arguably, many of these have evolved over the ages because
initially they were designed to use the old reqrep matching which was very
painful to deal with, and writing extra config just to handle an almost
always absent scheme was not worth it. Nowadays we have multiple actions
available depending on what we want to handle:

  - set-path to set only the path component
  - set-uri to set the whole URI
  - set-query to set the query string only
  - replace-uri to replace a part of the URI using parts from it
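
As a rough sketch of how these look in a config (the paths and values are
made up for illustration, only the action names come from the list above):

    # set-path: only the path changes; scheme, authority and query
    # string are kept as received
    http-request set-path /p3%[path]

    # set-uri: the whole request URI is replaced by the result
    http-request set-uri /maintenance.html

    # set-query: only what follows the "?" changes; here "%3D" is
    # turned back into "=" in the query string
    http-request set-query %[query,regsub(%3D,=,g)]

    # replace-uri: the regex is applied to the URI as received, so an
    # absolute-form URI has to be handled by the regex itself
    http-request replace-uri ^/old/(.*) /new/\1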

In theory, in order to perform some path rewrites, one would only need
set-path. In Artur's case, it should be as simple as:

    http-request set-path /p3%[path]
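
If only some paths need the prefix, the same action can take a condition.
This is just a sketch which assumes that only requests under /PPDSlide/
(taken from the URLs quoted above) have to be rewritten, which the
original config may or may not do:

    # prepend /p3 only under /PPDSlide/; the "path" fetch looks at the
    # path component only, so origin-form and absolute-form requests
    # are handled the same way
    http-request set-path /p3%[path] if { path_beg /PPDSlide/ }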

But now I'm starting to suspect that most of the problem comes from the
fact that people who used to rely on regexes will not write their rewrites
with set-path as easily as they would with a replace rule, which is very
similar to the old reqrep-style rules. So we'd probably need to introduce
a "replace-path" action and suggest it in the warning emitted for reqrep.

I think it is important that we properly address such needs and am
willing to backport anything like this to 2.1 to ease the transition if
that's the best solution.

Please advise,
Willy
