On Thu, Jan 13, 2022 at 3:22 AM Christopher Faulet <cfau...@haproxy.com> wrote:

> However, during H1 parsing, the authority found in the URI is validated
> against the Host header. At this stage, both must be identical. Otherwise
> an error is reported. The "accept-invalid-http-request" option is a valid
> workaround in this case.
>
> So now the question is whether a pre-normalization must be performed
> during H1 parsing or not (in addition to the one performed during the
> request analysis), and whether it should be extended to CONNECT requests.
> In practice, it is only an issue for CONNECT requests because the
> absolute-form is not the common form for URIs in H1.
>

The problem I see with attempting to normalize CONNECT requests is an
ambiguity in the RFCs. RFC7230#5.3.3 defines the authority-form but does not
mention handling of the Host header, and RFC7231#4.3.6 is explicit that the
authority-form be used in CONNECT but is also silent on the Host header.
Meanwhile, RFC7230#5.4 requires a Host header for HTTP/1.1 requests and
states that 'A "host" without any trailing port information implies the
default port for the service requested (e.g., "80" for an HTTP URL).', and
RFC7230#2.7.1 states 'The origin server for an "http" URI is identified by
the authority component, which includes a host identifier and optional TCP
port'.

Nowhere yet have I come across an RFC that makes an exception to this
default-port behavior of the Host header for CONNECT requests, or that
explicitly requires (MUST/SHALL) that the Host header and the authority
match, even though the examples given strongly imply it. Further, the RFCs
provide no way for a CONNECT request to communicate the "service requested",
which is what default-port evaluation of the Host header depends on. Absent
information about the URI scheme, haproxy cannot compare an authority
against a Host header that lacks a port without making assumptions, since
RFC3986#6.2.3 requires a URI scheme to determine equivalence when a default
port is involved.
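To make the scheme dependence concrete, here is a minimal sketch in Python
of an RFC3986#6.2.3-style comparison. It is not haproxy code: the names
`split_hostport`, `equivalent` and the `DEFAULT_PORTS` table are
hypothetical, and it assumes a bare "host" or "host:port" form with no IPv6
literals. When exactly one side lacks a port, the answer depends entirely on
the scheme, so for a CONNECT request, which carries none, the comparison can
only report that it cannot decide.

```python
# Minimal sketch (not haproxy code): RFC 3986 section 6.2.3-style comparison
# of a request authority against a Host header. All names are hypothetical.

# Hypothetical default-port table; each scheme defines its own default.
DEFAULT_PORTS = {"http": 80, "https": 443}


def split_hostport(value):
    """Split "host" or "host:port" into (host, port); port is None if absent.

    Assumes no IPv6 literals. Hosts are lowercased because the host
    component is case-insensitive.
    """
    host, sep, port = value.rpartition(":")
    if not sep:
        return value.lower(), None
    return host.lower(), int(port)


def equivalent(authority, host_header, scheme=None):
    """Return True or False when decidable, or None when it is not."""
    a_host, a_port = split_hostport(authority)
    h_host, h_port = split_hostport(host_header)
    if a_host != h_host:
        return False
    if a_port == h_port:
        return True  # same explicit port, or both omitted
    if a_port is not None and h_port is not None:
        return False  # two different explicit ports
    # Exactly one side omits the port: only the scheme can supply the default.
    default = DEFAULT_PORTS.get(scheme)
    if default is None:
        return None  # e.g. CONNECT: no scheme, so no default port to assume
    a_eff = a_port if a_port is not None else default
    h_eff = h_port if h_port is not None else default
    return a_eff == h_eff


print(equivalent("example.com:443", "example.com", "https"))  # True
print(equivalent("example.com:443", "example.com", "http"))   # False
print(equivalent("example.com:443", "example.com"))           # None
```

The same pair, "example.com:443" vs "example.com", is equivalent under
https, not equivalent under http, and undecidable with no scheme at all.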

So, given the above, I do not currently see how CONNECT requests could be
successfully normalized to address this via normal parsing.

I can see the reasoning behind haproxy's logic: absent a scheme, the
authority and the Host header must match for a request to be valid. But I
can also see that, since the RFCs are silent about requiring port data in
the Host header for CONNECT requests, including the port is not technically
required, even if it would be best practice to avoid the ambiguity outlined
above. Until haproxy 2.3/2.4 implemented this check, I don't think anyone had
looked at CONNECT semantics closely enough to find this gap in the
standards. It certainly seems that many HTTP proxies ignore the port on
Host, if they compare Host against the authority at all.

I see a few possibilities to address this:

1) Sites that need this to work keep relying on
"accept-invalid-http-request" and accept that HTTP request validation will
not be done. This is where we are today, but my preference would be to
have a slightly less blunt instrument available.

2) Add an option targeted at loosening only the strict authority vs Host
check: as long as the host portion of the authority matched the host
portion of the Host header, the request would not be rejected as invalid.
This would seem to balance the two cases: the common case where CONNECT is
not used keeps the current strict behavior by default, while the less
common case where CONNECT is used with Java clients could enable the looser
interpretation only when needed.

3) A variation on #2 that also makes it possible to configure the ports
that do not require strict matching. This is only marginally better than
#2, but it gives admins control over which ports they will consider
"equivalent" for the check: adding 80 and 443 to this port list would mean
that "example.com:443", "example.com:80" and "example.com" would all pass
the authority vs Host check, but "example.com:21" would not. This is about
the cleanest way I can think of to cope with not having enough data to
determine a default port while still keeping some of the validity
checking.
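Options 2 and 3 could look roughly like the following sketch, in Python
rather than haproxy's C. The function names and the mode names "strict",
"host-only" and "port-list" are hypothetical and are not existing haproxy
options; the sketch also assumes a case-insensitive host comparison and the
bare "host[:port]" form with no IPv6 literals. "strict" only approximates
the current behavior described in the quoted text.

```python
# Minimal sketch of options 2 and 3 (not haproxy code). The mode names
# "strict", "host-only" and "port-list" are hypothetical, not haproxy options.


def split_hostport(value):
    """Split "host" or "host:port" into (host, port); assumes no IPv6 literals."""
    host, sep, port = value.rpartition(":")
    if not sep:
        return value.lower(), None
    return host.lower(), int(port)


def authority_matches_host(authority, host_header, mode="strict",
                           equivalent_ports=frozenset()):
    a_host, a_port = split_hostport(authority)
    h_host, h_port = split_hostport(host_header)
    if a_host != h_host:
        return False  # the host portions must match in every mode
    if a_port == h_port:
        return True  # an exact match passes in every mode
    if mode == "host-only":
        return True  # option 2: ignore the port entirely
    if mode == "port-list":
        # Option 3: a missing port and any listed port are interchangeable.
        def listed(port):
            return port is None or port in equivalent_ports
        return listed(a_port) and listed(h_port)
    return False  # "strict": ports must be identical (approximates today)


web_ports = {80, 443}
for host in ("example.com:443", "example.com:80", "example.com", "example.com:21"):
    print(host, authority_matches_host("example.com:443", host, "port-list", web_ports))
```

With equivalent_ports set to {80, 443} in "port-list" mode, an authority of
"example.com:443" matches "example.com:443", "example.com:80" and
"example.com" but not "example.com:21", mirroring the examples in option 3,
while "host-only" would accept all four.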

Of course, my first preference is for Java to Just Do the Right Thing(tm),
but there is no telling when or if that may happen.

Andrew
