Hello,

The use case is to load balance applications in multiple datacenters or regions.
The common pattern today to cover multiple locations is to deploy services in 
each location separately and independently.

This happens with kubernetes for example, where a cluster is typically limited 
to a datacenter. Covering multiple locations is done by having independent 
clusters and provisioning the application to each of them,
e.g. myapp.kube-naest1.example.com + myapp.kube-euwest.example.com.

As a developer, I want to present the application on a single consistent URL 
with failover, e.g. myapp.example.com.
Without getting into too much detail, this is done with a layer of load 
balancing, which requires careful consideration of DNS, the Host header, TLS 
SNI, certificates and healthchecks to work end-to-end.

Let's take the two most basic use cases.

# multi clusters use case
backend foo
    mode http
    option httpchk GET /healthcheck
    http-send-name-header Host
    # "check" enables the healthcheck configured by "option httpchk"
    server myapp.kube-naest1.example.com myapp.kube-naest1.example.com:80 check
    server myapp.kube-naest2.example.com myapp.kube-naest2.example.com:80 check

# multi regions use case
backend foo
    mode http
    option httpchk GET /healthcheck
    http-send-name-header Host
    # "check" enables the healthcheck configured by "option httpchk"
    server myapp.kube-naest1.example.com myapp.kube-naest1.example.com:80 check
    server myapp.kube-euwest.example.com myapp.kube-euwest.example.com:80 check

Each cluster expects its own Host header, otherwise Kubernetes cannot route and 
serve the request.
In haproxy, this can be made to work using the "http-send-name-header Host" 
directive, which sets the Host header to the name of the selected server.

This setup fails in practice because the healthcheck fails. The 
healthcheck request sent by haproxy doesn't have a Host header. It ignores the 
"http-send-name-header" directive and there is no option to set the healthcheck 
host per server.
(There is a similar challenge with TLS SNI, which is easily worked around by 
disabling TLS checks. There are settings "sni req.hdr(host)" and "check-sni" to 
adjust, with little information on how all these settings combine.)
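
For illustration, here is a rough sketch of how the TLS settings could be 
combined for the multi clusters case. It assumes that the clusters listen on 
port 443 and that each one expects its own name as SNI; both are assumptions, 
and "verify none" is only there to keep the example short. How 
"sni req.hdr(host)" interacts with "http-send-name-header" is not clear from 
the documentation, so the sketch uses fixed names instead.

```
# Sketch only, not a tested configuration.
backend foo
    mode http
    option httpchk GET /healthcheck
    http-send-name-header Host
    # "sni" sets the SNI of forwarded requests, "check-sni" the SNI of checks
    server myapp.kube-naest1.example.com myapp.kube-naest1.example.com:443 ssl verify none sni str(myapp.kube-naest1.example.com) check check-sni myapp.kube-naest1.example.com
    server myapp.kube-naest2.example.com myapp.kube-naest2.example.com:443 ssl verify none sni str(myapp.kube-naest2.example.com) check check-sni myapp.kube-naest2.example.com
```

Even with these settings, the check request itself is still sent without a 
Host header, which is the problem described above.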

The proposed patch was intended to add one setting per server to manage the 
Host header end-to-end consistently and reliably: adjusting healthchecks, 
forwarding requests and TLS domain.

If you prefer not to do this, I can think of a less intrusive patch to 
configure the healthcheck.
There is already the "http-send-name-header Host" directive, whose main 
use case is to adjust the Host header per server.
I could patch the healthcheck code to follow that directive when running https 
healthchecks, if that's okay with you.

Thinking about it, it could be considered a bug that the healthcheck doesn't do 
that already. If I configure all my servers to get a header, it's certainly 
important and I expect the healthcheck to get it too.

Related questions and issues:
https://serverfault.com/questions/876871/configure-haproxy-to-include-host-headers-for-different-backends
https://serverfault.com/questions/770737/making-haproxy-pass-a-host-name-in-httpcheck
https://serverfault.com/questions/594669/haproxy-health-checking-multiple-servers-with-different-host-names


Regards.


From: Willy Tarreau [mailto:w...@1wt.eu]
Sent: 03 October 2019 05:51
To: Morotti, Romain D (CIB Tech, GBR) <romain.d.moro...@jpmorgan.com>
Cc: haproxy@formilux.org; Sayar, Guy H (CIB Tech, GBR) 
<guy.h.sa...@jpmorgan.com>
Subject: Re: [PR/FEATURE] support for virtual hosts / Host header per server

Hello Romain,

On Tue, Oct 01, 2019 at 12:08:03PM +0000, Morotti, Romain D wrote:
> What is the status on this?

Sorry, but it took some time to work on other priorities, and to be
honest, the subject looked scary enough to deserve enough time to
study it. Sadly, if I can say, the subject was pretty descriptive of
what it does, and this is fundamentally wrong.

So just to summarize for those who haven't had a look at the patch,
what this patch does is to replace the host header in requests sent to
a server with one specified for this server, resulting in each server
within the same farm running on a different vhost. This goes back to
the errors that plagued many hosting infrastructures in the late 90s
and early 2000s where redirects, hosts in pages etc were wrong because
the application was called with an internal name instead of the correct
one. Moreover this used to prevent servers from being shared between
multiple hosts since the host header was containing rubbish. For
reference, Apache merged the ProxyPreserveHost in 2.0.31 in 2002 to put
an end to that madness. But here you're escalating this to a new level
and involving it in load balancing, 15 years after everyone managed to
fix that early mistake and bury it deeply forever. So in short for a
request sent to foo.bar.com, you'll randomly forward it as being for
foo1.bar.com, bar.foo2.com, www.example.org etc... all in the same farm!
It's as if when launching my browser on Amazon I was presented eBay,
Amazon and Aliexpress in a round robin fashion. I'm exaggerating but it
really is the same principle.

It is critically important that Host headers are not tampered with
anymore, especially in a load balanced setup where all nodes must
absolutely receive the exact same requests. Nowadays they're used
way more than in the early 2000s, you have them in Set-Cookie,
Access-Control-Allow-Origin header with CORS, Location for redirects
(obviously), logs, statistics. It makes absolutely no sense to deploy
multiple servers in the same farm on different host names for the same
service. And no, migrating from an old Apache 1.3 config relying on
ProxyPass is not a valid excuse in 2019 for reintroducing such
architectural flaws!

Such mistakes are still occasionally encountered in the field, but extremely
rarely. Usually it's after their author, and single defender, quits the
company and their substitute discovers the mess and says "Houston, we
have a problem". But fortunately while I've met it very few times with
reverse proxies over the last decade, I never met it at all within a
load balancing farm in 18 years!

I'm seeing this as the outcome of someone having shot himself in the
foot, then aiming at the knee in order not to feel the foot's pain
anymore. But here I don't want to engage every user's knee in front
of this gun.

For the part concerning the health checks however your point is totally
valid and I absolutely hate the "\r\nHost:foo" hack that we're relying on
from time to time. I'd like to see checks reworked so that we can pass
various headers etc. I'm not fond of having just a vhost directive to be
used by checks because 1) it's confusing and 2) we're just hiding the dust
under the carpet. We do need really improved checks and dealing with Host
only is not sufficient. Some users need to pass cookies, auth headers,
user-agent, or x-forwarded-for headers for example, and such configs are
really really dirty. Usually they switch to tcp-checks but then they cannot
use fancy features like disable-on-404.
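
To make the hack concrete, here is a sketch of what such configs tend to look
like, first with "option httpchk" and then with tcp-checks. The host name
myapp.example.com, the /healthcheck URI and the expected 200 status are
placeholders chosen for the example. In both variants every server of the
farm receives the same Host.

```
# Variant 1: Host appended after the HTTP version of the check request
backend foo_httpchk
    mode http
    option httpchk GET /healthcheck HTTP/1.1\r\nHost:\ myapp.example.com
    http-check disable-on-404
    server foo1 1.1.1.1:8000 check
    server foo2 1.1.1.2:8000 check

# Variant 2: request written by hand; any header can be added, but
# http-check features such as disable-on-404 are no longer available
backend foo_tcpchk
    mode http
    option tcp-check
    tcp-check send GET\ /healthcheck\ HTTP/1.1\r\n
    tcp-check send Host:\ myapp.example.com\r\n
    tcp-check send Connection:\ close\r\n
    tcp-check send \r\n
    tcp-check expect string HTTP/1.1\ 200
    server foo1 1.1.1.1:8000 check
    server foo2 1.1.1.2:8000 check
```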

> The lack of this functionality is a blocker to use HAProxy with kubernetes.

I'd say that many existing haproxy users in kubernetes would probably
disagree with this bold claim. Saying that you're facing difficulties
and are seeking configuration help would possibly be more accurate.

In my opinion at this point it's preferable that you clearly describe
the problem you're facing so that it can be split in 3 parts :

  - architecture issues
  - configuration issues
  - haproxy limitations

I really suspect that you're facing a bit of each, and that for example
you're having an issue with your checks being unusable or unreliable on
the *real* application, forcing you to check a dummy one running on the
server itself, which does expect to be called with its own name and that
since haproxy uses the same check requests for all hosts you're facing
difficulties running the dummy checks. But maybe fixing the application
so that it can really be checked would be the best solution, maybe
configuring the dummy check application with a proper host would be the
second best solution, maybe using HTTP/1.0 checks to get rid of Host
(as many people do notably on static servers to end up on the default
vhost) would work fine, or maybe we need to perform minimal changes to
the checks to deal with a specific case.
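
As an illustration of the HTTP/1.0 option, a check without any Host header
could look like the sketch below. It assumes that the server's default vhost
answers the /healthcheck URI, which is a placeholder; behind a name-based
router that has no default vhost it would not help.

```
backend foo
    mode http
    # HTTP/1.0 request without Host: answered by the server's default vhost
    option httpchk GET /healthcheck HTTP/1.0
    server foo1 1.1.1.1:8000 check
    server foo2 1.1.1.2:8000 check
```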

Among the things I've been thinking about in the past regarding short
term improvements to the checks that could remain compatible with a
whole rewrite, I'd enumerate :
  - a new directive "http-check send-cookie" to bring the ability to
    automatically send the cookie header with the server's value when
    configured ; some would probably suggest that it should always be
    set and does not even require an option maybe.

  - the ability to define a general purpose argument on a server line
    ("check-arg") that would be sent in any header after the checks,
    and as a side effect could be used to send a Host header.

For example we could have this :

  backend foo
      # send "X-Farm-Name: foo" in all checks
      server foo1 1.1.1.1:8000 check-arg foo
      server foo2 1.1.1.2:8000 check-arg foo
      http-check send-arg X-Farm-Name

As well as this :

  backend foo
      # send "X-Server-Name: foo1/2" in all checks
      server foo1 1.1.1.1:8000 check-arg foo1
      server foo2 1.1.1.2:8000 check-arg foo2
      http-check send-arg X-Server-Name

Or this :

  backend foo
      # send "Host: foo1/2.example.internal" in all checks
      server foo1 1.1.1.1:8000 check-arg foo1.example.internal
      server foo2 1.1.1.2:8000 check-arg foo2.example.internal
      http-check send-arg Host

We could even be more forward-thinking and allow to define and reference
multiple such arguments :

  backend foo
      # send "X-Farm-Name: foo" and "Host: foo1/2.example.internal" in all checks
      server foo1 1.1.1.1:8000 check-arg(foo,foo1.example.internal)
      server foo2 1.1.1.2:8000 check-arg(foo,foo2.example.internal)
      http-check send-arg(1) X-Farm-Name
      http-check send-arg(2) Host

These are just ideas of course and code speaks better.

If you think that your issue really is only related to health checks
(which would then be quite a relief) and that something like above would
suit your needs, I'm willing to help you get this merged before the code
freeze. I just don't want to spend extra energy for nothing :-/

Ideas and suggestions are welcome.

Thanks,
Willy
