Hello Romain,
On Tue, Oct 01, 2019 at 12:08:03PM +0000, Morotti, Romain D wrote:
> What is the status on this?
Sorry, it took some time because I had to work on other priorities, and
to be honest, the subject looked scary enough to deserve some real time
to study it. Sadly, the subject line turned out to be a pretty accurate
description of what the patch does, and what it does is fundamentally
wrong.
So just to summarize for those who haven't had a look at the patch:
it replaces the Host header in requests sent to a server with one
specified for that server, with the result that each server within the
same farm runs on a different vhost. This brings back the errors that
plagued many hosting infrastructures in the late 90s and early 2000s,
where redirects, host names embedded in pages, etc. were wrong because
the application was called with an internal name instead of the correct
one. Moreover, this used to prevent servers from being shared between
multiple hosts, since the Host header contained rubbish. For
reference, Apache merged the ProxyPreserveHost in 2.0.31 in 2002 to put
an end to that madness. But here you're escalating this to a new level
and involving it in load balancing, 15 years after everyone managed to
fix that early mistake and bury it deeply forever. So in short, for a
request sent to foo.bar.com, you'll randomly forward it as being for
foo1.bar.com, bar.foo2.com, www.example.org, etc., all in the same farm!
It's as if, when pointing my browser at Amazon, I were presented with
eBay, Amazon and Aliexpress in round-robin fashion. I'm exaggerating,
but it really is the same principle.
It is critically important that Host headers are not tampered with
anymore, especially in a load balanced setup where all nodes must
absolutely receive the exact same requests. Nowadays they're used far
more than in the early 2000s: you find them in Set-Cookie, in the
Access-Control-Allow-Origin header with CORS, in Location headers for
redirects (obviously), in logs, and in statistics. It makes absolutely
no sense to deploy multiple servers of the same service in the same
farm under different host names. And no, migrating from an old Apache
1.3 config relying on ProxyPass is not a valid excuse in 2019 for
reintroducing such architectural flaws!
Such mistakes are still occasionally encountered in the field, but
extremely rarely. Usually it's after their author, and sole defender,
quits the company and their successor discovers the mess and says
"Houston, we have a problem". But fortunately, while I've come across
this a very few times with reverse proxies over the last decade, I have
never once seen it within a load balancing farm in 18 years!
I see this as the outcome of someone having shot himself in the foot,
then aiming at the knee in order not to feel the foot's pain anymore.
But here I don't want to put every user's knee in front of this gun.
For the part concerning the health checks however your point is totally
valid and I absolutely hate the "\r\nHost:foo" hack that we're relying on
from time to time. I'd like to see checks reworked so that we can pass
various headers, etc. I'm not fond of having just a vhost directive used
by checks because 1) it's confusing and 2) we're just sweeping the dust
under the carpet. We really do need improved checks, and dealing with
Host alone is not sufficient. Some users need to pass cookies, auth
headers, a user-agent, or x-forwarded-for headers for example, and such
configs are really, really dirty. Usually they switch to tcp-checks, but
then they cannot use fancy features like disable-on-404.
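For reference, that hack typically looks like the following (the path
and host name here are purely illustrative): it abuses the version field
of "option httpchk" to smuggle an extra header line into the check
request.

    backend app
        # the infamous hack: append "\r\nHost: foo.example.internal"
        # to the version string so the check request carries a Host
        # header ("\ " escapes the space inside the header value)
        option httpchk GET /health HTTP/1.1\r\nHost:\ foo.example.internal
        server app1 10.0.0.1:8080 check

It works, but it's fragile, hard to read, and scales poorly once you
also need cookies, auth or other headers in the check.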
> The lack of this functionality is a blocker to use HAProxy with kubernetes.
I'd say that many existing haproxy users on kubernetes would probably
disagree with this bold claim. Saying that you're facing difficulties
and are seeking configuration help would probably be more accurate.
In my opinion, at this point it's preferable that you clearly describe
the problem you're facing so that it can be split into 3 parts:
- architecture issues
- configuration issues
- haproxy limitations
I really suspect that you're facing a bit of each. For example, you may
have an issue with your checks being unusable or unreliable on the
*real* application, forcing you to check a dummy one running on the
server itself, which expects to be called with its own name; and since
haproxy uses the same check request for all hosts, you're facing
difficulties running these dummy checks. But maybe fixing the
application so that it can really be checked would be the best solution;
maybe configuring the dummy check application with a proper host would
be the second best solution; maybe using HTTP/1.0 checks to get rid of
Host (as many people do, notably on static servers, to end up on the
default vhost) would work fine; or maybe we need to perform minimal
changes to the checks to deal with a specific case.
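For illustration, the HTTP/1.0 variant mentioned above could look like
this (addresses and path are made up): HTTP/1.0 requests carry no Host
header, so the check simply lands on each server's default vhost.

    backend app
        # HTTP/1.0 checks send no Host header at all, so every
        # server answers from its default vhost
        option httpchk GET /health HTTP/1.0
        server app1 10.0.0.1:8080 check
        server app2 10.0.0.2:8080 check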
Among the things I've thought about in the past regarding short-term
improvements to the checks that could remain compatible with a whole
rewrite, I'd enumerate:
  - a new directive "http-check send-cookie" to bring the ability to
    automatically send the cookie header with the server's value when
    configured; some would probably suggest that it should always be
    sent and doesn't even require an option at all.
  - the ability to define a general-purpose argument on a server line
    ("check-arg") that would be sent in any chosen header during checks,
    and as a side effect could be used to send a Host header.
For example we could have this:
    backend foo
        # send "X-Farm-Name: foo" in all checks
        server foo1 1.1.1.1:8000 check-arg foo
        server foo2 1.1.1.2:8000 check-arg foo
        http-check send-arg X-Farm-Name
As well as this:
    backend foo
        # send "X-Server-Name: foo1/2" in all checks
        server foo1 1.1.1.1:8000 check-arg foo1
        server foo2 1.1.1.2:8000 check-arg foo2
        http-check send-arg X-Server-Name
Or this:
    backend foo
        # send "Host: foo1/2.example.internal" in all checks
        server foo1 1.1.1.1:8000 check-arg foo1.example.internal
        server foo2 1.1.1.2:8000 check-arg foo2.example.internal
        http-check send-arg Host
We could even be more forward-thinking and allow defining and
referencing multiple such arguments:
    backend foo
        # send "X-Farm-Name: foo" and "Host: foo1/2.example.internal"
        # in all checks
        server foo1 1.1.1.1:8000 check-arg(foo,foo1.example.internal)
        server foo2 1.1.1.2:8000 check-arg(foo,foo2.example.internal)
        http-check send-arg(1) X-Farm-Name
        http-check send-arg(2) Host
These are just ideas of course, and code speaks better.
If you think that your issue really is only related to health checks
(which would then be quite a relief) and that something like the above
would suit your needs, I'm willing to help you get this merged before
the code freeze. I just don't want to spend extra energy for nothing :-/
Ideas and suggestions are welcome.
Thanks,
Willy