Hi Willy.

On Sat, 26 Mar 2022 10:22:02 +0100
Willy Tarreau <w...@1wt.eu> wrote:

> Hi,
> 
> HAProxy 2.6-dev4 was released on 2022/03/26. It added 80 new commits
> after version 2.6-dev3.
> 
> The activity started to calm down a bit, which is good because we're
> roughly 2 months before the release and it will become important to avoid
> introducing last-minute regressions.
> 
> This version mostly integrates fixes for various bugs in various places
> like stream-interfaces, QUIC, the HTTP client or the trace subsystem. The
> remaining patches are mostly QUIC improvements and code cleanups. In
> addition the MQTT protocol parser was extended to also support MQTTv3.1.
> 
> A change discussed around the previous announcement was made in the H2 mux: the
> "timeout http-keep-alive" and "timeout http-request" are now respected
> and work as documented, so that it will finally be possible to force such
> connections to be closed when no request comes even if they're seeing
> control traffic such as PING frames. This can typically happen in some
> server-to-server communications whereby the client application makes use
> of PING frames to make sure the connection is still alive. I intend to
> backport this after some time, probably to 2.5 and later 2.4, as I've
> got reports about stable versions currently exhibiting this problem.
> 
> I'm expecting to see another batch of stream-interface code refactoring
> that Christopher is still working on. This is a very boring and tedious
> task that should significantly lower the long-term maintenance effort,
> so I'm willing to wait a little bit for such changes to be ready. What
> this means for users is a reduction in the bugs we've seen over the last
> 2-3 years, alternating between truncated responses and never-dying
> connections, which result from the difficulty of propagating certain
> events across multiple layers.
> 
> Also William still has some updates to finish on the HTTP client
> (connection retries, SSL cert verification and host name resolution
> mainly). On paper, each of them is relatively easy, but in practice,
> since the HTTP client is the first one of its category, each attempt to
> make progress is stopped by the discovery of a shortcoming or bug that was
> not visible before. Thus the progress takes more time than desired, but
> as a side effect, the core code gets much more reliable by getting rid
> of these old issues.
> 
> One front that made impressive progress over the last few months is QUIC.
> While a few months ago we were counting the number of red boxes on the
> interop tests at https://interop.seemann.io/ to figure out what to work on as
> a top priority, now we're rather counting the number of tests that report
> a full-green state, and haproxy is now on par with other servers in these
> tests. Thus the idea emerged, in order to continue to make progress on
> this front, to start to deploy QUIC on haproxy.org so that interoperability
> issues with browsers and real-world traffic can be spotted. A few attempts
> were made and already revealed issues so for now it's disabled again. Be
> prepared to possibly observe a few occasional hiccups when visiting the
> site (and if so, please do complain to us). The range of possible issues
> would likely be frozen transfers and truncated responses, but these should
> not happen.
> 
> From a technical point of view, the way it's done is by having a separate haproxy
> process listening to QUIC on UDP port 1443, and forwarding HTTP requests
> to the existing process. The main process constantly checks the QUIC one,
> and when it's seen as operational, it appends an Alt-Svc header that
> indicates to the client that an HTTP/3 implementation is available on port
> 1443, and that this announcement is valid for a short time (we'll leave it to
> one minute only so that issues can resolve quickly, but for now it's only
> 10s so that quick tests cause no harm):
> 
>     http-response add-header alt-svc 'h3=":1443"; ma=60' if \
>        { var(txn.host) -m end haproxy.org } { nbsrv(quic) gt 0 }
> 
> As such, compatible browsers are free to try to connect there or not. Other
> tools (such as git clone) will not use it. For those impatient to test it,
> the QUIC process' status is reported at the bottom of the stats page here:
> http://stats.haproxy.org/. The "quic" socket in the frontend at the top
> reports the total traffic received from the QUIC process, so if you're
> seeing it increase while you reload the page it's likely that you're using
> QUIC to read it. In Firefox I have this little plugin loaded:
> 
>   https://addons.mozilla.org/en-US/firefox/addon/http2-indicator/
> 
> It displays a small flash on the URL bar with different colors depending
> on the protocol used to load the page (H1/SPDY/H2/H3). When that works it's
> green (H3), otherwise it's blue (H2).
> 
> At this point I'd still say "do not reproduce these experiments at home".
> Amaury and Fred are still watching the process' traces very closely to
> spot bugs and stop it as soon as a problem is detected. But it's still
> too early for it to be operated by non-developers. The hope is that by 2.6
> we'll reach the point where enthusiasts can deploy a few instances on
> not-too-sensitive sites with sufficient confidence and a little dose of
> monitoring.
> 
> Finally, there's another topic I'd like to bring to the table now about
> stuff for post-2.6. Several of us had discussions recently around service
> discovery in general, and the conclusions that will probably not surprise
> many people are that:
> 
>   - using DNS for service discovery is a disaster. The protocol was
>     never designed for this and presents many shortcomings, starting
>     with partial responses that are the cause of flapping servers that
>     users only work around by artificially extending the hold parameter;
>     several users have experienced processes stuck in O(n^5) resolution
>     loops that were fortunately interrupted by the watchdog(!). Even if
>     the complexity has since been reduced to something like O(n^3), it's still
>     not something I would recommend to anyone, because you start small at
>     4 servers and one day you realize that your business has grown to a
>     few tens to hundreds of servers and you don't have the time to make
>     the switch to a different solution anymore.
> 
>   - alternatives to DNS for service discovery are mature (various HTTP-
>     based APIs), but the need to update the config file and/or to
>     occasionally reload haproxy makes them incompatible with an
>     implementation inside the haproxy process itself; in addition, such new
>     mechanisms tend to come with ready-to-use libs for high-level languages
>     that would require a lot of time to reimplement inside haproxy anyway.
> 
>   - that would typically be a perfect job for the dataplane-API but it
>     can currently only use the CLI to communicate with haproxy, and
>     this CLI was designed for humans, thus adopting every new command
>     (such as server addition/removal) still takes quite some time,
>     because specific work has to be done for each and every single new
>     command.
>
> As such I find it important that for the long term we'd focus on:
>   - improving the communication between the dataplane-API and haproxy;
>     we've had discussions with the dataplane-API team and identified some
>     points that would make their life much better (ability to dump the
>     list of supported keywords, supporting REST/JSON etc). This also
>     means that we need to be even more careful when extending existing
>     directives with new keywords, to use the existing keyword registration
>     subsystems as much as possible and rely less and less on strcmp() and
>     other parser tricks.
> 
>   - getting rid of the unreliable DNS-based discovery once it can be
>     done differently (i.e. no more SRV records nor spraying of random
>     IPs to IP-less servers). The issues in this area managed to keep 3
>     people busy for 2 months during 2.5 development and it will never
>     fully satisfy users because the concept is fundamentally flawed.
> 
> To be honest, I have no idea how long all the stuff above could take,
> especially if we want to design it correctly and not reproduce mistakes
> from the past. I would have liked to be able to say "no more DNS-based
> discovery after 2.6" so that we would start to warn users about possible
> deprecation, and it would probably be reasonable to think that it would
> be the last LTS version with this. I'm interested in opinions and feedback
> about this. And the next question will obviously be "how could we detect
> such users and warn them?". Using DNS to resolve server names to IPs is
> fine (it was initially done for use within AWS). I was thinking about a
> few possible approaches like detecting the combined use of server-templates
> and nameservers, or maybe just asking that an acknowledgement keyword is
> added in nameserver sections referenced by multiple servers to confirm
> that the warning was read, or detecting the service name syntax using the
> underscore (not sure it's sufficient). Opinions and ideas are welcome
> here.

I fully agree with "using DNS for service discovery is a disaster", even though
DNS was the easiest way to do service discovery in the past.

A possible solution could be a registration API in HAProxy which uses the
dynamic server feature so that servers can add themselves to a HAProxy
backend/listener.

There should be a shared secret to protect the HAProxy API against attacks, and
the API should only be reachable via TLS.

Something like this:

```
{
  "shared_secret": "<some_text>",
  "instance_name": "<some_text>",
  "name_or_ip": "<Name_or_IP>",
  ...
}
```

I would suggest JSON as it is more or less the standard for API interaction.
Another benefit of JSON is that no separate CLI would need to be maintained
just for that feature.
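Just to illustrate the idea (a rough sketch only: the backend/server names, the
address and the socket path are made up, and depending on the HAProxy version
"experimental-mode on" may be needed before "add server"), such a registration
endpoint could simply translate the JSON payload into the existing dynamic
server commands on the runtime API:

```
# haproxy.cfg: expose an admin-level runtime socket (example path)
#   stats socket /var/run/haproxy.sock mode 600 level admin

# register a new server in backend "be_app"
echo "add server be_app/app01 192.0.2.10:8080" | \
    socat stdio /var/run/haproxy.sock

# deregister it again (it must be put in maintenance before removal)
echo "disable server be_app/app01" | socat stdio /var/run/haproxy.sock
echo "del server be_app/app01" | socat stdio /var/run/haproxy.sock
```

That way the API layer would only have to do authentication (shared secret,
TLS) and validation; the actual server handling stays in HAProxy.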

I'm not sure if it's known that Envoy uses the "xDS REST and gRPC protocol" for
its endpoint configuration. I'm not sure whether "xDS REST" would bring any
benefit to HAProxy, but maybe we can get some ideas about how the problem is
solved there.

https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol

The terminology there helps to understand some of the xDS parts:
https://www.envoyproxy.io/docs/envoy/latest/intro/life_of_a_request#terminology

If we zoom out a little from "add backend servers to HAProxy" and think more
along the lines of "add a bunch of backends to a HAProxy cluster", we could
think about using something like the Raft protocol ( https://raft.github.io/ ).

Most companies out there do not run only one HAProxy instance; they run at
least two, so a solution is required that can work with more than one HAProxy
instance.

Adding the Raft protocol would of course increase the complexity of HAProxy,
but it offers the handling of joining/removing backends, and the dynamic server
feature could then be used in HAProxy to add the new server to the backend
section.

The benefit, from my point of view, is having an underlying algorithm that
offers consensus-based handling of joining/removing servers/endpoints.

Maybe the peers protocol could also be used for that part, as it is already
part of HAProxy.
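For the record, a peers section already describes the other instances of a
cluster today, even though it currently only synchronizes stick-table data, so
part of the plumbing between instances exists. A minimal example of what is
already possible (names and addresses are made up):

```
peers lb_cluster
    # one peer entry must match the local instance name (hostname or -L)
    peer lb1 192.0.2.1:10000
    peer lb2 192.0.2.2:10000

backend app
    # today the peers protocol only synchronizes stick-table contents
    stick-table type ip size 100k expire 30m peers lb_cluster
    server app01 192.0.2.10:8080 check
```

Whether the same channel could also carry server join/leave events is exactly
the open question.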

That's my 2c; I'm just brainstorming some options that come to my mind. :-)

Regards
Alex
