Hi Matthias,

On Tue, Mar 07, 2017 at 01:21:39PM +0100, Matthias Fechner wrote:
> could you please check my first post, there is the complete haproxy config
> included,

Oh I'm sorry, I simply cannot catch up with so many e-mails, at ~1500/day
that's about 10k/wk and I can't follow any single thread anymore :-(

> here a short summary:
> sslh is listening on 192.168.0.251:443 and 192.168.1.2:443 and
> 192.168.200.6:443 is acting as transparent proxy forwarding all requests to
> 192.168.0.251:8443 or 192.168.200.6:8443
> External connections reaching the server on
> 192.168.0.251/192.168.1.2/192.168.200.6 depending on which network interface
> they came in.
> haproxy (frontend) is listening on localhost:443 and 192.168.0.251:8443,
> 192.168.200.6:8443 (in tcp mode, as I use http2 to connect to backend
> nginx).
>
> So in this case I would say that haproxy is hanging.

Yes, and in that case it's unrelated to the connect() change that was recently
made, so we have to restart the analysis from scratch.

> > Yes, please test with "nokqueue" in the global section, or start haproxy
> > with
> > "-dk". It will switch to poll() and will tell us if there's a bug in the
> > kqueue
> > poller. Please be aware that your CPU usage will increase a bit.
> 
> I tested it again.
> haproxy 1.7.3 with patch reverted -> timeout
> haproxy 1.7.3 with patch reverted and kqueue disabled -> timeout
> ps shows: /usr/local/sbin/haproxy -dk -q -f /usr/local/etc/haproxy.conf -p
> /var/run/haproxy.pid
> haproxy 1.7.2 -> no timeouts
> 
> I cannot says that in sslh is maybe a bug that is now triggered by a change
> inside haproxy.
> The homepage of sslh is here: http://www.rutschle.net/tech/sslh.shtml
> 
> I use sslh to get ssh/https/openvpn to listen the same port (443). sslh just
> looks at the first bytes of the connection and acts as a transparent proxy
> to forward the connection to haproxy (https), ssh or openvpn.

OK, I didn't initially understand there was this in front, and was not even
aware of this component since most of us do that directly from within haproxy
(and have the proxy protocol added for free to forwarded connections).
There's very little chance that it causes haproxy to end up in CLOSE_WAIT,
but we never know, obviously.

> Version 1.7.2 shows:
> HA-Proxy version 1.7.2 2017/01/13
(...)

So they're pretty much identical except for the version. Are you interested in
trying a bisection between 1.7.2 and 1.7.3 to find the culprit commit ?
There are only 20 patches, so it should take about 5 attempts. Depending
on the time it takes for the problem to appear, that may be faster than
speculating on each individual patch. If you're interested, the procedure
is the following :

  - from a git tree containing haproxy 1.7, you start the bisection between
    1.7.2 and 1.7.3 this way :

        $ git bisect start v1.7.3 v1.7.2

  - it will cut the history in the middle and check out that state.

  - then you build and run the resulting executable

  - if it fails, you type :

        $ git bisect bad

    and if it works, you type :

        $ git bisect good

  - this makes git focus on only half of the remaining history, either
    before or after the test point, and check out the next commit so that
    you can build and test again. If you end up on a patch that definitely
    cannot be the culprit (e.g. a doc-only change) you can even skip it :

        $ git bisect skip

  - once you're done or if you failed, simply type "git bisect reset".

  - it's often nice to note the last known good and last known bad commits
    especially if it takes a bit of time because if you happen to do
    something else with your sources, you don't want to screw up your
    bisection point.

At the end git will tell you "this is the first bad commit". This
way we'll be sure what to look at. There are a few candidates :

  - disabling of close on redirects by default (though you're in pure TCP
    most of the time)
  - failure to properly handle polling on connect()
  - side effect of the fix on the analysers (possibly uncovering another
    bug that used to stay hidden)
  - SSL build fixes.

I'm still wondering why you're the only one facing this for now and I
suspect it's related to the fact that you're on FreeBSD 11.

Thanks,
Willy
