Hi Dave,

Thanks for the info. Interestingly, we had the crash at exactly the same time, so we are 3 for 3 on that.
The setups sound very similar, but given that we all saw the issue at the same time, it really points to something more global. We are using NTP from our firewalls, which in turn get it from our ISP, so I doubt that is the cause. It could be external port scanning, as you suggest, or maybe a leap second of some sort?

Willy, any thoughts on the time coincidence?

Thanks
Dave

On 3 April 2017 at 17:45, Dave Cottlehuber <[email protected]> wrote:

> On Mon, 13 Mar 2017, at 13:31, David King wrote:
> > Hi All
> >
> > Apologies for the delay in response, I've been out of the country for the
> > last week
> >
> > Mark, my gut feeling is that it is network related in some way, so thought we
> > could compare the networking setup of our systems
> >
> > You mentioned you see the hang across geo locations, so I assume there
> > isn't layer 2 connectivity between all of the hosts? Is there any back end
> > connectivity between the haproxy hosts?
>
> Following up on this, some interesting points but nothing useful.
>
> - Mark & I see the hang at almost exactly the same time on the same day:
>   2017-02-27T14:36Z give or take a minute either way
>
> - I see the hang in 3 different regions using 2 different hosting
>   providers on both clustered and non-clustered services, but all on
>   FreeBSD 11.0R amd64. There is some dependency between these systems but
>   nothing unusual (logging backends, reverse proxied services etc).
>
> - our servers don't have a specific workload that would allow them all
>   to run out of some internal resource at the same time, as their reboot
>   and patch cycles are reasonably different - typically a few days elapse
>   between first patches and last reboots unless it's deemed high risk
>
> - our networking setup is not complex but typical FreeBSD:
>     - LACP bonded Gbit igb(4) NICs
>     - CARP failover for both ipv4 & ipv6 addresses
>     - either direct to haproxy for http & TLS traffic, or via spiped to
>       decrypt intra-server traffic
>     - haproxy directs traffic into jailed services
>     - our overall load and throughput is low but consistent
>     - pf firewall
>     - rsyslog for logging, along with riemann and graphite for metrics
>     - all our db traffic (couchdb, kyoto tycoon) and rabbitmq go via haproxy
>     - haproxy 1.6.10 + libressl at the time
>
> As I'm not one for conspiracy theories or weird coincidences, somebody
> port scanning the internet with an Unexpectedly Evil Packet Combo seems
> the most plausible explanation. I cannot find an alternative that would
> fit the scenario of 3 different organisations with geographically
> distributed equipment and unconnected services reporting an unusual
> interruption on the same day and almost the same time.
>
> Since then, I've moved to FreeBSD 11.0p8, haproxy 1.7.3 and latest
> libressl and seen no recurrence, just like the last 8+ months or so
> since first deploying haproxy on FreeBSD instead of debian & nginx.
>
> If the issue recurs I plan to run a small cyclic traffic capture with
> tcpdump and wait for a re-repeat, see
> https://superuser.com/questions/286062/practical-tcpdump-examples
>
> Let me know if I can help or clarify further.
>
> A+
> Dave
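
PS: for anyone else wanting to set up the cyclic capture mentioned above, here is a rough sketch of how it could look. It only builds and prints a tcpdump command line that uses the -C and -W options to keep a fixed-size ring of capture files. The interface name (lagg0), the output path, and the sizes are assumptions of mine for illustration, not anything from Dave's setup, so adjust them to your hosts.

```python
import shlex


def ring_capture_cmd(interface, out_path, file_mb=100, file_count=10):
    """Build a tcpdump argv that writes a fixed-size ring of capture files."""
    return [
        "tcpdump",
        "-n",                    # do not resolve addresses to names
        "-i", interface,         # capture interface (hypothetical name, adjust)
        "-C", str(file_mb),      # start a new file after about file_mb million bytes
        "-W", str(file_count),   # keep at most file_count files, then overwrite the oldest
        "-w", out_path,          # base file name; tcpdump appends a file number
    ]


if __name__ == "__main__":
    # Assumed values: a lagg(4) interface and about 1 GB of total capture space.
    cmd = ring_capture_cmd("lagg0", "/var/tmp/haproxy-hang.pcap")
    print(shlex.join(cmd))
```

Running the printed command as root and leaving it going until the next hang should leave the last few hundred MB of traffic on disk. Whether that window is long enough depends on traffic volume, so the sizes are worth tuning.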

