On Tue, Apr 4, 2017, at 18:41, Christopher Wood wrote:
> (Writing this out here for posterity and people seeing similar items.)
>
> A little while ago I erroneously thought that gnatsd might use openssl
> and thus had gnatsd tagged to restart on openssl package update via
> puppet. (Found https://golang.org/pkg/crypto/ssl/, untagged the gnatsd
> service.)
>
> While gnatsd itself was fine after the restart, the server was not happy
> with ~1.9k mcollectived reconnecting at once.
>
> Mar 21 12:24:48 mcomq2 kernel: possible SYN flooding on port 4242.
> Sending cookies.
>
> The affected mcollectived were logging this and not retrying:
>
> W, [2017-03-21T10:55:43.213823 #9006] WARN -- : natswrapper.rb:117:in
> `block (3 levels) in start' Disconnected from NATS: Client disconnected
> from server on nats://mcomq2.me.com:4242
>
> The solution was two-part:
>
> 1) Upgrade choria to be able to update from eventmachine+nats gems to
> nats-pure 0.2.2.
>
> https://github.com/nats-io/pure-ruby-nats
> https://github.com/choria-io/mcollective-choria
>
> 2) Add some sysctls on the mcomq host to accommodate the initial rush of
> connections.
>
> sysctl { 'net.core.somaxconn': value => '4092' }
> sysctl { 'net.ipv4.tcp_max_syn_backlog': value => '8192' }
>
> https://forge.puppet.com/thias/sysctl
>
> After that it has been back to smooth sailing.
Nice! I'll add a note to the Choria docs to this effect.

I did also consider making the :reconnect_time_wait option a random value between 0 and 5 to spread out the reconnects; right now it's set to 1. Do you think that would have been a good choice given your experience?
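For illustration, here is a rough sketch of the randomized wait idea. It does not touch the NATS client at all: `jittered_reconnect_wait` is a hypothetical helper, the 0 to 5 range is taken from the suggestion above, and treating the value as seconds is an assumption. The simulation just shows how ~1.9k clients would spread across the window instead of all retrying at the same moment.

```ruby
# Hypothetical helper (not part of Choria or nats-pure): pick a reconnect
# delay uniformly between min and max so clients do not all retry together.
# The unit is assumed to be seconds.
def jittered_reconnect_wait(min: 0.0, max: 5.0, rng: Random.new)
  raise ArgumentError, "max must be >= min" if max < min
  rng.rand(min..max)
end

# Simulate roughly the fleet size from the report (~1.9k mcollectived) and
# count how many clients would reconnect in each one-second bucket.
rng = Random.new(42)
delays = Array.new(1900) { jittered_reconnect_wait(rng: rng) }

buckets = Hash.new(0)
delays.each { |d| buckets[d.floor] += 1 }

buckets.sort.each do |second, count|
  puts "#{second}s: #{count} clients"
end
```

With a fixed wait, all 1900 clients land in the same bucket. With the random wait they are split across the buckets, so the peak rate of incoming SYNs on the broker should drop accordingly. The chosen value would then be passed as the client's :reconnect_time_wait when connecting.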
