>They always appear to be to a specific type of hardware with the same Ethernet controller,
That makes me think you might be seeing packet loss that could require ring buffer tuning (the defaults and limits differ between Ethernet adapters). The expel section in the slides on this page has been expanded to include a 'debugging expels' section (slides 19-20, which also reference ring buffer tuning):
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381

Regards,
John Lewars
Spectrum Scale Performance, IBM Poughkeepsie

From: Tomer Perry/Israel/IBM
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Cc: John Lewars/Poughkeepsie/IBM@IBMUS, Yong Ze Chen/China/IBM@IBMCN
Date: 01/17/2019 08:28 AM
Subject: Re: [gpfsug-discuss] Node expels

Hi,

I was asked to elaborate a bit (hence also adding John and Yong Ze Chen).

As written on the slide:
"One of the best ways to determine if a network layer problem is root cause for an expel is to look at the low-level socket details dumped in the ‘extra’ log data (mmfs dump all) saved as part of automatic data collection on Linux GPFS nodes."

So the idea is that, in an expel situation, we dump the socket state from the OS (you can see the same information using 'ss -i', for example). In your example, it shows that ca_state is 4, there are retransmits, and the rto is high; all of these point to a network problem.
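To make the reading of that log line concrete, here is a minimal Python sketch that pulls the key=value pairs out of a GPFS "state is unexpected" message and names the numeric states. It assumes (this is not confirmed in the thread) that GPFS reports raw Linux `struct tcp_info` fields, so `state` follows the kernel's TCP state numbering (1 = ESTABLISHED), `ca_state` follows `enum tcp_ca_state` (4 = TCP_CA_Loss), and `rto`/`rtt` are in microseconds. The function name `decode_socket_state` and the "suspect network" heuristic are hypothetical, for illustration only.

```python
import re

# Linux TCP socket states (include/net/tcp_states.h).
TCP_STATES = {
    1: "ESTABLISHED", 2: "SYN_SENT", 3: "SYN_RECV", 4: "FIN_WAIT1",
    5: "FIN_WAIT2", 6: "TIME_WAIT", 7: "CLOSE", 8: "CLOSE_WAIT",
    9: "LAST_ACK", 10: "LISTEN", 11: "CLOSING",
}

# Linux congestion-avoidance states (enum tcp_ca_state in include/net/tcp.h).
TCP_CA_STATES = {0: "Open", 1: "Disorder", 2: "CWR", 3: "Recovery", 4: "Loss"}


def decode_socket_state(line: str) -> dict:
    """Decode the key=value socket details from a GPFS 'state is unexpected' line.

    Assumption: the values are raw Linux tcp_info fields, so rto/rtt are in microseconds.
    """
    # Every "name=number" token on the line becomes a dict entry.
    fields = {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}
    summary = {
        "state": TCP_STATES.get(fields.get("state"), "unknown"),
        "ca_state": TCP_CA_STATES.get(fields.get("ca_state"), "unknown"),
        "rto_s": fields.get("rto", 0) / 1e6,   # assumed microseconds -> seconds
        "rtt_ms": fields.get("rtt", 0) / 1e3,  # assumed microseconds -> milliseconds
        "retransmits": fields.get("retransmits", 0),
        "lost": fields.get("lost", 0),
    }
    # Hypothetical heuristic: any of these suggests packet loss on the path.
    summary["suspect_network"] = (
        summary["ca_state"] == "Loss"
        or summary["retransmits"] > 0
        or summary["lost"] > 0
    )
    return summary


if __name__ == "__main__":
    sample = (
        "state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 "
        "retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 "
        "sacked=0 retrans=1 reordering=3 lost=5"
    )
    for key, value in decode_socket_state(sample).items():
        print(f"{key}: {value}")
```

Applied to the line from the original report, this reports an ESTABLISHED connection in the Loss state with an RTO of about 26.5 s against an RTT of about 6.7 ms, 7 retransmits and 5 segments counted as lost. Together with unacked=5 and snd_cwnd=1, that pattern looks like repeated retransmission timeouts on a live connection rather than an idle socket, which fits the conclusion that the problem is in the network layer.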
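For the ring buffer tuning John mentions, a reasonable first step is to compare each adapter's current RX/TX ring sizes against its pre-set maximums. The sketch below parses the text printed by `ethtool -g <interface>`; it assumes the common layout with "Pre-set maximums:" and "Current hardware settings:" sections (newer ethtool releases add extra rows, which it ignores). The helper names are hypothetical, and the sketch only reports sizes rather than changing them; the slides linked in John's message remain the reference for which values to use.

```python
import re
import subprocess
import sys


def parse_ring_params(text: str) -> dict:
    """Split `ethtool -g` output into {'max': {...}, 'current': {...}} for RX/TX."""
    sections = {"max": {}, "current": {}}
    target = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Pre-set maximums"):
            target = sections["max"]
        elif line.startswith("Current hardware settings"):
            target = sections["current"]
        elif target is not None:
            # Only the plain "RX:" and "TX:" rows; "RX Mini:", "RX Jumbo:" etc. are skipped.
            match = re.match(r"^(RX|TX):\s+(\d+)$", line)
            if match:
                target[match.group(1)] = int(match.group(2))
    return sections


def undersized_rings(sections: dict) -> dict:
    """Return {ring: (current, maximum)} for rings set below the adapter maximum."""
    return {
        ring: (sections["current"][ring], maximum)
        for ring, maximum in sections["max"].items()
        if ring in sections["current"] and sections["current"][ring] < maximum
    }


def read_ring_params(interface: str) -> str:
    # Requires ethtool on PATH and a driver that supports ring queries.
    result = subprocess.run(
        ["ethtool", "-g", interface], capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    iface = sys.argv[1]
    for ring, (current, maximum) in undersized_rings(
        parse_ring_params(read_ring_params(iface))
    ).items():
        print(f"{iface} {ring}: {current} of {maximum}; "
              f"candidate: ethtool -G {iface} {ring.lower()} {maximum}")
```

Invoked with an interface name as its only argument, it prints a candidate `ethtool -G` command for each undersized ring. Apply such changes with care: on some drivers resizing the rings briefly resets the link, which on a GPFS node could itself provoke an expel, and `ethtool -G` settings do not persist across reboots unless they are also added to the distribution's network configuration.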
You can find more details here: http://www.yonch.com/tech/linux-tcp-congestion-control-internals

Regards,
Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: t...@il.ibm.com
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel: +1 720 3422758
Israel Tel: +972 3 9188625
Mobile: +972 52 2554625

From: "Tomer Perry" <t...@il.ibm.com>
To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
Date: 17/01/2019 13:46
Subject: Re: [gpfsug-discuss] Node expels
Sent by: gpfsug-discuss-boun...@spectrumscale.org

Simon,

Take a look at slide 13 of:
http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf

Regards,
Tomer Perry

From: Simon Thompson <s.j.thomp...@bham.ac.uk>
To: "gpfsug-discuss@spectrumscale.org" <gpfsug-discuss@spectrumscale.org>
Date: 17/01/2019 13:35
Subject: [gpfsug-discuss] Node expels
Sent by: gpfsug-discuss-boun...@spectrumscale.org

We’ve recently been seeing quite a few node expels with messages of the form:

2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5
2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to proto-pg-pf01.bear.cluster localNode
2019-01-17_11:19:30.882+0000: [I] Calling user exit script gpfsSendRequestToNodes: event sendRequestToNodes, Async command /usr/lpp/mmfs/bin/mmcommon.
2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. Sending expel message.

On the client node, we see messages of the form:

2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request from 10.10.0.33
2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data on this node.
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection request from 10.20.0.56

They always appear to be to a specific type of hardware with the same Ethernet controller, though the nodes are split across three data centres and we aren’t seeing link congestion on the links between them. On the node I listed above, it’s not actually doing anything either, as the software on it is still being installed (i.e. it’s not doing GPFS or any other IO other than a couple of home directories).

Any suggestions on what “(socket 153) state is unexpected” means?

Thanks
Simon

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss