So we've backed out a bunch of network tuning parameters we had set (based on 
the GPFS wiki pages); they've been in place for a while, but maybe they are 
causing issues.

Secondly, we've noticed in the dump tscomm output that a connection is reported 
as broken to a node, and the node ID is usually the same node, which seems a 
bit odd to me.
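
For anyone who wants to do the same check, here's a rough Python sketch that 
pulls the broken-connection lines out of a saved dump and counts them per peer 
address (this assumes the dump was written to a file, e.g. with 
'mmfsadm dump tscomm > tscomm.out', and the exact wording of those lines may 
vary between releases):

    import re
    from collections import Counter

    # Assumed input: a saved dump, e.g. 'mmfsadm dump tscomm > tscomm.out'.
    # The exact text of the broken-connection lines may differ by release,
    # so we just look for 'broken' and pull out the first IPv4 address.
    broken = Counter()
    with open("tscomm.out") as f:
        for line in f:
            if "broken" in line.lower():
                m = re.search(r"\d+\.\d+\.\d+\.\d+", line)
                broken[m.group(0) if m else line.strip()] += 1

    for peer, count in broken.most_common():
        print(f"{peer}: {count} broken-connection entries")

If one peer dominates the counts, that at least confirms it really is the same 
node each time.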

We've also just updated the firmware on the Intel NICs (the X722) that are part 
of the Skylake boards, and it's specifically the newer Skylake kit that we see 
this problem on. We've had a number of issues with the X722 firmware (for 
example, it won't even bring a link up when plugged into some of our 10GbE 
switches, but that's another story).

We've also dropped the bonded links from these nodes, just in case it's 
related...

Simon

________________________________
From: [email protected] 
[[email protected]] on behalf of [email protected] 
[[email protected]]
Sent: 17 January 2019 14:30
To: Tomer Perry; gpfsug main discussion list
Cc: Yong Ze Chen
Subject: Re: [gpfsug-discuss] Node expels

>They always appear to be to a specific type of hardware with the same Ethernet 
>controller,

That makes me think you might be seeing packet loss that could require ring 
buffer tuning (the defaults and limits will differ between Ethernet adapters).

The slides on this page have been expanded to include a 'debugging expels' 
section (slides 19-20, which also references ring buffer tuning):
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381
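
As a rough illustration of the kind of check I mean (not a recommendation for 
specific values), the Python sketch below shells out to ethtool to compare the 
current RX/TX ring sizes against the hardware maximums; the interface name is 
just an example:

    import subprocess

    def ring_settings(iface):
        """Parse 'ethtool -g <iface>' into {'max': {...}, 'current': {...}}."""
        out = subprocess.run(["ethtool", "-g", iface],
                             capture_output=True, text=True, check=True).stdout
        section, result = None, {"max": {}, "current": {}}
        for line in out.splitlines():
            if "maximums" in line:           # "Pre-set maximums:" section
                section = "max"
            elif "settings" in line:         # "Current hardware settings:" section
                section = "current"
            elif ":" in line and section:
                key, _, val = line.partition(":")
                if val.strip().isdigit():
                    result[section][key.strip()] = int(val.strip())
        return result

    rings = ring_settings("eth0")            # example interface name
    for q in ("RX", "TX"):
        cur, mx = rings["current"].get(q, 0), rings["max"].get(q, 0)
        if cur < mx:
            print(f"{q} ring is {cur}, hardware max is {mx} "
                  f"(could be raised with 'ethtool -G', after testing)")

Any change to the ring sizes should of course be tested first, since larger 
rings can add latency.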

Regards,
John Lewars
Spectrum Scale Performance, IBM Poughkeepsie




From:        Tomer Perry/Israel/IBM
To:        gpfsug main discussion list <[email protected]>
Cc:        John Lewars/Poughkeepsie/IBM@IBMUS, Yong Ze Chen/China/IBM@IBMCN
Date:        01/17/2019 08:28 AM
Subject:        Re: [gpfsug-discuss] Node expels
________________________________


Hi,

I was asked to elaborate a bit (thus also adding John and Yong Ze Chen).

As written on the slide:
One of the best ways to determine whether a network-layer problem is the root 
cause of an expel is to look at the low-level socket details dumped in the 
‘extra’ log data (mmfs dump all) saved as part of automatic data collection on 
Linux GPFS nodes.

So, the idea is that in an expel situation we dump the socket state from the OS 
(you can see the same using 'ss -i', for example).
In your example, it shows that ca_state is 4 and there are retransmits and a 
high rto, all of which point to a network problem.
You can find more details here: 
http://www.yonch.com/tech/linux-tcp-congestion-control-internals
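
If it helps, here is a small Python sketch for reading those fields out of the 
GPFS 'state is unexpected' log line; the ca_state mapping is the standard Linux 
one (4 = Loss):

    import re

    # Linux TCP congestion-avoidance states (tcp_ca_state in the kernel).
    CA_STATES = {0: "Open", 1: "Disorder", 2: "CWR", 3: "Recovery", 4: "Loss"}

    def parse_socket_details(logline):
        """Pull the key=value pairs out of a 'state is unexpected' log line."""
        return {k: int(v) for k, v in re.findall(r"(\w+)=(\d+)", logline)}

    line = ("state is unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 "
            "unacked=5 probes=0 backoff=7 retransmits=7 rto=26496000 "
            "rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 "
            "reordering=3 lost=5")

    d = parse_socket_details(line)
    print("ca_state:", CA_STATES.get(d["ca_state"], "unknown"))
    if d["ca_state"] == 4 or d["retransmits"] > 0 or d["lost"] > 0:
        print("retransmits/losses present - points at the network layer, not GPFS")

In your case ca_state=4 (Loss), together with retransmits=7, lost=5 and the 
large backed-off rto, is exactly that pattern.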


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: [email protected]
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625





From:        "Tomer Perry" <[email protected]>
To:        gpfsug main discussion list <[email protected]>
Date:        17/01/2019 13:46
Subject:        Re: [gpfsug-discuss] Node expels
Sent by:        [email protected]
________________________________



Simon,

Take a look at slide 13 of
http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf


Regards,

Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: [email protected]
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel:    +1 720 3422758
Israel Tel:      +972 3 9188625
Mobile:         +972 52 2554625




From:        Simon Thompson <[email protected]>
To:        "[email protected]" <[email protected]>
Date:        17/01/2019 13:35
Subject:        [gpfsug-discuss] Node expels
Sent by:        [email protected]
________________________________



We’ve recently been seeing quite a few node expels with messages of the form:

2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address 10.20.0.58 
proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is unexpected: state=1 
ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5 probes=0 backoff=7 retransmits=7 
rto=26496000 rcv_ssthresh=102828 rtt=6729 rttvar=12066 sacked=0 retrans=1 
reordering=3 lost=5
2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data 
collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug data to 
proto-pg-pf01.bear.cluster localNode
2019-01-17_11:19:30.882+0000: [I] Calling user exit script 
gpfsSendRequestToNodes: event sendRequestToNodes, Async command 
/usr/lpp/mmfs/bin/mmcommon.
2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for a 
commMsgCheckMessages reply from node 10.20.0.58 proto-pg-pf01.bear.cluster. 
Sending expel message.

On the client node, we see messages of the form:

2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data collection request 
from 10.10.0.33
2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp debug data 
on this node.
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data collection 
request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug data on 
this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from cluster 
rds.gpfs.servers due to expel msg from 10.10.12.41 (b
ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data collection 
request from 10.20.0.56

They always appear to be to a specific type of hardware with the same Ethernet 
controller, though the nodes are split across three data centres and we aren’t 
seeing link congestion on the links between them.

On the node I listed above, it’s not actually doing anything either, as the 
software on it is still being installed (i.e. it’s not doing any GPFS or other 
IO, apart from a couple of home directories).

Any suggestions on what “(socket 153) state is unexpected” means?

Thanks

Simon




_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
