Hi Simon,
We've had to disable the offloads for Intel cards in many situations
with the i40e drivers - Red Hat has an article about it:
https://access.redhat.com/solutions/3662011
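In practice that's an 'ethtool -K' change. A rough sketch of how it could
be scripted is below - the interface name and the exact feature list are
assumptions, so check the Red Hat article and your own testing for which
offloads actually need to be turned off:

# Sketch: disable suspect offload features on an Intel i40e/X722 interface.
# The interface name and feature list are assumptions - verify against the
# Red Hat article and your own environment before rolling out.
import subprocess

INTERFACE = "ens1f0"                      # hypothetical interface name
FEATURES = ["lro", "gro", "tso", "gso"]   # candidate offloads to turn off

def show_offloads(iface):
    """Print the current offload settings (ethtool -k)."""
    out = subprocess.run(["ethtool", "-k", iface], check=True,
                         capture_output=True, text=True).stdout
    print(out)

def disable_offloads(iface, features):
    """Turn the given offload features off (ethtool -K)."""
    args = ["ethtool", "-K", iface]
    for feature in features:
        args += [feature, "off"]
    subprocess.run(args, check=True)

if __name__ == "__main__":
    show_offloads(INTERFACE)
    disable_offloads(INTERFACE, FEATURES)
    show_offloads(INTERFACE)

Note that ethtool changes don't survive a reboot, so whatever you settle
on needs to be made persistent through your distribution's network
configuration.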
-------
Orlando
On 17/01/2019 19:02, Simon Thompson wrote:
So we've backed out a bunch of network tuning parameters we had set
(based on the GPFS wiki pages); they've been set for a while, but maybe
they are causing issues.
Secondly, we've noticed in dump tscomm that we see 'connection broken'
to a node, and the node ID reported is usually the same node, which is
a bit weird to me.
We've also just updated the firmware on the Intel NICs (the X722) which
are part of the Skylake board, and it's specifically the newer Skylake
kit we see this problem on. We've had a number of issues with the X722
firmware (for example, it won't even bring a link up when plugged into
some of our 10GbE switches, but that's another story).
We've also dropped the bonded links from these nodes, just in case it's
related...
Simon
------------------------------------------------------------------------
*From:* [email protected]
[[email protected]] on behalf of
[email protected] [[email protected]]
*Sent:* 17 January 2019 14:30
*To:* Tomer Perry; gpfsug main discussion list
*Cc:* Yong Ze Chen
*Subject:* Re: [gpfsug-discuss] Node expels
>They always appear to be to a specific type of hardware with the same
Ethernet controller,
That makes me think you might be seeing packet loss that could require
ring buffer tuning (the defaults and limits differ between Ethernet
adapters).
The expel section in the slides on this page has been expanded to
include a 'debugging expels' section (slides 19-20, which also
reference ring buffer tuning):
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/DEBUG%20Expels/comment/7e4f9433-7ca3-430f-b40b-94777c507381
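As a rough illustration of the ring buffer check (the interface name is
an assumption, and the maximums reported by 'ethtool -g' are the real
limit on any given adapter), something like this will show whether the
rings are below the hardware maximum and bump them if so:

# Sketch: check RX/TX ring buffer sizes with 'ethtool -g' and raise them
# to the hardware maximum with 'ethtool -G'. Interface name is an assumption.
import re
import subprocess

INTERFACE = "ens1f0"   # hypothetical interface name

def ring_settings(iface):
    """Parse 'ethtool -g' output into {'max': {...}, 'current': {...}}."""
    out = subprocess.run(["ethtool", "-g", iface], check=True,
                         capture_output=True, text=True).stdout
    sections, current = {}, None
    for line in out.splitlines():
        if "maximums" in line:
            current = sections.setdefault("max", {})
        elif "Current hardware settings" in line:
            current = sections.setdefault("current", {})
        elif current is not None:
            m = re.match(r"(RX|TX):\s+(\d+)", line)
            if m:
                current[m.group(1)] = int(m.group(2))
    return sections

def raise_rings(iface):
    """Bump RX/TX rings to the hardware maximum if currently below it."""
    s = ring_settings(iface)
    if (s["current"]["RX"] < s["max"]["RX"]
            or s["current"]["TX"] < s["max"]["TX"]):
        subprocess.run(["ethtool", "-G", iface,
                        "rx", str(s["max"]["RX"]),
                        "tx", str(s["max"]["TX"])], check=True)

if __name__ == "__main__":
    print(ring_settings(INTERFACE))
    raise_rings(INTERFACE)

After raising the rings, it's worth watching the drop/miss counters in
'ethtool -S' on the suspect nodes to see whether the packet loss stops.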
Regards,
John Lewars
Spectrum Scale Performance, IBM Poughkeepsie
From: Tomer Perry/Israel/IBM
To: gpfsug main discussion list <[email protected]>
Cc: John Lewars/Poughkeepsie/IBM@IBMUS, Yong Ze Chen/China/IBM@IBMCN
Date: 01/17/2019 08:28 AM
Subject: Re: [gpfsug-discuss] Node expels
------------------------------------------------------------------------
Hi,
I was asked to elaborate a bit (thus also adding John and Yong Ze Chen).
As written on the slide:
One of the best ways to determine if a network layer problem is root
cause for an expel is to look at the low-level socket details dumped
in the ‘extra’ log data (mmfs dump all) saved as part of automatic
data collection on Linux GPFS nodes.
So, the idea is that in an expel situation we dump the socket state from
the OS (you can see the same using 'ss -i', for example).
In your example, it shows that ca_state is 4 (TCP_CA_Loss, i.e. the
stack believes packets have been lost), there are retransmits, and the
rto is high - all of which points to a network problem.
You can find more details here:
http://www.yonch.com/tech/linux-tcp-congestion-control-internals
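If anyone wants to pull the same counters programmatically rather than
reading them out of the GPFS dump or 'ss -i', here's a small sketch that
reads struct tcp_info from a connected socket. The field offsets are an
assumption based on the Linux kernel header (only the leading fields are
decoded), and the endpoint is just a placeholder:

# Sketch: read the leading fields of Linux's struct tcp_info from a socket,
# the same data GPFS logs on an expel ('state', 'ca_state', 'retransmits',
# 'rto', 'unacked', ...). Offsets assume a little-endian Linux host.
import socket
import struct

TCP_INFO = getattr(socket, "TCP_INFO", 11)   # 11 is the Linux option number
CA_STATES = {0: "Open", 1: "Disorder", 2: "CWR", 3: "Recovery", 4: "Loss"}

def tcp_info(sock):
    raw = sock.getsockopt(socket.IPPROTO_TCP, TCP_INFO, 104)
    # struct tcp_info starts with 7 single-byte fields (state, ca_state,
    # retransmits, probes, backoff, options, window scales), one pad byte,
    # then a run of __u32 counters.
    fields = struct.unpack("<7Bx9I", raw[:44])
    keys = ("state", "ca_state", "retransmits", "probes", "backoff",
            "options", "wscale", "rto", "ato", "snd_mss", "rcv_mss",
            "unacked", "sacked", "lost", "retrans", "fackets")
    info = dict(zip(keys, fields))
    info["ca_state_name"] = CA_STATES.get(info["ca_state"], "unknown")
    return info

if __name__ == "__main__":
    # Placeholder endpoint - point this at a peer you actually care about.
    with socket.create_connection(("example.com", 80)) as s:
        s.sendall(b"HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n")
        s.recv(4096)
        print(tcp_info(s))

A healthy connection sits in ca_state 0 (Open) with retransmits at 0; the
ca_state=4 (Loss) plus retransmits and the very large rto in your log are
what sustained packet loss looks like.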
Regards,
Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: [email protected]
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel: +1 720 3422758
Israel Tel: +972 3 9188625
Mobile: +972 52 2554625
From: "Tomer Perry" <[email protected]>
To: gpfsug main discussion list <[email protected]>
Date: 17/01/2019 13:46
Subject: Re: [gpfsug-discuss] Node expels
Sent by: [email protected]
------------------------------------------------------------------------
Simon,
Take a look at
http://files.gpfsug.org/presentations/2018/USA/Scale_Network_Flow-0.8.pdf,
slide 13.
Regards,
Tomer Perry
Scalable I/O Development (Spectrum Scale)
email: [email protected]
1 Azrieli Center, Tel Aviv 67021, Israel
Global Tel: +1 720 3422758
Israel Tel: +972 3 9188625
Mobile: +972 52 2554625
From: Simon Thompson <[email protected]>
To: "[email protected]" <[email protected]>
Date: 17/01/2019 13:35
Subject: [gpfsug-discuss] Node expels
Sent by: [email protected]
------------------------------------------------------------------------
We’ve recently been seeing quite a few node expels with messages of
the form:
2019-01-17_11:19:30.882+0000: [W] The TCP connection to IP address
10.20.0.58 proto-pg-pf01.bear.cluster <c0n236> (socket 153) state is
unexpected: state=1 ca_state=4 snd_cwnd=1 snd_ssthresh=5 unacked=5
probes=0 backoff=7 retransmits=7 rto=26496000 rcv_ssthresh=102828
rtt=6729 rttvar=12066 sacked=0 retrans=1 reordering=3 lost=5
2019-01-17_11:19:30.882+0000: [I] tscCheckTcpConn: Sending debug data
collection request to node 10.20.0.58 proto-pg-pf01.bear.cluster
2019-01-17_11:19:30.882+0000: Sending request to collect TCP debug
data to proto-pg-pf01.bear.cluster localNode
2019-01-17_11:19:30.882+0000: [I] Calling user exit script
gpfsSendRequestToNodes: event sendRequestToNodes, Async command
/usr/lpp/mmfs/bin/mmcommon.
2019-01-17_11:24:52.611+0000: [E] Timed out in 300 seconds waiting for
a commMsgCheckMessages reply from node 10.20.0.58
proto-pg-pf01.bear.cluster. Sending expel message.
On the client node, we see messages of the form:
2019-01-17_11:19:31.101+0000: [N] sdrServ: Received Tcp data
collection request from 10.10.0.33
2019-01-17_11:19:31.102+0000: [N] GPFS will attempt to collect Tcp
debug data on this node.
2019-01-17_11:24:52.838+0000: [N] sdrServ: Received expel data
collection request from 10.10.0.33
2019-01-17_11:24:52.838+0000: [N] GPFS will attempt to collect debug
data on this node.
2019-01-17_11:25:02.741+0000: [N] This node will be expelled from
cluster rds.gpfs.servers due to expel msg from 10.10.12.41 (b
ber-les-nsd01-data.bb2.cluster in rds.gpfs.server
2019-01-17_11:25:03.160+0000: [N] sdrServ: Received expel data
collection request from 10.20.0.56
They always appear to be to a specific type of hardware with the same
Ethernet controller, though the nodes are split across three data
centres and we aren’t seeing link congestion on the links between them.
On the node I listed above, it’s not actually doing anything either as
the software on it is still being installed (i.e. it’s not doing GPFS
or any other IO other than a couple of home directories).
Any suggestions on what “(socket 153) state is unexpected” means?
Thanks
Simon
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss