Hi Stewart,
Can't comment on the NFS or snapshot issues. However, it's common to change
the filesystem parameters "maxMissedPingTimeout" and "minMissedPingTimeout"
when adding remote clusters.
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20(GPFS)/page/Tuning%20Parameters
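For reference, a hedged sketch of how those parameters are usually changed with mmchconfig — the values below are purely illustrative, not recommendations; check the tuning wiki above for guidance appropriate to your cluster:

```shell
# Illustrative values only -- consult the tuning wiki before changing these.
# View the current settings:
mmlsconfig minMissedPingTimeout maxMissedPingTimeout

# Raise the ping timeouts cluster-wide (add -i to apply immediately;
# otherwise the change takes effect when GPFS is restarted):
mmchconfig minMissedPingTimeout=60,maxMissedPingTimeout=120
```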
Below is an earlier gpfsug thread about remote cluster expels:
Re: [gpfsug-discuss] data interface and management infercace.
Bob Oesterlin, oester at gmail.com
Mon Jul 13 18:42:47 BST 2015
Some thoughts on node expels, based on the last 2-3 months of "expel hell"
here. We've spent a lot of time looking at this issue, across multiple
clusters. A big thanks to IBM for helping us zero in on the right issues.
First, you need to understand whether the expels are due to an "expired
lease" message, or due to "communication issues". It sounds like you are
talking about the latter. In the case of nodes being expelled due to
communication issues, it's more likely that the problem is related to network
congestion. This can occur at many levels - the node, the network, or the
switch.
When it's a communication issue, changing parameters like "missed ping
timeout" isn't going to help you. The problem for us ended up being that GPFS
wasn't getting a response to a periodic "keep alive" poll to the node, and
after 300 seconds, it declared the node dead and expelled it. You can tell if
this is the issue by looking at the RPC waiters just before the expel. If you
see something like a "Waiting for poll on sock" RPC, it means the node is
waiting for that periodic poll to return and isn't seeing it.
The response is either lost in the network, sitting on the network queue,
or the node is too busy to send it. You may also see RPCs like "waiting
for exclusive use of connection" - this is another clear indication of
network congestion.
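As a hedged sketch, those waiters can be inspected on a live node like this (run as root on the node in question; the mmfsadm form is the older equivalent and its output format varies by release):

```shell
# Show the current RPC waiters on this node:
mmdiag --waiters

# On older GPFS releases the equivalent was:
mmfsadm dump waiters
```

Capturing this output periodically (e.g. from cron) around the time of an expel is what lets you spot the "Waiting for poll on sock" pattern after the fact.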
Look at the GPFSUG presentations (http://www.gpfsug.org/presentations/) for
the one by Jason Hick (NERSC) - he also talks about these issues. You need to
take a look at net.ipv4.tcp_wmem and net.ipv4.tcp_rmem, especially if you
have client nodes that are on slower network interfaces.
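A hedged illustration of what tuning those sysctls looks like — the numbers here are illustrative placeholders (min/default/max buffer sizes in bytes), not recommendations; size them for your own links and memory:

```shell
# In /etc/sysctl.conf (or a file under /etc/sysctl.d/):
# net.ipv4.tcp_rmem = min default max   (receive buffer, bytes)
# net.ipv4.tcp_wmem = min default max   (send buffer, bytes)
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Apply without a reboot:
#   sysctl -p
```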
In our case, it was a number of factors - adjusting these settings,
looking at congestion at the switch level, and some physical hardware
issues.
Bob Oesterlin, Sr Storage Engineer, Nuance Communications
robert.oesterlin at nuance.com
chris hunter
[email protected]
-----Original Message-----
Sent: Friday, 11 December 2015 2:14 AM
To: gpfsug main discussion list <[email protected]>
Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS
Re-exporting
Hi Again Everybody,
Ok, so we got resolution on this. Recall that I had said we'd just added ~300
remote cluster GPFS clients and started having problems with CTDB the very same
day...
Among those clients, there were three that had misconfigured firewalls, such
that they could reach our home cluster nodes on port 1191, but our home cluster
nodes could *not* reach them on 1191 *or* on any of the ephemeral ports. This
situation played absolute *havoc* with the stability of the filesystem. From
what we could tell, it seemed that these three nodes would establish a
harmless-looking connection and mount the filesystem. However, as soon as one
of them acquired a resource (lock token or similar?) that the home cluster
needed back...watch out!
In the GPFS logs on our side, we would see messages asking for the expulsion of these
nodes about 4 - 5 times per day and a ton of messages about timeouts when trying to
contact them. These nodes would then re-join the cluster, since they could contact us,
and this would entail repeated "delay N seconds for recovery" events.
During these recovery periods, the filesystem would become unresponsive for up
to 60 or more seconds at a time. This seemed to cause various NFS processes to
fall on their faces. Sometimes, the victim would be nfsd itself; other times,
it would be rpc.mountd. CTDB would then come check on NFS, find that it was
floundering, and start a recovery run. To make things worse, at those very
times the CTDB shared accounting files would *also* be unavailable, since they
reside on the same GPFS filesystem that they are serving (thanks to Doug for
pointing out the flaw in this design; we're currently looking for an
alternate home for these shared files).
This all added up to a *lot* of flapping, in NFS as well as in CTDB itself.
However, the problems with CTDB/NFS were a *symptom* in this case, not a root
cause. The *cause* was the imperfect connectivity of just three out of 300 new
clients. I think the moral of the story here is this: if you're adding remote
cluster clients, make *absolutely* sure that all communications work going both
ways between your home cluster and *every* new client. If there is
asymmetrical connectivity such as we had last week, you are in for one wild
ride. I would also point out that the flapping did not stop until we resolved
connectivity for *all* of the clients, so remember that even a single
half-connected client is poisonous to your stability.
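As a hedged illustration of that moral, here is a minimal connectivity probe one could run from a home cluster node (and, mirrored, from each new client back toward home). The probe() helper and the loopback example are hypothetical; the sketch assumes bash and coreutils timeout are available, and it only tests one direction at a time:

```shell
# Hypothetical sketch: probe TCP reachability of the GPFS daemon port (1191).
# This tests ONE direction only; run the same probe from each client back
# toward the home cluster nodes to verify the reverse path.
PORT=1191

probe() {
  # Attempt a TCP connect to host $1, port $2, using bash's /dev/tcp.
  if timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "OK $1:$2"
  else
    echo "FAIL $1:$2"
  fi
}

# Example: no GPFS daemon listens on the loopback here, so this reports FAIL.
probe 127.0.0.1 "$PORT"
```

Feeding a node list through this in both directions before mmauth/mmremotecluster work would have caught our three half-connected clients up front; remember that the ephemeral ports matter too, not just 1191.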
Thanks to everybody for all of your help! Unless something changes, I'm
declaring that our site is out of the woods on this one.
Stewart
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss