For my systems I always use a dedicated admin network - as described in the GPFS manuals - for a GPFS cluster on 10/40GbE where the system will be heavily loaded. The difference in the stability of the system is very noticeable.
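The two knobs I normally look at are the cluster-wide "subnets" option and the per-node admin interface. A rough sketch from memory - the hostnames and addresses below are made up, so check the exact syntax in the docs for your GPFS release before copying anything:

    # prefer a dedicated subnet for GPFS daemon (data) traffic
    mmchconfig subnets="10.20.0.0"
    # point admin command traffic at a second interface on each node
    # ("gss01a-adm" is an invented example hostname for that interface)
    mmchnode -N gss01a --admin-interface=gss01a-adm
    # check what the cluster thinks it is using
    mmlsconfig
    mmlscluster

It only helps if the admin interfaces really do take a different physical path to the switches; otherwise you are just renaming the congestion.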
Not sure how/if this would work on GSS - IBM ought to know :-)

Vic

On 21 Aug 2014, at 14:18, Salvatore Di Nardo <[email protected]> wrote:

> This is an interesting point!
>
> We use ethernet (10G links on the clients) but we don't have a separate
> network for the admin traffic.
>
> Could you explain this a bit further, because the clients and the servers we
> have are on different subnets, so the packets are routed. I don't see a
> practical way to separate them. The clients are blades in a chassis, so even
> if I create 2 interfaces, they will physically use the same "cable" to go to
> the first switch. Even the clients (600 clients) are on different subnets.
>
> I will forward this consideration to our network admins, to see if we can
> work on a dedicated network.
>
> Thanks for your tip.
>
> Regards,
> Salvatore
>
> On 21/08/14 14:03, Vic Cornell wrote:
>> Hi Salvatore,
>>
>> Are you using ethernet or infiniband as the GPFS interconnect to your
>> clients?
>>
>> If 10/40GbE - do you have a separate admin network?
>>
>> I have seen behaviour similar to this where the storage traffic causes
>> congestion and the "admin" traffic gets lost or delayed, causing expels.
>>
>> Vic
>>
>> On 21 Aug 2014, at 10:04, Salvatore Di Nardo <[email protected]> wrote:
>>
>>> Thanks for the feedback, but we managed to find a scenario that excludes
>>> network problems.
>>>
>>> We have a file called input_file of nearly 100GB.
>>>
>>> If from client A we do:
>>>
>>> cat input_file >> output_file
>>>
>>> it starts copying, and we see the waiters go up a bit for a few seconds,
>>> but then they flush back to 0, so we can say that the copy proceeds well.
>>>
>>> If we now do the same from another client (or just another shell on the
>>> same client), client B:
>>>
>>> cat input_file >> output_file
>>>
>>> (in other words, we are trying to write to the same destination), all the
>>> waiters go up until one node gets expelled.
>>>
>>> Now, while it's understandable that the destination file is locked by one
>>> of the "cat" processes, so the other has to wait (and since the file is
>>> BIG, it has to wait for a while), it's not understandable why it stops
>>> renewing the lease. Why doesn't it just return a timeout error on the copy
>>> instead of expelling the node? We can reproduce this every time, and since
>>> our users do operations like this on files over 100GB each, you can imagine
>>> the result.
>>>
>>> As you can imagine, even if it's a bit silly to write to the same
>>> destination at the same time, it's also quite common if we want to dump
>>> logs to a log file and, for some reason, one of the writers writes for a
>>> long time, keeping the file locked.
>>> Our expels are not due to network congestion, but because one write attempt
>>> has to wait for another. What I really don't understand is why such an
>>> extreme measure as an expel is taken just because a process is waiting
>>> "too much time".
>>>
>>> I have a ticket open with IBM for this and the issue is under
>>> investigation, but no luck so far.
>>>
>>> Regards,
>>> Salvatore
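On the two-writer test above: when we chase things like this we run the same pair of writes and watch the waiters from the server side while they pile up. Roughly - the paths below are made up, pick anything on the affected filesystem:

    # run from client A, then again from client B while A is still writing:
    cat /gpfs1/scratch/input_file >> /gpfs1/scratch/output_file
    # meanwhile, on the NSD servers / cluster manager, watch the waiters build:
    mmdiag --waiters

That at least shows you the waiter reasons and how old they get before anyone is expelled.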
>>> On 21/08/14 09:20, Jez Tucker (Chair) wrote:
>>>> Hi there,
>>>>
>>>> I've seen this on several 'stock'? 'core'? GPFS systems (we need a better
>>>> term now GSS is out) and seen ping 'working', but alongside ejections from
>>>> the cluster.
>>>> The GPFS internode 'ping' is somewhat more circumspect than unix ping -
>>>> and rightly so.
>>>>
>>>> In my experience this has _always_ been a network issue of one sort or
>>>> another. If the network is experiencing issues, nodes will be ejected.
>>>> Of course it could be an unresponsive mmfsd or high loadavg, but I've seen
>>>> that only twice in 10 years over many versions of GPFS.
>>>>
>>>> You need to follow the logs through from each machine in time order to
>>>> determine who could not see who and in what order.
>>>> Your best way forward is to log a SEV2 case with IBM support, directly or
>>>> via your OEM, and collect and supply a snap and traces as required by
>>>> support.
>>>>
>>>> Without knowing your full setup, it's hard to help further.
>>>>
>>>> Jez
>>>>
>>>> On 20/08/14 08:57, Salvatore Di Nardo wrote:
>>>>> Still problems. Here are some more detailed examples:
>>>>>
>>>>> EXAMPLE 1:
>>>>> EBI5-220 (CLIENT)
>>>>> Tue Aug 19 11:03:04.980 2014: Timed out waiting for a reply from node <GSS02B IP> gss02b
>>>>> Tue Aug 19 11:03:04.981 2014: Request sent to <GSS02A IP> (gss02a in GSS.ebi.ac.uk) to expel <GSS02B IP> (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk
>>>>> Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from <EBI5-220 IP> (ebi5-220)
>>>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk
>>>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe.
>>>>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.
>>>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic
>>>>> Tue Aug 19 11:03:12.066 2014: Connecting to <GSS02A IP> gss02a <c1p687>
>>>>> Tue Aug 19 11:03:12.070 2014: Connected to <GSS02A IP> gss02a <c1p687>
>>>>> Tue Aug 19 11:03:17.071 2014: Connecting to <GSS02B IP> gss02b <c1p686>
>>>>> Tue Aug 19 11:03:17.072 2014: Connecting to <GSS03B IP> gss03b <c1p685>
>>>>> Tue Aug 19 11:03:17.079 2014: Connecting to <GSS03A IP> gss03a <c1p684>
>>>>> Tue Aug 19 11:03:17.080 2014: Connecting to <GSS01B IP> gss01b <c1p683>
>>>>> Tue Aug 19 11:03:17.079 2014: Connecting to <GSS01A IP> gss01a <c1p1>
>>>>> Tue Aug 19 11:04:23.105 2014: Connected to <GSS02B IP> gss02b <c1p686>
>>>>> Tue Aug 19 11:04:23.107 2014: Connected to <GSS03B IP> gss03b <c1p685>
>>>>> Tue Aug 19 11:04:23.112 2014: Connected to <GSS03A IP> gss03a <c1p684>
>>>>> Tue Aug 19 11:04:23.115 2014: Connected to <GSS01B IP> gss01b <c1p683>
>>>>> Tue Aug 19 11:04:23.121 2014: Connected to <GSS01A IP> gss01a <c1p1>
>>>>> Tue Aug 19 11:12:28.992 2014: Node <GSS02A IP> (gss02a in GSS.ebi.ac.uk) is now the Group Leader.
>>>>>
>>>>> GSS02B (NSD SERVER)
>>>>> ...
>>>>> Tue Aug 19 11:03:17.070 2014: Killing connection from <EBI5-220 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:03:25.016 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:03:28.080 2014: Killing connection from <EBI5-220 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:03:36.019 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:03:39.083 2014: Killing connection from <EBI5-220 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:03:47.023 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:03:50.088 2014: Killing connection from <EBI5-220 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:03:52.218 2014: Killing connection from <EBI5-043 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:03:58.030 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:04:01.092 2014: Killing connection from <EBI5-220 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:04:03.220 2014: Killing connection from <EBI5-043 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:04:09.034 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:04:12.096 2014: Killing connection from <EBI5-220 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:04:14.224 2014: Killing connection from <EBI5-043 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:04:20.037 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to <EBI5-220 IP> ebi5-220 <c0n618>
>>>>> ...
>>>>>
>>>>> GSS02a (NSD SERVER)
>>>>> Tue Aug 19 11:03:04.980 2014: Expel <GSS02B IP> (gss02b) request from <EBI5-220 IP> (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: <EBI5-220 IP> (ebi5-220 in ebi-cluster.ebi.ac.uk)
>>>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to <EBI5-220 IP> ebi5-220 <c0n618>
>>>>>
>>>>> ===============================================
>>>>> EXAMPLE 2:
>>>>>
>>>>> EBI5-038
>>>>> Tue Aug 19 11:32:34.227 2014: Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.
>>>>> Tue Aug 19 11:33:34.258 2014: Lease is overdue. Probing cluster GSS.ebi.ac.uk
>>>>> Tue Aug 19 11:35:24.265 2014: Close connection to <GSS02A IP> gss02a <c1n2> (Connection reset by peer). Attempting reconnect.
>>>>> Tue Aug 19 11:35:24.865 2014: Close connection to <EBI5-014 IP> ebi5-014 <c1n457> (Connection reset by peer). Attempting reconnect.
>>>>> ...
>>>>> LOT MORE RESETS BY PEER
>>>>> ...
>>>>> Tue Aug 19 11:35:25.096 2014: Close connection to <EBI5-167 IP> ebi5-167 <c1n155> (Connection reset by peer). Attempting reconnect.
>>>>> Tue Aug 19 11:35:25.267 2014: Connecting to <GSS02A IP> gss02a <c1n2>
>>>>> Tue Aug 19 11:35:25.268 2014: Close connection to <GSS02A IP> gss02a <c1n2> (Connection failed because destination is still processing previous node failure)
>>>>> Tue Aug 19 11:35:26.267 2014: Retry connection to <GSS02A IP> gss02a <c1n2>
>>>>> Tue Aug 19 11:35:26.268 2014: Close connection to <GSS02A IP> gss02a <c1n2> (Connection failed because destination is still processing previous node failure)
>>>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe.
>>>>> Tue Aug 19 11:36:24.277 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.
>>>>>
>>>>> GSS02a
>>>>> Tue Aug 19 11:35:24.263 2014: Node <EBI5-038 IP> (ebi5-038 in ebi-cluster.ebi.ac.uk) is being expelled because of an expired lease. Pings sent: 60. Replies received: 60.
>>>>>
>>>>> In example 1 it seems that an NSD server was not replying to the client,
>>>>> but the servers seem to be working fine. How can I trace this better (to
>>>>> solve the problem)?
>>>>>
>>>>> In example 2 it seems to me that for some reason the manager is not
>>>>> renewing the leases in time. When this happens, it's not a single client:
>>>>> loads of them fail to get the lease renewed. Why is this happening? How
>>>>> can I trace it to the source of the problem?
>>>>>
>>>>> Thanks in advance for any tips.
>>>>>
>>>>> Regards,
>>>>> Salvatore
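PS - on Jez's point about snaps and traces: what support usually asks for first is the output of gpfs.snap plus a trace covering one of the expels. Roughly, and from memory - check the exact syntax for your GPFS release before running it:

    # collect cluster diagnostic data for support
    gpfs.snap
    # turn tracing on, reproduce the two-writer expel, then stop tracing
    mmtracectl --start
    mmtracectl --stop

Having data from both the expelled client and the NSD servers for the same incident makes the time-ordered log reading Jez describes a lot easier.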
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
