> On 2017 Apr 26 Wed, at 16:20, Simon Thompson (IT Research Support)
> <s.j.thomp...@bham.ac.uk> wrote:
>
> Nope, the clients are all L3 connected, so not an arp issue.

...not on the client, but the server-facing L3 switch still needs to manage its ARP table, and might miss the IP moving to a new MAC. Cisco switches have a default ARP cache timeout of 4 hours, fwiw. Can your network team provide you with the ARP status from the switch when you see a fail-over being stuck?

-- Peter
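For anyone who wants to rule the ARP angle in or out quickly, a rough way to watch which MAC the CES IP resolves to while a failover happens. The address, interface and switch command below are placeholders for your environment, not anything specific to this cluster:

    # From a host or router interface on the CES subnet: check the neighbour entry
    ip neigh show 10.10.10.50

    # Provoke a fresh lookup once the IP has moved, then check again
    ping -c 1 10.10.10.50; ip neigh show 10.10.10.50

    # On a Cisco IOS switch (network team), something along the lines of:
    #   show ip arp 10.10.10.50
    # and compare the MAC against the CES node that currently holds the address.

If the upstream switch keeps the old node's MAC long after the move, stale ARP is a plausible culprit; if it updates promptly, the problem is more likely at the NFS/TCP layer.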
> Two things we have observed:
>
> 1. It triggers when one of the CES IPs moves and quickly moves back again.
> The move occurs because the NFS server goes into grace:
>
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server recovery event 2 nodeid -1 ip <CESIP>
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> : ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4 recovery release ip <CESIP>
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> : ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
> 2017-04-25 20:37:42 : epoch 00040183 : <NODENAME> : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
> 2017-04-25 20:37:44 : epoch 00040183 : <NODENAME> : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
> 2017-04-25 20:37:44 : epoch 00040183 : <NODENAME> : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server recovery event 4 nodeid 2 ip
>
> We can't see in any of the logs WHY ganesha is going into grace. Any
> suggestions on how to debug this further? (I.e. if we can stop the grace
> issues, we can mostly solve the problem.)
>
> 2. Our clients are using LDAP which is bound to the CES IPs. If we shut
> down nslcd on the client we can get the client to recover once all the
> TIME_WAIT connections have gone. Maybe this was a bad choice on our side
> to bind to the CES IPs - we figured it would handily move the IPs for us,
> but I guess mmcesfuncs isn't aware of this and so doesn't kill the
> connections to the IP as it goes away.
>
> So there are two approaches we are going to try. First, reconfigure nslcd
> on a couple of clients and see if they still show the issue when fail-over
> occurs. Second, work out why the NFS servers are going into grace in the
> first place.
>
> Simon
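One rough starting point for question 1: the grace lines above are logged by the dbus thread, which suggests grace is being requested externally (over DBus, presumably by the CES tooling reacting to an address event) rather than Ganesha deciding on its own, so it is worth lining the grace events up with the CES/GPFS logs on every protocol node. A sketch, assuming the default log locations; adjust paths if your install differs:

    # On each protocol node: pull out grace / recovery events with timestamps
    grep -E "IN GRACE|recovery event" /var/log/ganesha.log

    # Look for CES address moves, node suspends or expels around the same times
    grep -iE "ces|address|expel" /var/adm/ras/mmfs.log.latest

    # Which node currently owns which CES IP, and the service state everywhere
    mmces address list
    mmces state show -a

If every grace window lines up with an address assignment change or a node event in mmfs.log, the grace itself is just a symptom and the trigger is whatever is moving the IPs.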
> On 26/04/2017, 00:46, "gpfsug-discuss-boun...@spectrumscale.org on behalf
> of greg.lehm...@csiro.au" <gpfsug-discuss-boun...@spectrumscale.org on
> behalf of greg.lehm...@csiro.au> wrote:
>
>> Are you using infiniband or Ethernet? I'm wondering if IBM have solved
>> the gratuitous arp issue which we see with our non-protocols NFS
>> implementation.
>>
>> -----Original Message-----
>> From: gpfsug-discuss-boun...@spectrumscale.org
>> [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Simon
>> Thompson (IT Research Support)
>> Sent: Wednesday, 26 April 2017 3:31 AM
>> To: gpfsug main discussion list <gpfsug-discuss@spectrumscale.org>
>> Subject: Re: [gpfsug-discuss] NFS issues
>>
>> I did some digging in mmcesfuncs to see what happens server side on
>> fail-over.
>>
>> Basically the server losing the IP is supposed to terminate all sessions
>> and the receiving server sends ACK tickles.
>>
>> My current supposition is that, for whatever reason, the losing server
>> isn't releasing something and the client still has hold of a connection
>> which is mostly dead. The tickle from the new server then fails to reach
>> the client.
>>
>> This would explain why failing the IP back to the original server usually
>> brings the client back to life.
>>
>> This is only my working theory at the moment as we can't reliably
>> reproduce this. Next time it happens we plan to grab some netstat output
>> from each side.
>>
>> Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the
>> server that received the IP and see if that fixes it (i.e. the receiving
>> server didn't tickle properly). (Usage extracted from mmcesfuncs, which is
>> ksh of course.) For anyone interested, cesIpPort is a colon-separated
>> IP:portnumber (of the NFS daemon).
>>
>> Then try and kill the sessions on the losing server to check if there is
>> stuff still open, and re-tickle the client.
>>
>> If we can get steps to work around it, I'll log a PMR. I suppose I could
>> do that now, but given it's non-deterministic and we want to be 100% sure
>> it's not us doing something wrong, I'm inclined to wait until we do some
>> more testing.
>>
>> I agree with the suggestion that it's probably nodes with IO pending that
>> are affected, but I don't have any data to back that up yet. We did try
>> with a read workload on a client, but maybe we need either long IO-blocked
>> reads or writes (from the GPFS end).
>>
>> We also originally had soft as the default option, but saw issues then
>> and the docs suggested hard, so we switched and also enabled sync (we
>> figured maybe it was the NFS client with uncommitted writes), but neither
>> has resolved the issues entirely. It's difficult for me to say whether
>> they improved things, given it's sporadic.
>>
>> Appreciate people's suggestions!
>>
>> Thanks
>>
>> Simon
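To make the netstat/mmcmi plan above concrete, a rough sequence for the next stuck client. All IPs and ports below are made-up examples, and mmcmi is internal tooling lifted from mmcesfuncs, so treat it with care:

    # On the hung client: find the connection to the CES IP that moved (NFS over TCP 2049)
    netstat -tn | grep ':2049'
    #   tcp   0   0  10.0.0.21:759   10.10.10.50:2049   ESTABLISHED     <- example output

    # On the CES node that now holds 10.10.10.50: is the matching connection visible?
    ss -tn | grep '10.0.0.21'

    # If the client still shows ESTABLISHED but the new server shows nothing,
    # re-send the ACK tickle from the node that received the IP:
    mmcmi tcpack 10.10.10.50:2049 10.0.0.21:759

The asymmetry (client thinks the connection is established, the node now holding the IP has no matching socket) is exactly the half-dead state the tickle is supposed to clear.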
>> ________________________________________
>> From: gpfsug-discuss-boun...@spectrumscale.org
>> [gpfsug-discuss-boun...@spectrumscale.org] on behalf of Jan-Frode
>> Myklebust [janfr...@tanso.net]
>> Sent: 25 April 2017 18:04
>> To: gpfsug main discussion list
>> Subject: Re: [gpfsug-discuss] NFS issues
>>
>> I *think* I've seen this, and that we then had open TCP connections from
>> client to NFS server according to netstat, but these connections were not
>> visible from netstat on the NFS-server side.
>>
>> Unfortunately I don't remember what the fix was..
>>
>> -jf
>>
>> Tue, 25 Apr 2017 at 16:06, Simon Thompson (IT Research Support)
>> <s.j.thomp...@bham.ac.uk> wrote:
>>
>> Hi,
>>
>> From what I can see, Ganesha uses the Export_Id option in the config file
>> (which is managed by CES) for this. I did find some reference on the
>> Ganesha devs list that if it's not set, then it would read the FSID from
>> the GPFS file-system; either way they should surely be consistent across
>> all the nodes. The posts I found were from someone with an IBM email
>> address, so I guess someone on the IBM teams.
>>
>> I checked a couple of my protocol nodes and they use the same Export_Id
>> consistently, though I guess that might not be the same as the FSID value.
>>
>> Perhaps someone from IBM could comment on whether FSID is likely to be
>> the cause of my problems?
>>
>> Thanks
>>
>> Simon
>>
>> On 25/04/2017, 14:51, "gpfsug-discuss-boun...@spectrumscale.org on behalf
>> of Ouwehand, JJ" <gpfsug-discuss-boun...@spectrumscale.org on behalf of
>> j.ouweh...@vumc.nl> wrote:
>>
>>> Hello,
>>>
>>> First, a short introduction. My name is Jaap Jan Ouwehand, I work at
>>> a Dutch hospital, "VU Medical Center" in Amsterdam. We make daily use of
>>> IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our
>>> critical (office, research and clinical data) business processes. We have
>>> three large GPFS filesystems for different purposes.
>>>
>>> We also had such a situation with cNFS. A failover (IP takeover) was
>>> technically fine, but clients experienced "stale file handles". We
>>> opened a PMR with IBM and, after testing, delivering logs and tcpdumps,
>>> and a few months later, the solution turned out to be the fsid option.
>>>
>>> An NFS filehandle is built from a combination of the fsid and a hash
>>> function on the inode. After a failover, the fsid value can be different
>>> and the client gets a "stale file handle". To avoid this, the fsid value
>>> can be statically specified. See:
>>>
>>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrumscale.v4r22.doc/bl1adm_nfslin.htm
>>>
>>> Maybe there is also a value in Ganesha that changes after a failover,
>>> certainly since most sessions will be re-established after a failback.
>>> Maybe you can see more debug information with tcpdump.
>>>
>>> Kind regards,
>>>
>>> Jaap Jan Ouwehand
>>> ICT Specialist (Storage & Linux)
>>> VUmc - ICT
>>> E: jj.ouweh...@vumc.nl
>>> W: www.vumc.com
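For reference, this is roughly what the static fsid looks like for kernel NFS / cNFS exports of the kind Jaap Jan describes. Paths and values are illustrative only; CES/Ganesha exports are managed through mmnfs instead, where (per Simon's note above) Export_Id plays the equivalent role:

    # /etc/exports on every cNFS node - pin fsid so all nodes hand out
    # identical file handles for the same export
    /gpfs/fs0/projects   *(rw,sync,no_subtree_check,fsid=101)
    /gpfs/fs0/home       *(rw,sync,no_subtree_check,fsid=102)

The fsid is typically a small integer that is unique per export and identical on every node that can serve that export, so a failover hands the client a file handle it already recognises.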
>>> -----Original Message-----
>>> From: gpfsug-discuss-boun...@spectrumscale.org
>>> [mailto:gpfsug-discuss-boun...@spectrumscale.org] On Behalf Of Simon
>>> Thompson (IT Research Support)
>>> Sent: Tuesday, 25 April 2017 13:21
>>> To: gpfsug-discuss@spectrumscale.org
>>> Subject: [gpfsug-discuss] NFS issues
>>>
>>> Hi,
>>>
>>> We have recently started deploying NFS in addition to our existing SMB
>>> exports on our protocol nodes.
>>>
>>> We use an RR DNS name that points to 4 VIPs for SMB services, and
>>> failover seems to work fine with SMB clients. We figured we could use
>>> the same name and IPs and run Ganesha on the protocol servers, however
>>> we are seeing issues with NFS clients when IP failover occurs.
>>>
>>> In normal operation on a client, we might see several mounts from
>>> different IPs, obviously due to the way the DNS RR is working, but it
>>> all works fine.
>>>
>>> In a failover situation, the IP will move to another node and some
>>> clients will carry on, while others will hang IO to the mount points
>>> referred to by the IP which has moved. We can *sometimes* trigger this by
>>> manually suspending a CES node, but not always, and some clients
>>> mounting from the moving IP will be fine, others won't.
>>>
>>> If we resume a node and it fails back, the clients that are hanging will
>>> usually recover fine. We can reboot a client prior to failback and it
>>> will be fine; stopping and starting the ganesha service on a protocol
>>> node will also sometimes resolve the issues.
>>>
>>> So, has anyone seen this sort of issue, and any suggestions for how we
>>> could either debug further or work around it?
>>>
>>> We are currently running the packages
>>> nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (the 4.2.2-2 release ones).
>>>
>>> At one point we were seeing it a lot, and could track it back to an
>>> underlying GPFS network issue that was causing protocol nodes to be
>>> expelled occasionally. We resolved that and the issues became less
>>> apparent, but maybe we just fixed one failure mode and so see it less
>>> often.
>>>
>>> On the clients, we use -o sync,hard BTW, as in the IBM docs.
>>>
>>> On a client showing the issues, we'll see NFS-related messages in dmesg
>>> like:
>>>
>>> [Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not responding, timed out
>>>
>>> Which explains the client hang on certain mount points.
>>>
>>> The symptoms feel very much like those logged in this Gluster/ganesha
>>> bug:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1354439
>>>
>>> Thanks
>>>
>>> Simon

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss