When we started using GPFS, around the 3.3 time frame, we had a lot of issues with running different meta-applications at the same time: snapshots, mmapplypolicy, mmdelsnapshot, etc. So we ended up putting a locking mechanism around all of these to ensure that only one of them was running at any given time. That mostly eliminated the lock-ups, which were unfortunately common before then. I haven't tried removing it since.
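For anyone wanting to try the same approach, here is a minimal sketch of that kind of wrapper. It is not our actual script. It assumes all of the administrative commands are launched from a single node, so an advisory `flock` on a local file is enough; the lock path and the `run_serialized` helper are hypothetical names chosen for illustration.

```python
import fcntl
import subprocess
import sys
from contextlib import contextmanager

# Hypothetical lock file; any path on a local filesystem of the admin node works.
LOCK_PATH = "/var/lock/gpfs-admin.lock"


@contextmanager
def exclusive_lock(path=LOCK_PATH):
    """Hold an exclusive advisory lock on `path` for the duration of the block."""
    with open(path, "a") as fh:
        # Blocks until no other process (or open file) holds the lock.
        fcntl.flock(fh, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(fh, fcntl.LOCK_UN)


def run_serialized(cmd, lock_path=LOCK_PATH):
    """Run `cmd` (a list of arguments) while holding the lock; return its exit code."""
    with exclusive_lock(lock_path):
        return subprocess.call(cmd)


# Example call from cron or a policy script. "fs0" and "nightly" are
# placeholder device and snapshot names:
#   run_serialized(["mmcrsnapshot", "fs0", "nightly"])
```

Because the lock call blocks, a snapshot job that starts while mmapplypolicy is still running simply waits its turn rather than failing. If the commands can be started from more than one node, a node-local lock like this is not sufficient and something cluster-wide would be needed.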
From: [email protected] [mailto:[email protected]] On Behalf Of Howard, Stewart Jameson
Sent: Monday, December 07, 2015 12:24 PM
To: [email protected]
Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS Re-exporting

Hi All,

Thanks to Doug and Kevin for the replies. In answer to Kevin's question about our choice of clustering solution for NFS: the choice was made hoping to maintain some simplicity by not using more than one HA solution at a time. However, it seems that this choice might have introduced more wrinkles than it's ironed out.

An update on our situation: we have actually uncovered another clue since my last posting. One thing that is now known to be correlated *very* closely with instability in the NFS layer is running `mmcrsnapshot`. We had noticed that flapping happened like clockwork at midnight every night. This happens to be the same time at which our crontab was running `mmcrsnapshot`, so, as an experiment, we moved the snapshot to 1 a.m. After this change, the late-night flapping has moved to 1 a.m. and now happens reliably every night at that time.

I saw a post on this list from 2013 stating that `mmcrsnapshot` was known to hang up the filesystem with race conditions that result in deadlocks, and I am wondering if that is still a problem with the `mmcrsnapshot` command. Running the snapshots had not been an obvious problem before, but seems to have become one since we deployed ~300 additional GPFS clients in a remote cluster configuration about a week ago. Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node remote cluster accessing the filesystem?

Also, I would comment that this is not the only condition under which we see instability in the NFS layer. We continue to see intermittent instability through the day. The creation of a snapshot is simply the one well-correlated condition that we've discovered so far.

Thanks so much to everyone for your help :)

Stewart
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
