When we started using GPFS, in the 3.3 time frame, we had a lot of issues with 
running different meta-applications at the same time: snapshots, mmapplypolicy, 
mmdelsnapshot, etc. So we ended up using a locking mechanism around all of 
these to ensure that only one was running at a given time. That mostly 
eliminated lock-ups, which were unfortunately common before then. I haven't 
tried removing it since.


From: [email protected] 
[mailto:[email protected]] On Behalf Of Howard, Stewart 
Jameson
Sent: Monday, December 07, 2015 12:24 PM
To: [email protected]
Subject: Re: [gpfsug-discuss] GPFS Remote Cluster Co-existence with CTDB/NFS 
Re-exporting


Hi All,



Thanks to Doug and Kevin for the replies.  In answer to Kevin's question about 
our choice of clustering solution for NFS:  the choice was made hoping to 
maintain some simplicity by not using more than one HA solution at a time.  
However, it seems that this choice might have introduced more wrinkles than 
it's ironed out.



An update on our situation:  we have actually uncovered another clue since my 
last posting.  One thing that is now known to be correlated *very* closely 
with instability in the NFS layer is running `mmcrsnapshot`.  We had noticed 
that the flapping happened like clockwork at midnight every night.  This 
happens to be the same time at which our crontab was running `mmcrsnapshot`, 
so, as an experiment, we moved the snapshot to happen at 1 a.m.
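For reference, the crontab change amounted to something like the following. The filesystem device and snapshot naming here are made-up examples, not our actual configuration:

```shell
# Before: snapshot at midnight, coinciding with the NFS flapping
# 0 0 * * * /usr/lpp/mmfs/bin/mmcrsnapshot gpfs0 nightly_$(date +\%Y\%m\%d)

# After: moved to 1 a.m. as an experiment -- the flapping moved with it
0 1 * * * /usr/lpp/mmfs/bin/mmcrsnapshot gpfs0 nightly_$(date +\%Y\%m\%d)
```

(Note the escaped `%` signs, which cron would otherwise treat as line separators.)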



After this change, the late-night flapping has moved to 1 a.m. and now happens 
reliably every night at that time.  I saw a post on this list from 2013 stating 
that `mmcrsnapshot` was known to hang up the filesystem with race conditions 
that result in deadlocks, and I am wondering if that is still a problem with 
the `mmcrsnapshot` command.  Running the snapshots had not been an obvious 
problem before, but it seems to have become one since we deployed ~300 
additional GPFS clients in a remote cluster configuration about a week ago.



Can anybody comment on the safety of running `mmcrsnapshot` with a ~300 node 
remote cluster accessing the filesystem?



Also, I would comment that this is not the only condition under which we see 
instability in the NFS layer.  We continue to see intermittent instability 
through the day.  The creation of a snapshot is simply the one well-correlated 
condition that we've discovered so far.



Thanks so much to everyone for your help  :)



Stewart
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss