This is still happening in 3.5.0.18. While a snapshot is being deleted, NFS read speeds slow to a crawl (GPFS access and NFS writes are not affected).

On Thu, May 15, 2014 at 7:48 AM, Sabuj Pattanayek <[email protected]> wrote:

> Hi all,
>
> We're running 3.5.0.17 now and it looks like the filesystem manager
> automatically reboots (and sometimes fails to automatically reboot) after
> mmdelsnapshot is called, either from the filesystem manager itself or from
> some other nsd/node. It didn't start happening immediately after we
> updated to 17, but we never had this issue when we were at 3.5.0.11. The
> error mmdelsnapshot throws at some point is:
>
> Lost connection to file system daemon.
> mmdelsnapshot: An internode connection between GPFS nodes was disrupted.
> mmdelsnapshot: Command failed. Examine previous error messages to
> determine cause.
>
> It also causes an mmfs generic error and or a kernel: BUG: soft lockup -
> CPU#15 stuck for 67s! [mmfsd:39266], the latter causes the system to not
> reboot itself (which is actually worse), but the former does.
>
> It also causes some havoc with CNFS file locking even after the filesystem
> manager is rebooted and has come up:
>
> May 15 07:10:12 mako-nsd1 sm-notify[19387]: Failed to bind RPC socket: Address already in use
>
> May 15 07:21:03 mako-nsd1 sm-notify[11052]: Invalid bind address or port for RPC socket: Name or service not known
>
> Saw some snapshot related fixes in 3.5.0.18, anyone seen this behavior or
> know if it's fixed in 18?
>
> Thanks,
> Sabuj

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
