This is still happening in 3.5.0.18, and while a snapshot is being deleted
NFS read speeds slow to a crawl (GPFS itself and NFS writes are not affected).


On Thu, May 15, 2014 at 7:48 AM, Sabuj Pattanayek <[email protected]> wrote:

> Hi all,
>
> We're running 3.5.0.17 now and it looks like the filesystem manager
> automatically reboots (and sometimes fails to automatically reboot) after
> mmdelsnapshot is called, either from the filesystem manager itself or from
> some other NSD/node. It didn't start happening immediately after we
> updated to 17, but we never had this issue when we were at 3.5.0.11. The
> error mmdelsnapshot throws at some point is:
>
> Lost connection to file system daemon.
> mmdelsnapshot: An internode connection between GPFS nodes was disrupted.
> mmdelsnapshot: Command failed. Examine previous error messages to determine cause.
>
> It also causes an mmfs generic error and/or a kernel: BUG: soft lockup -
> CPU#15 stuck for 67s! [mmfsd:39266]. The latter leaves the system unable to
> reboot itself (which is actually worse), whereas the former does trigger a
> reboot.
>
>
> It also causes some havoc with CNFS file locking, even after the filesystem
> manager has rebooted and come back up:
>
>
> May 15 07:10:12 mako-nsd1 sm-notify[19387]: Failed to bind RPC socket: Address already in use
>
> May 15 07:21:03 mako-nsd1 sm-notify[11052]: Invalid bind address or port for RPC socket: Name or service not known
>
>
> I saw some snapshot-related fixes in 3.5.0.18. Has anyone seen this
> behavior, or does anyone know if it's fixed in 18?
>
>
> Thanks,
>
> Sabuj
>
>
>
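For anyone who wants to check their own logs for the same symptoms, here is a minimal Python sketch that counts the messages quoted above. The regular expressions are built only from the log text in this thread. The category names and the `classify`/`scan` helpers are hypothetical names made up for this example, and it assumes each syslog entry sits on a single line of a plain-text file.

```python
import re

# Patterns built from the messages quoted in this thread. The dictionary
# keys are hypothetical labels chosen for this sketch, not GPFS terms.
PATTERNS = {
    "soft_lockup": re.compile(
        r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[mmfsd:(\d+)\]"
    ),
    "sm_notify_bind": re.compile(r"sm-notify\[\d+\]: Failed to bind RPC socket"),
    "sm_notify_addr": re.compile(r"sm-notify\[\d+\]: Invalid bind address or port"),
    "lost_daemon": re.compile(r"Lost connection to file system daemon"),
}


def classify(line):
    """Return the names of all patterns that match a single log line."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(line)]


def scan(lines):
    """Count matches per category across an iterable of log lines."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name in classify(line):
            counts[name] += 1
    return counts
```

To try it, pass an open log file to `scan`, for example `scan(open("/var/log/messages"))`. That path is an assumption and depends on the distribution and syslog configuration.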
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at gpfsug.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss
