On my systems, which use standard Ethernet (I'm in the cloud), reboots on 2.9
cause no issues that I can see. I did have an issue with the lnet driver not
being able to grab the port on boot-up, so I backported the lnet systemd unit
file from 2.10 to work around that.
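For reference, the backported unit I'm running looks roughly like this. Treat it as a sketch rather than the verbatim 2.10 file: the paths and the lnetctl calls reflect my setup, so check them against your own tree and distro.

```ini
# /etc/systemd/system/lnet.service -- sketch of a 2.10-style lnet unit.
# The network-online ordering is what fixed the port-grab race for me.
[Unit]
Description=lnet management
Requires=network-online.target
After=network-online.target

[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/modprobe lnet
ExecStart=/usr/sbin/lnetctl lnet configure
ExecStop=/usr/sbin/lnetctl lnet unconfigure
ExecStop=/usr/sbin/lustre_rmmod

[Install]
WantedBy=multi-user.target
```

With a real lnet.service in place, you can also tell systemd that the mount depends on it by adding `x-systemd.requires=lnet.service` (and `_netdev`) to the lustre entry in fstab, which should make systemd unmount the filesystem before it tears down lnet at shutdown.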
On Thu, Aug 10, 2017 at 9:44 AM, Ben Evans <bev...@cray.com> wrote:
> Are the Infiniband drivers disappearing first? I know that used to be an
> On 8/10/17, 8:59 AM, "lustre-discuss on behalf of Michael Di Domenico"
> <lustre-discuss-boun...@lists.lustre.org on behalf of
> mdidomeni...@gmail.com> wrote:
> >does anyone else have issues issuing 'reboot' while having a lustre
> >filesystem mounted?
> >we're running v2.9 clients on our workstations, but when a user goes
> >to reboot the machine (from the gui) the system stalls under systemd
> >while i presume it's attempting to unmount the filesystem.
> >what i see on the console is; systemd kicks in and starts unmounting
> >all the nfs shares we have, works fine. but then it gets to lustre
> >and starts throwing connection errors on the console. it's almost as
> >if systemd raced itself stopping lustre, whereby lnet got yanked out
> >from under the mount before the unmount actually finished.
> >after five minutes or so, it looks like systemd threw in the towel and
> >gave up trying to unmount, but the system is stuck still trying to
> >execute more shutdown tasks.
> >when we mount lustre on the workstations, i have a script that figures
> >some stuff out, issues a service lnet start, and then issues a mount
> >command. this all works fine, but i'm not sure if that's why systemd
> >can't figure out what to do correctly.
> >and since this is during a shutdown phase, debugging this is
> >difficult. any thoughts?
> >lustre-discuss mailing list