Hi,

A reboot of the MDS stalled and the node had to be forcibly reset. After that, the MDS would not start. The syslog excerpt is below.
I'm not sure what the "class_register_device()) astro-OST0002-osc-MDT0000" part is supposed to do, but astro-OST0002 is not mounted at this time. I guess this comes from the MGS.
Cheers, Hans Henrik
Mar 10 10:03:49 mds02 kernel: Lustre: MGS: Connection restored to d8787407-db0d-ccfb-e5ab-adeb41b86c1d (at 0@lo)
Mar 10 10:03:49 mds02 kernel: Lustre: Skipped 197 previous similar messages
Mar 10 10:03:59 mds02 kernel: LustreError: 137-5: astro-MDT0000_UUID: not available for connect from 10.21.207.78@o2ib (no target). If you are running an HA pair check that the target is mounted on the other server.
Mar 10 10:03:59 mds02 kernel: LustreError: Skipped 155 previous similar messages
Mar 10 10:04:00 mds02 kernel: LustreError: 8923:0:(genops.c:556:class_register_device()) astro-OST0002-osc-MDT0000: already exists, won't add
Mar 10 10:04:00 mds02 kernel: LustreError: 8923:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.21.10.102@o2ib: cfg command failed: rc = -17
Mar 10 10:04:00 mds02 kernel: Lustre: cmd=cf001 0:astro-OST0002-osc-MDT0000 1:osp 2:astro-MDT0000-mdtlov_UUID
Mar 10 10:04:00 mds02 kernel: LustreError: 15c-8: MGC10.21.10.102@o2ib: The configuration from log 'astro-MDT0000' failed (-17). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Mar 10 10:04:00 mds02 kernel: LustreError: 7016:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server astro-MDT0000: -17
Mar 10 10:04:00 mds02 kernel: LustreError: 7016:0:(obd_mount_server.c:1992:server_fill_super()) Unable to start targets: -17
Mar 10 10:04:00 mds02 kernel: Lustre: Failing over astro-MDT0000
Mar 10 10:04:01 mds02 kernel: Lustre: astro-MDT0000: Not available for connect from 10.21.208.26@o2ib (stopping)
Mar 10 10:04:01 mds02 kernel: Lustre: Skipped 129 previous similar messages
Mar 10 10:04:15 mds02 kernel: LustreError: 137-5: astro-MDT0000_UUID: not available for connect from 172.20.2.101@tcp1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Mar 10 10:04:15 mds02 kernel: LustreError: 137-5: astro-MDT0000_UUID: not available for connect from 172.20.2.101@tcp1 (no target). If you are running an HA pair check that the target is mounted on the other server.
Mar 10 10:04:15 mds02 kernel: LustreError: Skipped 35 previous similar messages
Mar 10 10:04:15 mds02 kernel: LustreError: Skipped 1 previous similar message
Mar 10 10:04:20 mds02 kernel: Lustre: server umount astro-MDT0000 complete
Mar 10 10:04:20 mds02 kernel: LustreError: 7016:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-17)
Mar 10 10:04:37 mds02 kernel: Lustre: MGS: Connection restored to (at 10.21.207.58@o2ib)
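For reference, the -17 that recurs in the syslog above looks like a negated Linux errno, which is the usual convention for kernel return codes. A minimal Python sketch for decoding it follows. It assumes that convention applies here; the helper name decode_rc and the regular expression are illustrative only and not part of Lustre. The sample lines are copied from the log.

```python
import errno
import os
import re

# Sample lines copied verbatim from the syslog in this post.
LOG_LINES = [
    "Mar 10 10:04:00 mds02 kernel: LustreError: 8923:0:(obd_config.c:1835:class_config_llog_handler()) MGC10.21.10.102@o2ib: cfg command failed: rc = -17",
    "Mar 10 10:04:00 mds02 kernel: LustreError: 7016:0:(obd_mount_server.c:1397:server_start_targets()) failed to start server astro-MDT0000: -17",
    "Mar 10 10:04:20 mds02 kernel: LustreError: 7016:0:(obd_mount.c:1608:lustre_fill_super()) Unable to mount (-17)",
]

# Illustrative pattern: a negative number at the very end of the line,
# optionally followed by ")", e.g. "rc = -17", ": -17" or "(-17)".
RC_PATTERN = re.compile(r"-(\d+)\)?\s*$")


def decode_rc(line):
    """Return (code, symbolic name, description) for a trailing negative
    return code, or None if the line does not end with one.
    Assumes the code is a negated errno value."""
    match = RC_PATTERN.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, os.strerror(code)


if __name__ == "__main__":
    for line in LOG_LINES:
        decoded = decode_rc(line)
        if decoded is not None:
            code, name, text = decoded
            print(f"-{code} -> {name} ({text})")
```

On Linux, errno 17 is EEXIST, which is consistent with the "already exists, won't add" message logged by class_register_device() just before the failure.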
_______________________________________________
lustre-discuss mailing list
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
