On 14-Oct-09, at 01:08, Michael Schwartzkopff wrote:

> we have a Lustre 1.8 cluster with openais and pacemaker as the cluster
> manager. When I migrate one Lustre resource from one node to another
> node I get an error. Stopping Lustre on one node is no problem, but the
> node where Lustre should start says:
>
> Oct 14 09:54:28 sososd6 kernel: kjournald starting. Commit interval 5 seconds
> Oct 14 09:54:28 sososd6 kernel: LDISKFS FS on dm-4, internal journal
> Oct 14 09:54:28 sososd6 kernel: LDISKFS-fs: recovery complete.
> Oct 14 09:54:28 sososd6 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Oct 14 09:54:28 sososd6 multipathd: dm-4: umount map (uevent)
> Oct 14 09:54:39 sososd6 kernel: kjournald starting. Commit interval 5 seconds
> Oct 14 09:54:39 sososd6 kernel: LDISKFS FS on dm-4, internal journal
> Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: mounted filesystem with ordered data mode.
> Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: file extents enabled
> Oct 14 09:54:39 sososd6 kernel: LDISKFS-fs: mballoc enabled
> Oct 14 09:54:39 sososd6 kernel: Lustre: mgc134.171.16....@tcp: Reactivating import
> Oct 14 09:54:45 sososd6 kernel: LustreError: 137-5: UUID 'segfs-OST0000_UUID' is not available for connect (no target)
This is likely driven by some client trying to connect to OST0000, but I don't see anything in the above logs indicating that OST0000 has actually started up yet. It should have something like:

  RECOVERY: service myth-OST0000, 3 recoverable clients, last_rcvd 17180097556
  Lustre: OST myth-OST0000 now serving dev (myth-OST0000/81a23803-0711-a534-441a-f5ee34e094a8), but will be in recovery for at least 5:00, or until 3 clients reconnect.
  Lustre: Server myth-OST0000 on device /dev/mapper/vgmyth-lvmythost0 has started

> These logs continue until the cluster software times out and the
> resource tells me about the error. Any help understanding these logs?
> Thanks.

Are you sure you are mounting the OSTs with type "lustre" instead of "ldiskfs"? I see the above Lustre messages on my system a few seconds after the LDISKFS messages are printed. If you are using MMP (which you should be, on an automated failover config) it will add 10-20s of delay to the ldiskfs mount.

Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

_______________________________________________
Lustre-discuss mailing list
[email protected]
http://lists.lustre.org/mailman/listinfo/lustre-discuss
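[Editor's illustration of the mount-type point above: Pacemaker's Filesystem agent passes its fstype parameter straight to mount(8). With fstype="ldiskfs" only the backing filesystem is mounted and the OST service never registers with the MGS, which matches the "no target" errors in the logs. A minimal sketch of a crm configuration for one OST, with hypothetical resource name, device, and mount point, and with timeouts padded for the MMP delay:]

```shell
# Sketch only -- resource name, device, and mount point are illustrative,
# not taken from the thread.
# fstype="lustre" (not "ldiskfs") is what actually starts the OST service.
# MMP adds roughly 10-20s to the mount, so keep the start timeout generous.
crm configure primitive segfs-ost0000 ocf:heartbeat:Filesystem \
    params device="/dev/dm-4" directory="/mnt/segfs/ost0000" fstype="lustre" \
    op start timeout="300s" \
    op stop timeout="300s"
```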
