Hello Lustre gurus,

Recently, one of our OSSes had a faulty RAID card (3ware), which corrupted both the root filesystem and the Lustre OST.
We then reinstalled the OS, ran fsck on the Lustre OST using a backup superblock (the primary one was corrupted), and recreated the journal (which was also corrupted). Mounting the OST as ldiskfs now shows a bunch of files in lost+found. However, mounting it as a Lustre OST fails with the following errors:

Oct 7 13:01:45 OSS50 kernel: LDISKFS-fs (sdb): mounted filesystem with ordered data mode. quota=off. Opts:
Oct 7 13:01:48 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not available for connect from 172.16.4.66@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 7 13:01:48 OSS50 kernel: LustreError: Skipped 5 previous similar messages
Oct 7 13:01:48 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not available for connect from 172.16.250.59@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 7 13:01:48 OSS50 kernel: LustreError: Skipped 3 previous similar messages
Oct 7 13:01:51 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not available for connect from 172.16.7.199@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 7 13:01:51 OSS50 kernel: LustreError: Skipped 15 previous similar messages
Oct 7 13:01:55 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not available for connect from 172.16.250.173@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 7 13:01:55 OSS50 kernel: LustreError: Skipped 19 previous similar messages
Oct 7 13:02:04 OSS50 kernel: LustreError: 137-5: Lustre-OST003b_UUID: not available for connect from 172.16.5.114@tcp (no target). If you are running an HA pair check that the target is mounted on the other server.
Oct 7 13:02:04 OSS50 kernel: LustreError: Skipped 49 previous similar messages
Oct 7 13:02:04 OSS50 kernel: LustreError: 0-0: Trying to start OBD Lustre-OST003b_UUID using the wrong disk <85>. Were the /dev/ assignments rearranged?
Oct 7 13:02:04 OSS50 kernel: LustreError: 16002:0:(obd_config.c:572:class_setup()) setup Lustre-OST003b failed (-22)
Oct 7 13:02:04 OSS50 kernel: LustreError: 16002:0:(obd_config.c:1591:class_config_llog_handler()) MGC172.16.0.251@tcp: cfg command failed: rc = -22
Oct 7 13:02:04 OSS50 kernel: Lustre: cmd=cf003 0:Lustre-OST003b 1:dev 2:0 3:f
Oct 7 13:02:04 OSS50 kernel: LustreError: 15b-f: MGC172.16.0.251@tcp: The configuration from log 'Lustre-OST003b' failed from the MGS (-22). Make sure this client and the MGS are running compatible versions of Lustre.
Oct 7 13:02:05 OSS50 kernel: LustreError: 15c-8: MGC172.16.0.251@tcp: The configuration from log 'Lustre-OST003b' failed (-22). This may be the result of communication errors between this node and the MGS, a bad configuration, or other errors. See the syslog for more information.
Oct 7 13:02:05 OSS50 kernel: LustreError: 15976:0:(obd_mount_server.c:1252:server_start_targets()) failed to start server Lustre-OST003b: -22
Oct 7 13:02:05 OSS50 kernel: LustreError: 15976:0:(obd_mount_server.c:1735:server_fill_super()) Unable to start targets: -22
Oct 7 13:02:05 OSS50 kernel: Lustre: Lustre-OST003b: Not available for connect from 172.16.5.116@tcp (not set up)
Oct 7 13:02:05 OSS50 kernel: LustreError: 15976:0:(obd_mount_server.c:845:lustre_disconnect_lwp()) Lustre-MDT0000-lwp-OST003b: Can't end config log Lustre-client.
Oct 7 13:02:05 OSS50 kernel: LustreError: 15976:0:(obd_mount_server.c:1420:server_put_super()) Lustre-OST003b: failed to disconnect lwp. (rc=-2)
Oct 7 13:02:05 OSS50 kernel: LustreError: 15976:0:(obd_config.c:619:class_cleanup()) Device 135 not setup
Oct 7 13:02:05 OSS50 kernel: Lustre: server umount Lustre-OST003b complete
Oct 7 13:02:05 OSS50 kernel: LustreError: 15976:0:(obd_mount.c:1324:lustre_fill_super()) Unable to mount /dev/sdb (-22)
Oct 7 13:02:05 OSS50 kernel: Lustre: Skipped 1 previous similar message

Any ideas? I would think that we could eliminate the configuration errors by doing a writeconf, but since this is a potentially destructive operation, I'd like to check with you experts first to see whether anyone has experienced something like this.

Thank you,
Murshid.
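
P.S. As a reading aid for the log above: the negative numbers in the Lustre messages (-22, -2) look like negated Linux errno values, which would make them EINVAL and ENOENT. Below is a minimal Python sketch for decoding them. The helper name decode_return_codes and the regular expression are only an illustration and not part of any Lustre tool; the sample lines are shortened copies of lines from the log above.

```python
import errno
import re

# Assumption: return codes appear as "(-22)", "rc = -22", "(rc=-2)" or ": -22".
# The pattern deliberately ignores message IDs such as "137-5" or "15b-f".
RC_PATTERN = re.compile(r"(?:rc\s*=\s*|\(|:\s)-(\d+)")


def decode_return_codes(line):
    """Return (code, errno name) pairs for each negative return code in a log line."""
    results = []
    for match in RC_PATTERN.finditer(line):
        code = int(match.group(1))
        # errno.errorcode maps positive errno numbers to symbolic names.
        results.append((-code, errno.errorcode.get(code, "UNKNOWN")))
    return results


sample_lines = [
    "LustreError: 15976:0:(obd_mount.c:1324:lustre_fill_super()) Unable to mount /dev/sdb (-22)",
    "LustreError: 15976:0:(obd_mount_server.c:1420:server_put_super()) Lustre-OST003b: failed to disconnect lwp. (rc=-2)",
]

for line in sample_lines:
    print(decode_return_codes(line))
```

So the mount failure itself reports "invalid argument", which seems consistent with the "wrong disk" message rather than with a network problem.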
_______________________________________________
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org