Lustre experts,

We have been using Lustre for a few months now to serve a few TB of data to multiple client computers. Earlier today I created some new volumes on a Sun StorageTek 6140 FC disk controller. This appears to have caused a short outage of some FC connections, which in turn produced I/O errors on the Lustre server (a single node acting as MGS, MDS and OSS). We hope to move to a more robust architecture with separate nodes, failover, etc. in the near future. That said, we have been running the current setup (RHEL 5.2 64-bit with Lustre 1.6.5.1) for a few months without issue.
The initial cause of the I/O errors has passed (presumably a change in the disk configuration exported by the 6140 triggered some kind of SAN outage, and we have involved Sun about that), but since then we are still getting the error below at least once a second:

lustre1 kernel: LustreError: 9643:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30

I have successfully unmounted and remounted the various filesystems, but the errors continue. Clients can still umount/mount and write to the filesystems, although one of the filesystems reports that it is read-only while still allowing writes (see below):

[r...@sarton srs]# touch anewfile
touch: setting times of `anewfile': Read-only file system
[r...@sarton srs]# pwd
/srs
[r...@sarton srs]#
[r...@sarton srs]# ls
anewfile data1 data2 test www
[r...@sarton srs]# rm anewfile
rm: remove regular empty file `anewfile'? y
[r...@sarton srs]#

Firstly, is this 'LustreError' concerning? I couldn't find a hit on Google, even for various substrings.

Secondly, is there an fsck.lustre command that we could or should run after situations where I/O errors are known to have occurred?

Thanks in advance for any advice or pointers to where I can find this information.

Regards,
Marcus Schull
Systems Administrator
University of Queensland.

------------------------

Feb 4 12:46:41 lustre1 kernel: sd 1:0:0:13: Device not ready: <6>: Current: sense key: Not Ready
Feb 4 12:46:41 lustre1 kernel: Add. Sense: Logical unit not ready, cause not reportable
Feb 4 12:46:41 lustre1 kernel:
Feb 4 12:46:41 lustre1 kernel: end_request: I/O error, dev sdw, sector 1014121535
Feb 4 12:46:41 lustre1 kernel: device-mapper: multipath: Failing path 65:96.
Feb 4 12:46:41 lustre1 multipathd: 65:96: mark as failed
Feb 4 12:46:41 lustre1 multipathd: mpath15: remaining active paths: 1
Feb 4 12:46:41 lustre1 kernel: sd 1:0:0:15: Device not ready: <6>: Current: sense key: Not Ready
Feb 4 12:46:41 lustre1 kernel: Add.
Sense: Logical unit not ready, cause not reportable
Feb 4 12:46:41 lustre1 kernel:
Feb 4 12:46:41 lustre1 kernel: end_request: I/O error, dev sdaa, sector 1780570175
Feb 4 12:46:41 lustre1 kernel: device-mapper: multipath: Failing path 65:160.
Feb 4 12:46:41 lustre1 multipathd: 65:160: mark as failed
Feb 4 12:46:41 lustre1 multipathd: mpath12: remaining active paths: 1
Feb 4 12:46:41 lustre1 kernel: sd 1:0:0:15: Device not ready: <6>: Current: sense key: Not Ready
Feb 4 12:46:41 lustre1 kernel: Add. Sense: Logical unit not ready, cause not reportable
Feb 4 12:46:41 lustre1 kernel:
Feb 4 12:46:41 lustre1 kernel: end_request: I/O error, dev sdaa, sector 1780571199
Feb 4 12:46:41 lustre1 kernel: sd 2:0:0:15: Device not ready: <6>: Current: sense key: Not Ready
Feb 4 12:46:41 lustre1 kernel: Add. Sense: Logical unit not ready, cause not reportable
Feb 4 12:46:41 lustre1 kernel:
Feb 4 12:46:41 lustre1 kernel: end_request: I/O error, dev sdai, sector 1780570175
Feb 4 12:46:41 lustre1 kernel: device-mapper: multipath: Failing path 66:32.
Feb 4 12:46:41 lustre1 kernel: sd 2:0:0:15: Device not ready: <6>: Current: sense key: Not Ready
Feb 4 12:46:41 lustre1 kernel: Add. Sense: Logical unit not ready, cause not reportable
Feb 4 12:46:41 lustre1 kernel:
Feb 4 12:46:41 lustre1 kernel: end_request: I/O error, dev sdai, sector 1780571199
Feb 4 12:46:41 lustre1 kernel: LustreError: 9715:0:(filter_io.c: 366:filter_preprw_read()) io error -5
Feb 4 12:46:41 lustre1 kernel: LustreError: 9704:0:(filter_io.c: 366:filter_preprw_read()) io error -5
Feb 4 12:46:42 lustre1 kernel: LustreError: 9654:0:(filter_io.c: 366:filter_preprw_read()) io error -5
Feb 4 12:46:42 lustre1 kernel: LustreError: 9654:0:(filter_io.c: 366:filter_preprw_read()) Skipped 1 previous similar message
Feb 4 12:46:42 lustre1 kernel: sd 2:0:0:13: Device not ready: <6>: Current: sense key: Not Ready
Feb 4 12:46:42 lustre1 kernel: Add. Sense: Logical unit not ready, cause not reportable
Feb 4 12:46:42 lustre1 kernel:
Feb 4 12:46:42 lustre1 kernel: end_request: I/O error, dev sdag, sector 1015055423
Feb 4 12:46:42 lustre1 kernel: device-mapper: multipath: Failing path 66:0.
Feb 4 12:46:42 lustre1 kernel: Buffer I/O error on device dm-27, logical block 35218
Feb 4 12:46:42 lustre1 kernel: lost page write due to I/O error on dm-27
Feb 4 12:46:42 lustre1 kernel: Aborting journal on device dm-27.
Feb 4 12:46:42 lustre1 kernel: LustreError: 1071:0:(obd.h: 1117:obd_transno_commit_cb()) qfab-OST0002: transno 861088 commit error: 2
Feb 4 12:46:42 lustre1 kernel: LustreError: 9682:0:(filter_io_26.c: 769:filter_commitrw_write()) Failure to commit OST transaction (-5)?
Feb 4 12:46:42 lustre1 multipathd: 66:0: mark as failed
Feb 4 12:46:42 lustre1 multipathd: mpath15: remaining active paths: 0
Feb 4 12:46:42 lustre1 multipathd: 66:32: mark as failed
Feb 4 12:46:42 lustre1 multipathd: mpath12: remaining active paths: 0
Feb 4 12:46:43 lustre1 kernel: ldiskfs_abort called.
Feb 4 12:46:43 lustre1 kernel: LDISKFS-fs error (device dm-27): ldiskfs_journal_start_sb: Detected aborted journal
Feb 4 12:46:43 lustre1 kernel: Remounting filesystem read-only
Feb 4 12:46:43 lustre1 kernel: LustreError: 9713:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) can't get handle for 361 credits: rc = -30
Feb 4 12:46:43 lustre1 kernel: LustreError: 9713:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:44 lustre1 multipathd: sdw: tur checker reports path is down
Feb 4 12:46:44 lustre1 multipathd: sdaa: tur checker reports path is down
Feb 4 12:46:44 lustre1 multipathd: sdag: tur checker reports path is down
Feb 4 12:46:44 lustre1 multipathd: sdai: tur checker reports path is down
Feb 4 12:46:44 lustre1 kernel: LustreError: 9695:0:(filter_io.c: 366:filter_preprw_read()) io error -5
Feb 4 12:46:44 lustre1 kernel: LustreError: 9695:0:(filter_io.c: 366:filter_preprw_read()) Skipped 4 previous similar messages
Feb 4 12:46:44 lustre1 kernel: Buffer I/O error on device dm-26, logical block 45983
Feb 4 12:46:44 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:44 lustre1 kernel: Aborting journal on device dm-26.
Feb 4 12:46:44 lustre1 kernel: LustreError: 993:0:(obd.h: 1117:obd_transno_commit_cb()) qfab-OST0004: transno 825433 commit error: 2
Feb 4 12:46:44 lustre1 kernel: journal commit I/O error
Feb 4 12:46:44 lustre1 kernel: LustreError: 9711:0:(filter_io_26.c: 769:filter_commitrw_write()) Failure to commit OST transaction (-5)?
Feb 4 12:46:44 lustre1 kernel: ldiskfs_abort called.
Feb 4 12:46:44 lustre1 kernel: LDISKFS-fs error (device dm-26): ldiskfs_journal_start_sb: Detected aborted journal
Feb 4 12:46:44 lustre1 kernel: Remounting filesystem read-only
Feb 4 12:46:44 lustre1 kernel: LustreError: 9679:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) can't get handle for 409 credits: rc = -30
Feb 4 12:46:44 lustre1 kernel: LustreError: 9679:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:44 lustre1 kernel: LustreError: 9766:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:45 lustre1 kernel: LustreError: 9705:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) can't get handle for 133 credits: rc = -30
Feb 4 12:46:45 lustre1 kernel: LustreError: 9705:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) Skipped 1 previous similar message
Feb 4 12:46:45 lustre1 kernel: LustreError: 9705:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:45 lustre1 kernel: LustreError: 9659:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:45 lustre1 kernel: LustreError: 9688:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:45 lustre1 kernel: LustreError: 9670:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:46 lustre1 kernel: LustreError: 9741:0:(filter_io.c: 366:filter_preprw_read()) io error -5
Feb 4 12:46:46 lustre1 kernel: LustreError: 9741:0:(filter_io.c: 366:filter_preprw_read()) Skipped 1 previous similar message
Feb 4 12:46:47 lustre1 kernel: LustreError: 9743:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) can't get handle for 133 credits: rc = -30
Feb 4 12:46:47 lustre1 kernel: LustreError: 9743:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) Skipped 3 previous similar messages
Feb 4 12:46:47 lustre1 kernel: LustreError: 9743:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:47 lustre1 kernel: LustreError: 9723:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:47 lustre1 kernel: LustreError: 9712:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:48 lustre1 kernel: LustreError: 9692:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:49 lustre1 multipathd: sdw: tur checker reports path is down
Feb 4 12:46:49 lustre1 multipathd: sdaa: tur checker reports path is down
Feb 4 12:46:49 lustre1 multipathd: sdag: tur checker reports path is down
Feb 4 12:46:49 lustre1 multipathd: sdai: tur checker reports path is down
Feb 4 12:46:50 lustre1 kernel: LustreError: 9717:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) can't get handle for 133 credits: rc = -30
Feb 4 12:46:50 lustre1 kernel: LustreError: 9717:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) Skipped 3 previous similar messages
Feb 4 12:46:50 lustre1 kernel: LustreError: 9717:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:50 lustre1 kernel: LustreError: 9735:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:50 lustre1 kernel: LustreError: 9750:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:51 lustre1 kernel: Buffer I/O error on device dm-26, logical block 28
Feb 4 12:46:51 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:51 lustre1 kernel: Buffer I/O error on device dm-26, logical block 29
Feb 4 12:46:51 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:51 lustre1 kernel: Buffer I/O error on device dm-26, logical block 30
Feb 4 12:46:51 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:51 lustre1 kernel: Buffer I/O error on device dm-26, logical block 31
Feb 4 12:46:51 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:51 lustre1 kernel: Buffer I/O error on device dm-26, logical block 56
Feb 4 12:46:51 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:51 lustre1 kernel: Buffer I/O error on device dm-26, logical block 57
Feb 4 12:46:51 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:51 lustre1 kernel: Buffer I/O error on device dm-26, logical block 115965952
Feb 4 12:46:51 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:51 lustre1 kernel: Buffer I/O error on device dm-26, logical block 115965954
Feb 4 12:46:51 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:51 lustre1 kernel: Buffer I/O error on device dm-26, logical block 116981760
Feb 4 12:46:51 lustre1 kernel: lost page write due to I/O error on dm-26
Feb 4 12:46:51 lustre1 kernel: LustreError: 9747:0:(filter_io.c: 366:filter_preprw_read()) io error -5
Feb 4 12:46:51 lustre1 kernel: LustreError: 9747:0:(filter_io.c: 366:filter_preprw_read()) Skipped 16 previous similar messages
Feb 4 12:46:52 lustre1 kernel: LustreError: 9725:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:54 lustre1 multipathd: sdw: tur checker reports path is up
Feb 4 12:46:54 lustre1 multipathd: 65:96: reinstated
Feb 4 12:46:54 lustre1 multipathd: mpath15: remaining active paths: 1
Feb 4 12:46:54 lustre1 multipathd: sdaa: tur checker reports path is up
Feb 4 12:46:54 lustre1 multipathd: 65:160: reinstated
Feb 4 12:46:54 lustre1 multipathd: mpath12: remaining active paths: 1
Feb 4 12:46:54 lustre1 multipathd: sdag: tur checker reports path is up
Feb 4 12:46:54 lustre1 multipathd: 66:0: reinstated
Feb 4 12:46:54 lustre1 multipathd: mpath15: remaining active paths: 2
Feb 4 12:46:54 lustre1 multipathd: sdai: tur checker reports path is up
Feb 4 12:46:54 lustre1 multipathd: 66:32: reinstated
Feb 4 12:46:54 lustre1 multipathd: mpath12: remaining active paths: 2
Feb 4 12:46:54 lustre1 kernel: LustreError: 9651:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) can't get handle for 133 credits: rc = -30
Feb 4 12:46:54 lustre1 kernel: LustreError: 9651:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) Skipped 3 previous similar messages
Feb 4 12:46:54 lustre1 kernel: LustreError: 9651:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:54 lustre1 kernel: LustreError: 9655:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:54 lustre1 kernel: LustreError: 9728:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:57 lustre1 kernel: LustreError: 9738:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:59 lustre1 kernel: LustreError: 9727:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:59 lustre1 kernel: LustreError: 9678:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:46:59 lustre1 kernel: LustreError: 9702:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:03 lustre1 kernel: LustreError: 9663:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) can't get handle for 361 credits: rc = -30
Feb 4 12:47:03 lustre1 kernel: LustreError: 9663:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) Skipped 6 previous similar messages
Feb 4 12:47:03 lustre1 kernel: LustreError: 9663:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:04 lustre1 kernel: LustreError: 9704:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:06 lustre1 kernel: LustreError: 9654:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:06 lustre1 kernel: LustreError: 9758:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:06 lustre1 kernel: LustreError: 9682:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:06 lustre1 kernel: LustreError: 9644:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:06 lustre1 kernel: LustreError: 9677:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:07 lustre1 kernel: LustreError: 9713:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:07 lustre1 kernel: LustreError: 9690:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:07 lustre1 kernel: LustreError: 9744:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:08 lustre1 kernel: LustreError: 9695:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:09 lustre1 kernel: LustreError: 9766:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:10 lustre1 kernel: LustreError: 9762:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:10 lustre1 kernel: LustreError: 9662:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:10 lustre1 kernel: LustreError: 9659:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:12 lustre1 kernel: LustreError: 9670:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:12 lustre1 kernel: LustreError: 9647:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:12 lustre1 kernel: LustreError: 9742:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:13 lustre1 kernel: LustreError: 9669:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:13 lustre1 kernel: LustreError: 9755:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:14 lustre1 kernel: LustreError: 9664:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:16 lustre1 kernel: LustreError: 9756:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:17 lustre1 kernel: LustreError: 9687:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:19 lustre1 kernel: LustreError: 9653:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:19 lustre1 kernel: LustreError: 9733:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 12:47:19 lustre1 kernel: LustreError: 9691:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) can't get handle for 296 credits: rc = -30
Feb 4 12:47:19 lustre1 kernel: LustreError: 9691:0:(fsfilt-ldiskfs.c: 418:fsfilt_ldiskfs_brw_start()) Skipped 24 previous similar messages
Feb 4 12:47:19 lustre1 kernel: LustreError: 9691:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30

[r...@lustre1 log]# tail /var/log/messages
Feb 4 16:17:27 lustre1 kernel: LustreError: 9723:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 16:17:27 lustre1 kernel: LustreError: 9761:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 16:17:27 lustre1 kernel: LustreError: 9686:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 16:18:14 lustre1 kernel: LustreError: 9713:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 16:18:14 lustre1 kernel: LustreError: 9703:0:(filter_io_26.c: 707:filter_commitrw_write()) error
starting transaction: rc = -30
Feb 4 16:18:14 lustre1 kernel: LustreError: 9687:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 16:18:32 lustre1 kernel: LustreError: 9744:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 16:18:32 lustre1 kernel: LustreError: 9681:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 16:18:32 lustre1 kernel: LustreError: 9758:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
Feb 4 16:18:32 lustre1 kernel: LustreError: 9642:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
[r...@lustre1 log]#
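
For reference, the numeric codes in the LustreError lines above look like negated Linux errno values. That is an assumption based on the usual kernel convention, not something the Lustre messages state. On Linux, errno 30 is EROFS and errno 5 is EIO, which would be consistent with the "Remounting filesystem read-only" and "I/O error" kernel messages in the same log. A minimal Python sketch under that assumption (decode_rc and summarize are hypothetical helper names, not Lustre tools) decodes a code and tallies the LustreError lines per function:

```python
import errno
import os
import re
from collections import Counter

# Matches LustreError lines that carry a numeric code, for example:
#   LustreError: 9643:0:(filter_io_26.c: 707:filter_commitrw_write()) error starting transaction: rc = -30
#   LustreError: 9715:0:(filter_io.c: 366:filter_preprw_read()) io error -5
LUSTRE_RC = re.compile(
    r"LustreError: \d+:\d+:\([\w.-]+:\s*\d+:(\w+)\(\)\) .*?(?:rc = |io error )(-\d+)"
)


def decode_rc(rc):
    """Return (symbolic name, description) for a negated errno such as -30."""
    number = abs(rc)
    return errno.errorcode.get(number, "UNKNOWN"), os.strerror(number)


def summarize(lines):
    """Count LustreError lines per (reporting function, errno name)."""
    counts = Counter()
    for line in lines:
        match = LUSTRE_RC.search(line)
        if match:
            name, _description = decode_rc(int(match.group(2)))
            counts[(match.group(1), name)] += 1
    return counts
```

Passing the lines of /var/log/messages to summarize() would show which functions are still reporting which error after the paths were reinstated. Lines without an "rc = N" or "io error N" field, such as the "Skipped N previous similar messages" entries, are ignored by this sketch.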
