Hi Bob,

Good to hear you've identified and resolved the issue. Sorry to hear you'll have to restore from backup though.
-cf

On 12/03/2010 02:41 PM, Bob Ball wrote:
> Just to cleanly end this thread, the mptctl was out of date. We also
> updated megaraid_sas and perc6 firmware. e2fsck found some Block bitmap
> differences (fixed) at this point, but the OST mounted cleanly and the
> errors stopped.
>
> Unfortunately, there are now corrupted files in the system, that remain
> corrupted, and we'll probably never be able to come up with a complete
> list of them.
>
> bob
>
>
> On 12/2/2010 4:35 PM, Bob Ball wrote:
>> It is a Dell PERC6 RAID array. OMSA monitoring is enabled and is not
>> throwing errors. Hmmmm, mptctl is old though, so maybe that is a
>> contributing factor. I guess I need to update that. Shoot,
>> megaraid_sas is also not up to date. dkms....
>>
>> OK, guess I need some driver updates.
>>
>> Later.
>> bob
>>
>> On 12/2/2010 4:05 PM, Colin Faber wrote:
>>> Hi Bob,
>>>
>>> If you're seeing the same errors on the same disk after e2fsck run,
>>> and it's not catching them, it's possible that you're hitting an edge
>>> case which isn't handled within e2fsck properly, however if you're
>>> experiencing different errors and e2fsck did catch them before,
>>> chances are you're looking at some hardware failure some place.
>>>
>>> If this is a single disk, and you have SMART monitoring enabled, check
>>> your error counters, if it's a raid device, verify the error counters
>>> on that.
>>>
>>> -cf
>>>
>>>
>>> On 12/02/2010 02:00 PM, Bob Ball wrote:
>>>> We were getting errors thrown by an OST. /var/log/messages contained a
>>>> lot of these:
>>>> 2010-11-28T17:05:34-05:00 umfs06.aglt2.org kernel: [2102640.735927]
>>>> LDISKFS-fs error (device sdk): ldiskfs_mb_check_ondisk_bitmap: on-disk
>>>> bitmap for group 639corrupted: 440 blocks free in bitmap, 439 - in gd
>>>>
>>>> So, I turned off (most) access to the disk via lctl (we have a LOT of
>>>> client machines, some were missed) and got problems. Had to use the
>>>> alternate superblock to e2fsck the disk. When back online, I still saw
>>>> similar messages. Updated to e2fsprogs 1.41.12 as suggested elsewhere.
>>>> Repeated e2fsck.
>>>>
>>>> Still seeing these. Users report some files corrupted, coming up with
>>>> bad md5sum.... Any other thoughts on what to do about this problem?
>>>>
>>>> [2440763.879143] LDISKFS-fs error (device sdk):
>>>> ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 35406corrupted:
>>>> 1318 blocks free in bitmap, 1317 - in gd
>>>> [2440763.879796]
>>>> [2440763.882724] LustreError:
>>>> 1651027:0:(fsfilt-ldiskfs.c:1333:fsfilt_ldiskfs_write_record()) can't
>>>> read/create block: -28
>>>> [2440763.882736] LustreError:
>>>> 1651027:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log
>>>> record: rc -28
>>>> [2440763.882789] LustreError:
>>>> 1651002:0:(mgc_request.c:1089:mgc_copy_llog()) Failed to copy remote log
>>>> umt3-OST0019 (-28)
>>>>
>>>> Rebooted to make system clean as a whole, and found the same kind of
>>>> thing repeating.
>>>> [ 285.834864] LDISKFS-fs (sdk): warning: mounting fs with errors,
>>>> running e2fsck is recommended
>>>> [ 285.852559] LDISKFS-fs (sdk): mounted filesystem with ordered data
>>>> mode
>>>> [ 286.079065] LDISKFS-fs (sdk): warning: mounting fs with errors,
>>>> running e2fsck is recommended
>>>> [ 286.096316] LDISKFS-fs (sdk): mounted filesystem with ordered data
>>>> mode
>>>> [ 286.940872] LDISKFS-fs error (device sdk):
>>>> ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 35406corrupted:
>>>> 1318 blocks free in bitmap, 1317 - in gd
>>>> [ 286.941693]
>>>> [ 286.945224] LustreError:
>>>> 5790:0:(fsfilt-ldiskfs.c:1333:fsfilt_ldiskfs_write_record()) can't
>>>> read/create block: -28
>>>> [ 286.945233] LustreError:
>>>> 5790:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log
>>>> record: rc -28
>>>> [ 286.945448] LustreError: 5763:0:(mgc_request.c:1089:mgc_copy_llog())
>>>> Failed to copy remote log umt3-OST0019 (-28)
>>>>
>>>> All help appreciated.
>>>>
>>>> bob
>>>> _______________________________________________
>>>> Lustre-discuss mailing list
>>>> [email protected]
>>>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
