It is a Dell PERC6 RAID array. OMSA monitoring is enabled and is not throwing errors. Hmmmm, mptctl is old though, so maybe that is a contributing factor. I guess I need to update that. Shoot, megaraid_sas is also not up to date. dkms....
OK, guess I need some driver updates. Later.

bob

On 12/2/2010 4:05 PM, Colin Faber wrote:
> Hi Bob,
>
> If you're seeing the same errors on the same disk after e2fsck run,
> and it's not catching them, it's possible that you're hitting an edge
> case which isn't handled within e2fsck properly, however if you're
> experiencing different errors and e2fsck did catch them before,
> chances are you're looking at some hardware failure some place.
>
> If this is a single disk, and you have SMART monitoring enabled, check
> your error counters, if it's a raid device, verify the error counters
> on that.
>
> -cf
>
>
> On 12/02/2010 02:00 PM, Bob Ball wrote:
>> We were getting errors thrown by an OST. /var/log/messages contained a
>> lot of these:
>> 2010-11-28T17:05:34-05:00 umfs06.aglt2.org kernel: [2102640.735927]
>> LDISKFS-fs error (device sdk): ldiskfs_mb_check_ondisk_bitmap: on-disk
>> bitmap for group 639corrupted: 440 blocks free in bitmap, 439 - in gd
>>
>> So, I turned off (most) access to the disk via lctl (we have a LOT of
>> client machines, some were missed) and got problems. Had to use the
>> alternate superblock to e2fsck the disk. When back online, I still saw
>> similar messages. Updated to e2fsprogs 1.41.12 as suggested elsewhere.
>> Repeated e2fsck.
>>
>> Still seeing these. Users report some files corrupted, coming up with
>> bad md5sum.... Any other thoughts on what to do about this problem?
>>
>> [2440763.879143] LDISKFS-fs error (device sdk):
>> ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 35406corrupted:
>> 1318 blocks free in bitmap, 1317 - in gd
>> [2440763.879796]
>> [2440763.882724] LustreError:
>> 1651027:0:(fsfilt-ldiskfs.c:1333:fsfilt_ldiskfs_write_record()) can't
>> read/create block: -28
>> [2440763.882736] LustreError:
>> 1651027:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log
>> record: rc -28
>> [2440763.882789] LustreError:
>> 1651002:0:(mgc_request.c:1089:mgc_copy_llog()) Failed to copy remote log
>> umt3-OST0019 (-28)
>>
>> Rebooted to make system clean as a whole, and found the same kind of
>> thing repeating.
>> [ 285.834864] LDISKFS-fs (sdk): warning: mounting fs with errors,
>> running e2fsck is recommended
>> [ 285.852559] LDISKFS-fs (sdk): mounted filesystem with ordered data
>> mode
>> [ 286.079065] LDISKFS-fs (sdk): warning: mounting fs with errors,
>> running e2fsck is recommended
>> [ 286.096316] LDISKFS-fs (sdk): mounted filesystem with ordered data
>> mode
>> [ 286.940872] LDISKFS-fs error (device sdk):
>> ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 35406corrupted:
>> 1318 blocks free in bitmap, 1317 - in gd
>> [ 286.941693]
>> [ 286.945224] LustreError:
>> 5790:0:(fsfilt-ldiskfs.c:1333:fsfilt_ldiskfs_write_record()) can't
>> read/create block: -28
>> [ 286.945233] LustreError:
>> 5790:0:(llog_lvfs.c:116:llog_lvfs_write_blob()) error writing log
>> record: rc -28
>> [ 286.945448] LustreError: 5763:0:(mgc_request.c:1089:mgc_copy_llog())
>> Failed to copy remote log umt3-OST0019 (-28)
>>
>> All help appreciated.
>>
>> bob
>> _______________________________________________
>> Lustre-discuss mailing list
>> [email protected]
>> http://lists.lustre.org/mailman/listinfo/lustre-discuss
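
Not part of the original thread, but one way to tell "same error on the same group" from "different errors each time" (the distinction Colin draws above) is to tally the bitmap-mismatch messages per block group. The following is only a rough Python sketch under a few assumptions: the function name `summarize_bitmap_errors` and the regular expression are made up for illustration, the message layout is taken solely from the log lines quoted in this thread, and the input is expected to be the raw, unwrapped kernel log rather than mail-quoted text.

```python
import re
from collections import Counter

# Pattern for the bitmap-mismatch message as quoted in this thread.
# Whitespace is matched loosely because the kernel prints the group
# number and "corrupted" with no space between them.
BITMAP_ERR = re.compile(
    r"LDISKFS-fs error \(device (?P<dev>\w+)\):\s*"
    r"ldiskfs_mb_check_ondisk_bitmap:\s*on-disk\s+bitmap\s+for\s+group\s+"
    r"(?P<group>\d+)\s*corrupted:\s*(?P<bitmap>\d+)\s+blocks\s+free\s+in\s+"
    r"bitmap,\s*(?P<gd>\d+)\s+-\s+in\s+gd"
)


def summarize_bitmap_errors(text):
    """Count occurrences per (device, group, bitmap_free, gd_free)."""
    counts = Counter()
    for m in BITMAP_ERR.finditer(text):
        key = (m["dev"], int(m["group"]), int(m["bitmap"]), int(m["gd"]))
        counts[key] += 1
    return counts


# Sample input: the three bitmap errors quoted in the thread, unwrapped.
sample = (
    "[2440763.879143] LDISKFS-fs error (device sdk): "
    "ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 35406corrupted: "
    "1318 blocks free in bitmap, 1317 - in gd\n"
    "[ 286.940872] LDISKFS-fs error (device sdk): "
    "ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 35406corrupted: "
    "1318 blocks free in bitmap, 1317 - in gd\n"
    "[2102640.735927] LDISKFS-fs error (device sdk): "
    "ldiskfs_mb_check_ondisk_bitmap: on-disk bitmap for group 639corrupted: "
    "440 blocks free in bitmap, 439 - in gd\n"
)

for (dev, group, bitmap_free, gd_free), n in sorted(
    summarize_bitmap_errors(sample).items()
):
    print(f"{dev} group {group}: bitmap={bitmap_free} gd={gd_free} seen {n}x")
```

On the sample, group 35406 shows up twice with identical counts (before and after the reboot) and group 639 once. To run it against a real log, pass the contents of /var/log/messages to `summarize_bitmap_errors` in place of `sample`.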
