On Fri, 22 May 2026 09:53:17 -0600
Charles Curley <[email protected]> wrote:
> To be thorough, I have run extended SMART tests on the hard drives,
> kicked mdadm into testing the RAID array, and fscked the LVM
> partitions on the RAID array. Only fsck turned up issues, and that
> has not stopped.
Some additional testing.
Suspecting a bad hard drive, I ran more extended tests on all four
members of the RAID array. One showed problems:
"Error 1 [0] occurred at disk power-on lifetime: 6777 hours (282 days + 9 hours)",
" When the command that caused the error occurred, the device was active or idle.",
"",
" After command completion occurred, registers were:",
" ER -- ST COUNT LBA_48 LH LM LL DV DC",
" -- -- -- == -- == == == -- -- -- -- --",
" 40 -- 51 00 01 00 00 00 00 00 00 40 00 Error: UNC 1 sectors at LBA = 0x00000000 = 0",
"",
" Commands leading to the command that caused the error were:",
" CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name",
" -- == -- == -- == == == -- -- -- -- -- --------------- --------------------",
" 25 00 00 00 01 00 00 00 00 00 00 40 00 00:08:36.585 READ DMA EXT",
" ec 00 00 00 00 00 00 00 00 00 00 00 00 00:08:31.545 IDENTIFY DEVICE",
" b0 00 da 00 00 00 00 00 c2 4f 00 00 00 00:08:31.542 SMART RETURN STATUS",
" b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00 00:08:31.541 SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE",
" ec 00 00 00 00 00 00 00 00 00 00 00 00 00:08:31.541 IDENTIFY DEVICE",
"",
"SMART Extended Self-test Log Version: 1 (1 sectors)",
"Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error",
"# 1 Extended offline Completed without error 00% 6756 -",
"# 2 Extended offline Completed without error 00% 6573 -",
"# 3 Extended offline Completed without error 00% 102 -",
"# 4 Short offline Completed without error 00% 96 -",
"",
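For anyone wanting to repeat the comparison across drives, here is a rough
sketch of how the extended error logs could be scanned for uncorrectable
(UNC) read errors like the one above. The function names and the device
names in the usage note are placeholders of mine, not anything from
smartmontools; only "smartctl -l xerror" is the real command.

```shell
#!/bin/sh
# Sketch only: count "Error: UNC" lines in smartctl text read from stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
count_unc_errors() {
    grep -c 'Error: UNC' || true
}

# Print one "device: count" line per device given as an argument.
# Needs root and smartmontools; the device list is supplied by the caller.
scan_members() {
    for dev in "$@"; do
        printf '%s: ' "$dev"
        smartctl -l xerror "$dev" | count_unc_errors
    done
}
```

Usage would be something like "scan_members /dev/sda /dev/sdb /dev/sdc
/dev/sdd", with the real member names substituted. A non-zero count only
says the drive has logged such errors at some point; it does not say when.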
So I did the obvious: I failed and removed the drive from the array. The
problem still showed up, but with fewer failures on the same data set.
I have since added the drive back to the array, and am testing the
array now.
mdadm --monitor --test --oneshot /dev/md0
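A caveat on that command: as far as I can tell from the mdadm man page,
--test in monitor mode only generates a TestMessage alert for each array
and does not read the data. An actual consistency check ("scrub") can be
requested through sysfs. Below is a minimal sketch, assuming the array is
md0 and the re-add has finished rebuilding; the MD_SYSFS variable and both
function names are my own inventions, while sync_action and mismatch_cnt
are the kernel's md sysfs files.

```shell
#!/bin/sh
# Sketch only: request a read-only consistency check of an md array.
# MD_SYSFS is a made-up variable; it defaults to the sysfs directory of md0.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

# Start a check, but only if the array is idle (not resyncing or recovering).
start_md_check() {
    dir="$1"
    state=$(cat "$dir/sync_action") || return 1
    if [ "$state" != "idle" ]; then
        echo "array busy: $state" >&2
        return 1
    fi
    echo check > "$dir/sync_action"
}

# Once the check has finished, print the mismatch count the kernel recorded.
read_mismatch_count() {
    cat "$1/mismatch_cnt"
}
```

Run as root: start_md_check "$MD_SYSFS", watch progress in /proc/mdstat,
then read_mismatch_count "$MD_SYSFS" once the array is idle again. A
non-zero count suggests inconsistent stripes, though on some RAID levels
it can apparently be non-zero for benign reasons.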
I begin to wonder if I have a bad motherboard.
--
Does anybody read signatures any more?
https://charlescurley.com
https://charlescurley.com/blog/