On Fri, 22 May 2026 09:53:17 -0600
Charles Curley <[email protected]> wrote:

> To be thorough, I have run extended SMART tests on the hard drives,
> kicked mdadm into testing the RAID array, and fscked the LVM
> partitions on the RAID array. Only fsck turned up issues, and that
> has not stopped.

Some additional testing.

Suspecting a bad hard drive, I ran more extended tests on all four
members of the RAID array. One showed problems:

      "Error 1 [0] occurred at disk power-on lifetime: 6777 hours (282 days + 9 hours)",
      "  When the command that caused the error occurred, the device was active or idle.",
      "",
      "  After command completion occurred, registers were:",
      "  ER -- ST COUNT  LBA_48  LH LM LL DV DC",
      "  -- -- -- == -- == == == -- -- -- -- --",
      "  40 -- 51 00 01 00 00 00 00 00 00 40 00  Error: UNC 1 sectors at LBA = 0x00000000 = 0",
      "",
      "  Commands leading to the command that caused the error were:",
      "  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name",
      "  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------",
      "  25 00 00 00 01 00 00 00 00 00 00 40 00     00:08:36.585  READ DMA EXT",
      "  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:08:31.545  IDENTIFY DEVICE",
      "  b0 00 da 00 00 00 00 00 c2 4f 00 00 00     00:08:31.542  SMART RETURN STATUS",
      "  b0 00 d2 00 f1 00 00 00 c2 4f 00 00 00     00:08:31.541  SMART ENABLE/DISABLE ATTRIBUTE AUTOSAVE",
      "  ec 00 00 00 00 00 00 00 00 00 00 00 00     00:08:31.541  IDENTIFY DEVICE",
      "",
      "SMART Extended Self-test Log Version: 1 (1 sectors)",
      "Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error",
      "# 1  Extended offline    Completed without error       00%      6756         -",
      "# 2  Extended offline    Completed without error       00%      6573         -",
      "# 3  Extended offline    Completed without error       00%       102         -",
      "# 4  Short offline       Completed without error       00%        96         -",
      "",
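
For anyone who wants to repeat the SMART step, here is a minimal sketch
of starting an extended self-test on each member and then dumping the
logs. The device names are assumptions (placeholders, not the actual
members of this array), and the script only prints the commands unless
DRY_RUN is set to 0.

```shell
#!/bin/sh
# Sketch only. The device list is hypothetical; substitute the real members.
DISKS=${DISKS:-"/dev/sda /dev/sdb /dev/sdc /dev/sdd"}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start an extended (long) self-test on every listed disk.
start_long_tests() {
    for d in $DISKS; do
        run smartctl -t long "$d"
    done
}

# Print all SMART information, including the error and self-test logs.
show_logs() {
    for d in $DISKS; do
        run smartctl -x "$d"
    done
}

start_long_tests
```

Extended tests can take hours on large drives, so show_logs is meant to
be called separately once they have finished.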


So I did the obvious: I failed and removed the drive from the array. The
problem still showed up, but with fewer failures on the same data set.
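
Failing and removing a member usually looks something like the sketch
below. /dev/md0 is the array named in this message; the member name
/dev/sdX1 is a hypothetical placeholder, not the actual device. By
default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: the member name is a placeholder, not the real device.
ARRAY=${ARRAY:-/dev/md0}
MEMBER=${MEMBER:-/dev/sdX1}   # hypothetical member partition
DRY_RUN=${DRY_RUN:-1}         # 1 = print the commands instead of running them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Mark the member faulty, then detach it from the array.
pull_member() {
    run mdadm "$ARRAY" --fail "$MEMBER"
    run mdadm "$ARRAY" --remove "$MEMBER"
}

# Put the member back; the array then resynchronizes onto it.
return_member() {
    run mdadm "$ARRAY" --add "$MEMBER"
}

pull_member
```

Progress of the resync after return_member can be watched in
/proc/mdstat.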

I have since added the drive back to the array, and am testing the
array now.

mdadm --monitor --test --oneshot /dev/md0
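
As far as I understand the mdadm man page, --test in monitor mode
generates a test alert for each array rather than checking the data. A
consistency check (scrub) can be requested through the md sysfs
interface instead. The following is a minimal sketch, assuming the
array is md0 and the usual sysfs layout; the function names are made up
for illustration, and writing to sync_action needs root.

```shell
#!/bin/sh
# Sketch only: assumes the array is md0 and sysfs is mounted at /sys.
MD=${MD:-md0}
SYSFS=${SYSFS:-/sys/block/$MD/md}

# Ask the md driver to read every stripe and compare the copies/parity.
start_check() {
    echo check > "$SYSFS/sync_action"
}

# Current activity: "check" while the scrub runs, "idle" when it is done.
check_state() {
    cat "$SYSFS/sync_action"
}

# Mismatch count found by the most recent check (0 is what you want).
mismatches() {
    cat "$SYSFS/mismatch_cnt"
}

# Usage, as root:
#   start_check
#   check_state     # repeat until it prints "idle"
#   mismatches
```

A nonzero mismatch count on an array whose drives all pass their own
self-tests would point away from the disks and toward something shared,
such as cabling, controller, memory, or motherboard.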

I begin to wonder if I have a bad motherboard.

-- 
Does anybody read signatures any more?

https://charlescurley.com
https://charlescurley.com/blog/
