
Debian Bug Tracking System Wed, 28 Apr 2021 09:10:40 -0700

Your message dated Wed, 28 Apr 2021 18:08:39 +0200
with message-id <[email protected]>
and subject line Closing this bug
has caused the Debian Bug report #774583,
regarding linux-image-3.16.0-0.bpo.4-amd64: Sporadic RAID1 degradation during /usr/share/mdadm/checkarray cron job
to be marked as done.


This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
774583: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774583
Debian Bug Tracking System
Contact [email protected] with problems

--- Begin Message ---

Package: linux-image-3.16.0-0.bpo.4-amd64
Version: 3.16.7-ckt2-1~bpo70+1
Severity: important

Dear Maintainer,

   * What led up to the situation?

   One of my RAID1 arrays sporadically degrades during the checkarray cron job:

   Jan  4 00:57:01 nihlus /USR/SBIN/CRON[4367]: (root) CMD (if [ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi)
   Jan  4 00:57:01 nihlus kernel: [ 3932.435274] md: data-check of RAID array md0
   Jan  4 00:57:01 nihlus kernel: [ 3932.455356] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
   Jan  4 00:57:01 nihlus kernel: [ 3932.469160] md: delaying data-check of md2 until md0 has finished (they share one or more physical units)
   Jan  4 00:57:01 nihlus kernel: [ 3932.524839] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
   Jan  4 00:57:01 nihlus kernel: [ 3932.569568] md: using 128k window, over a total of 262132k.
   Jan  4 00:57:03 nihlus kernel: [ 3934.473794] md: md0: data-check done.
   Jan  4 00:57:03 nihlus kernel: [ 3934.491622] md: data-check of RAID array md2
   Jan  4 00:57:03 nihlus kernel: [ 3934.510850] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
   Jan  4 00:57:03 nihlus mdadm[2289]: RebuildFinished event detected on md device /dev/md/0
   Jan  4 00:57:03 nihlus kernel: [ 3934.541334] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
   Jan  4 00:57:03 nihlus kernel: [ 3934.587243] md: using 128k window, over a total of 1952201680k.
   [...]
   Jan  4 03:35:35 nihlus kernel: [13446.203438] sd 1:0:0:0: [sdb] Unhandled error code
   Jan  4 03:35:35 nihlus kernel: [13446.225179] sd 1:0:0:0: [sdb]
   Jan  4 03:35:35 nihlus kernel: [13446.239316] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
   Jan  4 03:35:35 nihlus kernel: [13446.265222] sd 1:0:0:0: [sdb] CDB:
   Jan  4 03:35:36 nihlus kernel: [13446.280916] Write(10): 2a 00 00 d8 67 08 00 00 20 00
   Jan  4 03:35:36 nihlus kernel: [13446.303438] end_request: I/O error, dev sdb, sector 14182152
   Jan  4 03:35:36 nihlus kernel: [13446.330133] md/raid1:md2: Disk failure on sdb3, disabling device.
   Jan  4 03:35:36 nihlus kernel: [13446.330133] md/raid1:md2: Operation continuing on 1 devices.
   Jan  4 03:35:36 nihlus kernel: [13446.401456] md: md2: data-check interrupted.
   Jan  4 03:35:36 nihlus kernel: [13446.467913] RAID1 conf printout:
   Jan  4 03:35:36 nihlus kernel: [13446.467920]  --- wd:1 rd:2
   Jan  4 03:35:36 nihlus kernel: [13446.467925]  disk 0, wo:0, o:1, dev:sda3
   Jan  4 03:35:36 nihlus kernel: [13446.467929]  disk 1, wo:1, o:0, dev:sdb3
   Jan  4 03:35:36 nihlus kernel: [13446.492871] RAID1 conf printout:
   Jan  4 03:35:36 nihlus kernel: [13446.492878]  --- wd:1 rd:2
   Jan  4 03:35:36 nihlus kernel: [13446.492883]  disk 0, wo:0, o:1, dev:sda3
   Jan  4 03:35:36 nihlus mdadm[2289]: Fail event detected on md device /dev/md/2
   Jan  4 03:35:36 nihlus postfix/pickup[4968]: 3kFPGJ1mCzz1n: uid=0 from=<root>
   Jan  4 03:35:36 nihlus postfix/cleanup[5060]: 3kFPGJ1mCzz1n: message-id=<[email protected]>
   Jan  4 03:35:36 nihlus mdadm[2289]: FailSpare event detected on md device /dev/md/2, component device /dev/sdb3
   Jan  4 03:35:36 nihlus mdadm[2289]: RebuildFinished event detected on md device /dev/md/2
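
   The Write(10) CDB line can be decoded by hand. Below is a minimal Python sketch, assuming the standard SBC layout for 10-byte READ/WRITE commands (opcode in byte 0, big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8); the helper name decode_rw10 is hypothetical.

```python
"""Decode the SCSI WRITE(10) CDB printed in the kernel log above."""

# CDB bytes exactly as logged: "Write(10): 2a 00 00 d8 67 08 00 00 20 00"
CDB_HEX = "2a 00 00 d8 67 08 00 00 20 00"

# Opcodes this sketch knows by name (READ(10) and WRITE(10) only)
OPCODE_NAMES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_rw10(cdb_hex: str) -> dict:
    """Split a 10-byte READ(10)/WRITE(10) CDB into opcode, LBA and block count.

    Assumed layout: byte 0 = opcode, bytes 2-5 = logical block address
    (big-endian), bytes 7-8 = transfer length in logical blocks (big-endian).
    """
    cdb = bytes.fromhex(cdb_hex)  # fromhex skips the spaces between bytes
    if len(cdb) != 10:
        raise ValueError(f"expected 10 CDB bytes, got {len(cdb)}")
    opcode = cdb[0]
    return {
        "opcode": OPCODE_NAMES.get(opcode, hex(opcode)),
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


if __name__ == "__main__":
    print(decode_rw10(CDB_HEX))
```

   Applied to the logged bytes, it reports a WRITE(10) at LBA 14182152 for 32 blocks. The LBA matches the sector in the end_request line, and with 512-byte logical sectors the timed-out command was a 16 KiB write, not a read.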

   # cat /proc/mdstat
   Personalities : [raid1]
   md2 : active raid1 sda3[0] sdb3[2](F)
        1952201680 blocks super 1.2 [2/1] [U_]

   md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
        1048564 blocks super 1.2 [2/2] [UU]

   md0 : active raid1 sda1[0] sdb1[1]
        262132 blocks super 1.2 [2/2] [UU]
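
   In /proc/mdstat a degraded array shows a working-device count below the configured count ([2/1]), an underscore in the status map ([U_]) and an (F) flag on the failed member. Below is a minimal Python sketch that flags such arrays, assuming the line layout shown above; degraded_arrays is a hypothetical helper, and the patterns only cover the raidN personalities that appear here.

```python
"""Flag degraded md arrays in /proc/mdstat-style text."""

import re

# Sample copied from the report; on a live system read /proc/mdstat instead.
MDSTAT = """\
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[2](F)
      1952201680 blocks super 1.2 [2/1] [U_]

md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      1048564 blocks super 1.2 [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      262132 blocks super 1.2 [2/2] [UU]
"""

# "md2 : active raid1 sda3[0] sdb3[2](F)" -> array name, member list
ARRAY_RE = re.compile(r"^(md\d+) : .*?raid\d+ (.*)$")
# "sdb3[2](F)" -> device name, optional "(F)" failed flag
MEMBER_RE = re.compile(r"(\w+)\[\d+\](\(F\))?")
# "[2/1] [U_]" -> configured device count, working device count
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\] \[[U_]+\]")


def degraded_arrays(text):
    """Return {array: [failed members]} for arrays with fewer working than configured devices."""
    result = {}
    current, failed = None, []
    for line in text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            failed = [dev for dev, flag in MEMBER_RE.findall(header.group(2)) if flag]
            continue
        status = STATUS_RE.search(line)
        if status and current:
            configured, working = int(status.group(1)), int(status.group(2))
            if working < configured:
                result[current] = failed
            current = None
    return result


if __name__ == "__main__":
    print(degraded_arrays(MDSTAT))
```

   For the output above it returns {'md2': ['sdb3']}; on a live system, pass open("/proc/mdstat").read() instead of the pasted sample.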

   # smartctl -i /dev/sdb
   smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.16.0-0.bpo.4-amd64] (local build)
   Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

   === START OF INFORMATION SECTION ===
   Device Model:     TOSHIBA DT01ACA200
   Serial Number:    [redacted]
   LU WWN Device Id: 5 000039 ff3e05ac0
   Firmware Version: MX4OABB0
   User Capacity:    2,000,398,934,016 bytes [2.00 TB]
   Sector Sizes:     512 bytes logical, 4096 bytes physical
   Device is:        Not in smartctl database [for details use: -P showall]
   ATA Version is:   8
   ATA Standard is:  ATA-8-ACS revision 4
   Local Time is:    Sun Jan  4 18:50:18 2015 CET
   SMART support is: Available - device has SMART capability.
   SMART support is: Enabled
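
   smartctl reports 512-byte logical and 4096-byte physical sectors, so each physical sector holds eight logical ones. Below is a minimal Python sketch relating the failed logical range from the kernel log (sector 14182152, 32 blocks) to physical sectors. It assumes an alignment offset of 0, which this smartctl output does not state; physical_span is a hypothetical helper.

```python
"""Relate a failing logical LBA range to the drive's 4096-byte physical sectors."""

LOGICAL = 512          # "Sector Sizes: 512 bytes logical, ..."
PHYSICAL = 4096        # "... 4096 bytes physical"
ALIGNMENT_OFFSET = 0   # ASSUMPTION: not reported in the smartctl output above


def physical_span(lba: int, blocks: int) -> tuple:
    """Return (first physical sector, last physical sector, aligned?) for a logical range."""
    per_phys = PHYSICAL // LOGICAL      # 8 logical sectors per physical sector
    start = lba + ALIGNMENT_OFFSET
    end = start + blocks                # exclusive end, in logical sectors
    first = start // per_phys
    last = (end - 1) // per_phys
    # aligned only if the range starts and ends on physical-sector boundaries
    aligned = start % per_phys == 0 and end % per_phys == 0
    return first, last, aligned


if __name__ == "__main__":
    # sector and block count taken from the end_request and CDB log lines
    print(physical_span(14182152, 32))
```

   Under that assumption it returns (1772769, 1772772, True): the failed write starts and ends on physical-sector boundaries and covers four whole 4 KiB sectors.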

   # smartctl -l selftest /dev/sdb
   smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.16.0-0.bpo.4-amd64] (local build)
   Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

   === START OF READ SMART DATA SECTION ===
   SMART Self-test log structure revision number 1
   Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
   # 1  Extended offline    Completed without error       00%      1490         -
   # 2  Extended offline    Completed without error       00%       822         -
   # 3  Short offline       Completed without error       00%       812         -
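
   The self-test table can also be checked mechanically. Below is a minimal Python sketch, assuming the column layout printed by this smartctl version (description and status separated by runs of spaces, "-" when no error LBA was recorded); parse_selftests and failed_selftests are hypothetical helpers.

```python
"""Check a smartctl self-test log for failed tests or recorded error LBAs."""

import re

# Sample copied from the report
SELFTEST_LOG = """\
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1490         -
# 2  Extended offline    Completed without error       00%       822         -
# 3  Short offline       Completed without error       00%       812         -
"""

# num, description, status, remaining %, lifetime hours, first error LBA;
# the description ends at the first run of two or more spaces
ROW_RE = re.compile(r"^#\s*(\d+)\s+(.+?)\s{2,}(.+?)\s+(\d+)%\s+(\d+)\s+(\S+)\s*$")


def parse_selftests(text):
    """Return one dict per self-test entry; the header line is skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:
            num, desc, status, remaining, hours, lba = m.groups()
            rows.append({
                "num": int(num),
                "test": desc,
                "status": status,
                "remaining_pct": int(remaining),
                "hours": int(hours),
                "first_error_lba": None if lba == "-" else lba,
            })
    return rows


def failed_selftests(rows):
    """Return entries that did not complete cleanly or that recorded an error LBA."""
    return [r for r in rows
            if r["status"] != "Completed without error" or r["first_error_lba"] is not None]


if __name__ == "__main__":
    print(failed_selftests(parse_selftests(SELFTEST_LOG)))
```

   For the log above, all three entries parse and none is flagged, which matches the reporter's observation that the self-tests show no errors.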

   * What exactly did you do (or not do) that was effective (or
     ineffective)?

   # mdadm --manage /dev/md2 --remove /dev/sdb3
   # mdadm --manage /dev/md2 --add /dev/sdb3
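
   After re-adding the member, the rebuild can be followed through the md sysfs attributes sync_action ("idle" when no resync, recovery or check is running) and degraded (the number of missing member devices). Below is a minimal Python polling sketch, assuming the array's attributes live under /sys/block/md2/md; md_state and wait_for_clean_array are hypothetical helpers.

```python
"""Poll md sysfs attributes until an array is idle and no longer degraded."""

import time
from pathlib import Path
from typing import Optional


def md_state(md_dir: Path) -> dict:
    """Read sync_action and degraded from an md sysfs directory such as /sys/block/md2/md."""
    return {
        "sync_action": (md_dir / "sync_action").read_text().strip(),
        "degraded": int((md_dir / "degraded").read_text().strip()),
    }


def wait_for_clean_array(md_dir: Path, interval: float = 30.0,
                         timeout: Optional[float] = None) -> dict:
    """Return the state once no sync operation runs and no member is missing.

    Raises TimeoutError if that does not happen within `timeout` seconds.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        state = md_state(md_dir)
        # "idle" alone is not enough: right after --add, recovery may not have started yet
        if state["sync_action"] == "idle" and state["degraded"] == 0:
            return state
        if deadline is not None and time.monotonic() >= deadline:
            raise TimeoutError(f"{md_dir}: {state}")
        time.sleep(interval)
```

   For example, wait_for_clean_array(Path("/sys/block/md2/md")) should return {'sync_action': 'idle', 'degraded': 0} once the rebuild has finished. Requiring degraded == 0 as well as "idle" avoids returning early in the window before recovery starts.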

   * What was the outcome of this action?
   
   The array rebuilt without _any_ errors. The drive never went offline during normal operation and
   also shows no errors in self-tests. It is only sporadically removed from the array during the
   checkarray job, when a driver timeout occurs.
  
   * What outcome did you expect instead?
  
   No array degradation during the cron job.
  
-- System Information:
Debian Release: 7.7
  APT prefers stable
  APT policy: (1001, 'stable'), (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-0.bpo.4-amd64
Locale: LANG=en_US.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

--- End Message ---

--- Begin Message ---

This bug was filed for a very old kernel. If you can reproduce it with
- the current version in unstable/testing, or
- the latest kernel from buster-backports,
please reopen the bug; see https://www.debian.org/Bugs/server-control

--- End Message ---

Bug#774583: marked as done (linux-image-3.16.0-0.bpo.4-amd64: Sporadic RAID1 degradation during /usr/share/mdadm/checkarray cron job)
