Your message dated Wed, 28 Apr 2021 18:08:39 +0200
with message-id <[email protected]>
and subject line Closing this bug
has caused the Debian Bug report #774583,
regarding linux-image-3.16.0-0.bpo.4-amd64: Sporadic RAID1 degradation during
/usr/share/mdadm/checkarray cron job
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)
--
774583: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774583
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: linux-image-3.16.0-0.bpo.4-amd64
Version: 3.16.7-ckt2-1~bpo70+1
Severity: important
Dear Maintainer,
* What led up to the situation?
One of my RAID1 arrays sporadically degrades during the checkarray cron job:
Jan 4 00:57:01 nihlus /USR/SBIN/CRON[4367]: (root) CMD (if [ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ]; then /usr/share/mdadm/checkarray --cron --all --idle --quiet; fi)
Jan 4 00:57:01 nihlus kernel: [ 3932.435274] md: data-check of RAID array md0
Jan 4 00:57:01 nihlus kernel: [ 3932.455356] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jan 4 00:57:01 nihlus kernel: [ 3932.469160] md: delaying data-check of md2 until md0 has finished (they share one or more physical units)
Jan 4 00:57:01 nihlus kernel: [ 3932.524839] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan 4 00:57:01 nihlus kernel: [ 3932.569568] md: using 128k window, over a total of 262132k.
Jan 4 00:57:03 nihlus kernel: [ 3934.473794] md: md0: data-check done.
Jan 4 00:57:03 nihlus kernel: [ 3934.491622] md: data-check of RAID array md2
Jan 4 00:57:03 nihlus kernel: [ 3934.510850] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Jan 4 00:57:03 nihlus mdadm[2289]: RebuildFinished event detected on md device /dev/md/0
Jan 4 00:57:03 nihlus kernel: [ 3934.541334] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
Jan 4 00:57:03 nihlus kernel: [ 3934.587243] md: using 128k window, over a total of 1952201680k.
[...]
Jan 4 03:35:35 nihlus kernel: [13446.203438] sd 1:0:0:0: [sdb] Unhandled error code
Jan 4 03:35:35 nihlus kernel: [13446.225179] sd 1:0:0:0: [sdb]
Jan 4 03:35:35 nihlus kernel: [13446.239316] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jan 4 03:35:35 nihlus kernel: [13446.265222] sd 1:0:0:0: [sdb] CDB:
Jan 4 03:35:36 nihlus kernel: [13446.280916] Write(10): 2a 00 00 d8 67 08 00 00 20 00
Jan 4 03:35:36 nihlus kernel: [13446.303438] end_request: I/O error, dev sdb, sector 14182152
Jan 4 03:35:36 nihlus kernel: [13446.330133] md/raid1:md2: Disk failure on sdb3, disabling device.
Jan 4 03:35:36 nihlus kernel: [13446.330133] md/raid1:md2: Operation continuing on 1 devices.
Jan 4 03:35:36 nihlus kernel: [13446.401456] md: md2: data-check interrupted.
Jan 4 03:35:36 nihlus kernel: [13446.467913] RAID1 conf printout:
Jan 4 03:35:36 nihlus kernel: [13446.467920] --- wd:1 rd:2
Jan 4 03:35:36 nihlus kernel: [13446.467925] disk 0, wo:0, o:1, dev:sda3
Jan 4 03:35:36 nihlus kernel: [13446.467929] disk 1, wo:1, o:0, dev:sdb3
Jan 4 03:35:36 nihlus kernel: [13446.492871] RAID1 conf printout:
Jan 4 03:35:36 nihlus kernel: [13446.492878] --- wd:1 rd:2
Jan 4 03:35:36 nihlus kernel: [13446.492883] disk 0, wo:0, o:1, dev:sda3
Jan 4 03:35:36 nihlus mdadm[2289]: Fail event detected on md device /dev/md/2
Jan 4 03:35:36 nihlus postfix/pickup[4968]: 3kFPGJ1mCzz1n: uid=0 from=<root>
Jan 4 03:35:36 nihlus postfix/cleanup[5060]: 3kFPGJ1mCzz1n: message-id=<[email protected]>
Jan 4 03:35:36 nihlus mdadm[2289]: FailSpare event detected on md device /dev/md/2, component device /dev/sdb3
Jan 4 03:35:36 nihlus mdadm[2289]: RebuildFinished event detected on md device /dev/md/2
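
As a cross-check, the Write(10) CDB logged above can be decoded by hand. The following is a minimal sketch in Python, assuming the standard 10-byte WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The function name decode_write10 is made up for illustration and is not part of any tool mentioned in this report.

```python
def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as a hex string.

    Returns (lba, blocks). Assumes the standard 10-byte layout:
    byte 0 = opcode 0x2a, bytes 2-5 = LBA, bytes 7-8 = transfer length.
    """
    cdb = bytes.fromhex(cdb_hex)  # spaces between hex pairs are accepted
    if len(cdb) != 10:
        raise ValueError("WRITE(10) CDB must be exactly 10 bytes")
    if cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) opcode: 0x%02x" % cdb[0])
    lba = int.from_bytes(cdb[2:6], "big")
    blocks = int.from_bytes(cdb[7:9], "big")
    return lba, blocks


# CDB copied from the kernel log in this report.
lba, blocks = decode_write10("2a 00 00 d8 67 08 00 00 20 00")
print(lba, blocks)  # 14182152 32
```

The decoded LBA equals the sector reported in the end_request line (14182152), and 32 logical blocks of 512 bytes correspond to a 16 KiB write.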
# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[2](F)
      1952201680 blocks super 1.2 [2/1] [U_]
md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      1048564 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
      262132 blocks super 1.2 [2/2] [UU]
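
The degraded state visible in the /proc/mdstat output above can also be detected programmatically. This is a rough sketch, assuming the mdstat layout shown in this report (an array line followed by a status line carrying [expected/active] and a [U_] map). The names degraded_arrays and SAMPLE_MDSTAT are illustrative only and not part of mdadm.

```python
import re

# Sample copied from this report; on a live system, read /proc/mdstat instead.
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[2](F)
      1952201680 blocks super 1.2 [2/1] [U_]
md1 : active (auto-read-only) raid1 sda2[0] sdb2[1]
      1048564 blocks super 1.2 [2/2] [UU]
md0 : active raid1 sda1[0] sdb1[1]
      262132 blocks super 1.2 [2/2] [UU]
"""

ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(.*)$")
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]")
FAILED_RE = re.compile(r"(\w+)\[\d+\]\(F\)")


def degraded_arrays(mdstat_text):
    """Return {array: {"expected", "active", "failed"}} for degraded arrays."""
    result = {}
    current = None
    failed = []
    for raw in mdstat_text.splitlines():
        line = raw.strip()
        header = ARRAY_RE.match(line)
        if header:
            # Array line: remember the name and any members flagged (F).
            current = header.group(1)
            failed = FAILED_RE.findall(header.group(2))
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                result[current] = {
                    "expected": expected,
                    "active": active,
                    "failed": failed,
                }
            current = None
    return result


print(degraded_arrays(SAMPLE_MDSTAT))
```

With the sample above, only md2 is reported, with sdb3 listed as the failed member.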
# smartctl -i /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.16.0-0.bpo.4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Device Model: TOSHIBA DT01ACA200
Serial Number: [redacted]
LU WWN Device Id: 5 000039 ff3e05ac0
Firmware Version: MX4OABB0
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Sun Jan 4 18:50:18 2015 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
# smartctl -l selftest /dev/sdb
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.16.0-0.bpo.4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      1490         -
# 2  Extended offline    Completed without error       00%       822         -
# 3  Short offline       Completed without error       00%       812         -
* What exactly did you do (or not do) that was effective (or ineffective)?
# mdadm --manage /dev/md2 --remove /dev/sdb3
# mdadm --manage /dev/md2 --add /dev/sdb3
* What was the outcome of this action?
The array rebuilt without _any_ errors. The drive has never gone offline
during normal operation and shows no errors in SMART self-tests. It is only
sporadically removed from the array during the checkarray job, at the moment
a driver timeout occurs.
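
To check how closely the member removal follows a driver timeout, the syslog can be scanned for the two events. This is a hedged sketch, assuming unwrapped kernel log lines in the format quoted in this report. The function failures_after_timeout, its window parameter, and the 5-second default are hypothetical choices for illustration.

```python
import re

TIMEOUT_RE = re.compile(r"driverbyte=DRIVER_TIMEOUT")
FAILURE_RE = re.compile(r"md/raid1:(md\d+): Disk failure on (\w+), disabling device")
STAMP_RE = re.compile(r"kernel: \[\s*(\d+\.\d+)\]")

# Three lines copied from the log in this report, with line wraps removed.
SAMPLE_LOG = """\
Jan 4 03:35:35 nihlus kernel: [13446.239316] Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jan 4 03:35:36 nihlus kernel: [13446.303438] end_request: I/O error, dev sdb, sector 14182152
Jan 4 03:35:36 nihlus kernel: [13446.330133] md/raid1:md2: Disk failure on sdb3, disabling device.
"""


def failures_after_timeout(log_text, window=5.0):
    """List (array, member, delay) for md failures shortly after a timeout.

    delay is the difference in kernel timestamps (seconds) between the most
    recent DRIVER_TIMEOUT line and the md failure line, rounded to 3 places.
    """
    last_timeout = None
    hits = []
    for line in log_text.splitlines():
        stamp = STAMP_RE.search(line)
        if not stamp:
            continue  # not a kernel line with a timestamp
        t = float(stamp.group(1))
        if TIMEOUT_RE.search(line):
            last_timeout = t
        failure = FAILURE_RE.search(line)
        if failure and last_timeout is not None and 0 <= t - last_timeout <= window:
            hits.append((failure.group(1), failure.group(2), round(t - last_timeout, 3)))
    return hits


hits = failures_after_timeout(SAMPLE_LOG)
print(hits)
```

For the excerpt in this report, the md2 failure on sdb3 is logged about 0.09 seconds after the DRIVER_TIMEOUT line.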
* What outcome did you expect instead?
No array degradation during the cron job.
-- System Information:
Debian Release: 7.7
APT prefers stable
APT policy: (1001, 'stable'), (500, 'unstable'), (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 3.16.0-0.bpo.4-amd64
Locale: LANG=en_US.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
--- End Message ---
--- Begin Message ---
This bug was filed for a very old kernel. If you can reproduce it with either
- the current version in unstable/testing, or
- the latest kernel from buster-backports,
please reopen the bug; see https://www.debian.org/Bugs/server-control
--- End Message ---