Hi Chris,

See the output below. Any suggestions based on it?
Thanks!

--
Groet / Cheers,
Patrick Dijkgraaf

On Mon, 2018-12-03 at 20:16 -0700, Chris Murphy wrote:
> Also useful information for autopsy, perhaps not for fixing, is to
> know whether the SCT ERC value for every drive is less than the
> kernel's SCSI driver block device command timeout value. It's super
> important that the drive reports an explicit read failure before the
> read command is considered failed by the kernel. If the drive is
> still trying to do a read, and the kernel command timer times out,
> it'll just do a reset of the whole link and we lose the outcome for
> the hanging command. Upon explicit read error only, can Btrfs, or md
> RAID, know what device and physical sector has a problem, and
> therefore how to reconstruct the block, and fix the bad sector with a
> write of known good data.
>
> smartctl -l scterc /device/

Seems to not work:

[root@cornelis ~]# for disk in /dev/sd{e..x}; do echo ${disk}; smartctl -l scterc ${disk}; done
/dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdf
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdg
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdh
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdi
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdj
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdk
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
Smartctl open device: /dev/sdk failed: No such device

/dev/sdl
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdm
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdn
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Error Recovery Control command not supported

/dev/sdo
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdp
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdq
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdr
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sds
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdt
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Error Recovery Control command not supported

/dev/sdu
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdv
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdw
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdx
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
SCT Error Recovery Control command not supported

> and
> cat /sys/block/sda/device/timeout

[root@cornelis ~]# cat /sys/block/sd{e..x}/device/timeout
30
30
30
30
30
30
cat: /sys/block/sdk/device/timeout: No such file or directory
30
30
30
30
30
30
30
30
30
30
30
30
30

> Only if SCT ERC is enabled with a value below 30, or if the kernel
> command timer is change to be well above 30 (like 180, which is
> absolutely crazy but a separate conversation) can we be sure that
> there haven't just been resets going on for a while, preventing bad
> sectors from being fixed up all along, and can contribute to the
> problem. This comes up on the linux-raid (mainly md driver) list all
> the time, and it contributes to lost RAID all the time. And arguably
> it leads to unnecessary data loss in even the single device
> desktop/laptop use case as well.
>
>
> Chris Murphy
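
For reference, the comparison Chris describes (SCT ERC below the kernel command timer) could be scripted roughly as below. This is only a sketch, not something from the thread: the function names classify, read_erc and check_disks are made up for illustration, and it assumes smartctl prints a "Read: <n> (<s> seconds)" line, with <n> in units of 100 ms, when SCT ERC is enabled. A drive where the value cannot be read (disabled, unsupported, or the command fails as above) is reported as "unknown".

```shell
#!/bin/bash
# Sketch only: compare a drive's SCT ERC read limit with the kernel command timer.

# classify ERC_DECISECONDS TIMEOUT_SECONDS
# ERC is in units of 100 ms; an empty value means disabled, unsupported
# or unreadable.
classify() {
    local erc="$1" timeout="$2"
    if [ -z "$erc" ]; then
        echo "unknown"     # drive may keep retrying longer than the kernel waits
    elif [ "$erc" -lt $((timeout * 10)) ]; then
        echo "ok"          # drive reports the read error before the timer fires
    else
        echo "too-long"    # kernel may reset the link before the drive gives up
    fi
}

# read_erc DEVICE: print the SCT ERC read value in deciseconds, or nothing.
# Assumes the "Read: <n> (<s> seconds)" output format mentioned above.
read_erc() {
    smartctl -l scterc "$1" 2>/dev/null |
        awk '$1 == "Read:" && $2 ~ /^[0-9]+$/ { print $2; exit }'
}

# check_disks DEVICE...: print one verdict line per device.
check_disks() {
    local disk name timeout
    for disk in "$@"; do
        name=${disk##*/}
        # Skip devices that are not present (like /dev/sdk in the output above).
        if [ ! -r "/sys/block/$name/device/timeout" ]; then
            echo "$disk: missing"
            continue
        fi
        timeout=$(cat "/sys/block/$name/device/timeout")
        echo "$disk: timeout=${timeout}s erc=$(classify "$(read_erc "$disk")" "$timeout")"
    done
}
```

Run as root with something like `check_disks /dev/sd{e..x}`. For drives that come back "unknown", the other option from Chris's mail is raising the kernel command timer instead, e.g. `echo 180 > /sys/block/sdX/device/timeout` per device (this does not persist across reboots).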