Hi Chris,

See the output below. Any suggestions based on it?
Thanks!

-- 
Groet / Cheers,
Patrick Dijkgraaf



On Mon, 2018-12-03 at 20:16 -0700, Chris Murphy wrote:
> Also useful information for autopsy, perhaps not for fixing, is to
> know whether the SCT ERC value for every drive is less than the
> kernel's SCSI driver block device command timeout value. It's super
> important that the drive reports an explicit read failure before the
> read command is considered failed by the kernel. If the drive is
> still trying to do a read, and the kernel command timer times out,
> it'll just do a reset of the whole link and we lose the outcome for
> the hanging command. Only upon an explicit read error can Btrfs, or
> md RAID, know which device and physical sector has a problem, and
> therefore how to reconstruct the block and fix the bad sector with a
> write of known good data.
> 
> smartctl -l scterc /device/

That doesn't seem to work:

[root@cornelis ~]# for disk in /dev/sd{e..x}; do echo ${disk}; smartctl -l scterc ${disk}; done
/dev/sde
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdf
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdg
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdh
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdi
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdj
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdk
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

Smartctl open device: /dev/sdk failed: No such device
/dev/sdl
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdm
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdn
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SCT Error Recovery Control command not supported

/dev/sdo
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdp
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdq
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdr
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sds
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdt
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SCT Error Recovery Control command not supported

/dev/sdu
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdv
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdw
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SMART WRITE LOG does not return COUNT and LBA_LOW register
SCT (Get) Error Recovery Control command failed

/dev/sdx
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.18.16-arch1-1-ARCH]
(local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, 
www.smartmontools.org

SCT Error Recovery Control command not supported
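
To summarize the output above: 16 drives report "command failed", 3 (sdn, sdt, sdx) report "command not supported", and /dev/sdk does not exist. A minimal sketch of how that tally could be automated follows. It matches only on the message strings seen in this output; the function name and the category labels are made up for illustration, and other smartctl versions may word things differently.

```python
def classify_scterc(output: str) -> str:
    """Classify one drive's `smartctl -l scterc` output.

    Matches only the messages that appear in the output pasted above;
    the returned labels are hypothetical names chosen for this sketch.
    """
    if "open device" in output and "failed" in output:
        return "missing"       # e.g. "Smartctl open device: /dev/sdk failed: ..."
    if "command not supported" in output:
        return "unsupported"   # drive does not implement SCT ERC
    if "command failed" in output:
        return "failed"        # the SCT (Get) ERC command itself failed
    return "other"             # anything else, including a successful readout


def tally(outputs: dict) -> dict:
    """Count how many drives fall into each category.

    `outputs` maps a device path to the text smartctl printed for it.
    """
    counts = {}
    for text in outputs.values():
        label = classify_scterc(text)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

In none of these cases is the actual ERC value known, so none of these drives can be assumed to give up on a read before the kernel does.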

> and
> cat /sys/block/sda/device/timeout

[root@cornelis ~]# cat /sys/block/sd{e..x}/device/timeout
30
30
30
30
30
30
cat: /sys/block/sdk/device/timeout: No such file or directory
30
30
30
30
30
30
30
30
30
30
30
30
30

> Only if SCT ERC is enabled with a value below 30 seconds, or if the
> kernel command timer is changed to be well above 30 (like 180, which
> is absolutely crazy but a separate conversation) can we be sure that
> there haven't just been resets going on for a while, preventing bad
> sectors from being fixed up all along, which can contribute to the
> problem. This comes up on the linux-raid (mainly md driver) list all
> the time, and it contributes to lost RAID all the time. And arguably
> it leads to unnecessary data loss in even the single device
> desktop/laptop use case as well.
> 
> 
> Chris Murphy
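
The rule quoted above can be written down as a small check. This is only a sketch under stated assumptions: the function name is hypothetical, the ERC value is taken in units of 100 ms (the unit smartctl uses for scterc), and an unknown or disabled ERC is treated as unsafe because the drive may then keep retrying past the kernel timer.

```python
def erc_is_safe(erc_deciseconds, kernel_timeout_s=30):
    """Return True only if the drive should report a read error
    before the kernel's command timer expires.

    erc_deciseconds:  SCT ERC read timeout in units of 100 ms, or
                      None when it could not be read (assumption:
                      unknown is treated the same as disabled).
    kernel_timeout_s: value of /sys/block/<dev>/device/timeout.
    """
    if erc_deciseconds is None or erc_deciseconds == 0:
        # ERC disabled, unsupported, or unreadable: the drive's
        # internal recovery time is unbounded as far as we know.
        return False
    return erc_deciseconds / 10 < kernel_timeout_s


# A drive set to 7.0 s with the default 30 s kernel timer is fine;
# a drive whose ERC cannot be read is not.
print(erc_is_safe(70))    # True
print(erc_is_safe(None))  # False
```

With every kernel timer here at 30 and no readable ERC value on any drive, this check would come out False for all of them.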
