Package: smartmontools
Version: 6.5+svn4324-1
Severity: wishlist
Tags: upstream
Dear Maintainer,
Using Areca's official proprietary `cli64` [0] utility to check
raidset/volume status (as Nagios/Icinga monitoring tools do) conflicts
with `smartctl`. Munin and smartd monitoring fail in rare, but still
annoying, cases when `cli64` has the `/dev/sg2` device open with an
exclusive lock at the same time.
This is an example email sent by `smartd`:
```
This message was generated by the smartd daemon running on:
host name: hostname
DNS domain: my.tld
The following warning/error was logged by the smartd daemon:
Device: /dev/sg2 [areca_disk#01_enc#01], unable to open device
Device info:
WDC WD2005FBYZ-01YCBB2, S/N:WD-WMC..., WWN:5-0014ee-059..., FW:RR07, 2.00 TB
For details see host's SYSLOG.
You can also use the smartctl utility for further investigation.
Another message will be sent in 24 hours if the problem persists.
```
This is an example of running `cli64` and `smartctl` at the same time:
```
# cli64 rsf info &
[1] 3934
# smartctl -d areca,3 /dev/sg2 -x
smartctl 6.6 2016-05-31 r4324 [x86_64-linux-4.18.0-0.bpo.1-amd64] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org
Unable to open /proc/scsi/sg/device_hdr for reading
do_scsi_cmnd_io with write buffer failed code = ffffffff
Unable to open /proc/scsi/sg/device_hdr for reading
do_scsi_cmnd_io with write buffer failed code = ffffffff
Unable to open /proc/scsi/sg/device_hdr for reading
do_scsi_cmnd_io with write buffer failed code = ffffffff
Smartctl open device: /dev/sg2 [areca_disk#03_enc#01] failed: Input/output error
```
`strace` shows that `cli64` uses `O_EXCL` mode:
```
# strace -t -f -e open cli64 rsf info
...
11:07:40 open("/sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0/host0/target0:0:16/0:0:16:0/scsi_generic/sg2/dev", O_RDONLY) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:40 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:41 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
# Name Disks TotalCap FreeCap DiskChannels State
===============================================================================
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
11:07:42 open("/dev/sg2", O_RDWR|O_EXCL|O_NONBLOCK) = 3
1 Gold 4 8000.0GB 0.0GB 139B Normal
===============================================================================
GuiErrMsg<0x00>: Success.
```
In the end, this produces syslog and email spam and some missing columns
in Munin graphs. It also causes problems when a configuration management
system uses `smartctl` calls to detect the drives "hidden" behind the
hardware RAID adapter in order to render the Munin/smartd configuration:
the results flap because of random lock conflicts.
This could be fixed by retrying at least 2-3 times (probably with a
delay of a second or so) when the device being inspected by `smartctl`
is locked at that moment, or perhaps by waiting on the device or some
similar approach.
Although it is possible to work around this issue in your own Salt
modules by retrying the `smartctl` call multiple times, `smartd` would
still be affected.
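The retry workaround described above could look roughly like the following
Python sketch (Salt modules are written in Python). It is only an
illustration, not smartmontools code: the helper name `smartctl_with_retry`
and its parameters are hypothetical, and it assumes that the failed open
shows up as bit 1 of the `smartctl` exit status, which the smartctl man page
documents as "device open failed". The transcript above does not show the
exit status, so this should be verified on an affected host.

```python
import subprocess
import time

# Assumption: bit 1 (value 2) of the smartctl exit status means
# "device open failed", as documented in the smartctl man page.
OPEN_FAILED_BIT = 0x02


def smartctl_with_retry(cmd, attempts=3, delay=1.0,
                        run=subprocess.run, sleep=time.sleep):
    """Run `cmd` up to `attempts` times, pausing `delay` seconds between tries.

    Returns the first result whose exit status does not have the
    "device open failed" bit set, or the last result if all attempts fail.
    `run` and `sleep` are injectable so the helper can be tested without
    real hardware.
    """
    result = None
    for attempt in range(1, attempts + 1):
        result = run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                     universal_newlines=True)
        if not result.returncode & OPEN_FAILED_BIT:
            return result  # device could be opened; other bits are passed on
        if attempt < attempts:
            sleep(delay)  # give the other tool time to release the device
    return result
```

It could be called as
`smartctl_with_retry(["smartctl", "-d", "areca,3", "/dev/sg2", "-x"])`.
A wrapper like this only helps scripts that call `smartctl` themselves;
`smartd` would still need the retry to be implemented upstream.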
[0] http://www.areca.us/support/s_linux/driver/cli/linuxcli_V1.15.8_180529.zip
-- System Information:
Debian Release: 9.6
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Kernel: Linux 4.18.0-0.bpo.1-amd64 (SMP w/6 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
Versions of packages smartmontools depends on:
ii debianutils 4.8.1.1
ii init-system-helpers 1.48
ii libc6 2.24-11+deb9u3
ii libcap-ng0 0.7.7-3+b1
ii libgcc1 1:6.3.0-18+deb9u1
ii libselinux1 2.6-3+b3
ii libstdc++6 6.3.0-18+deb9u1
ii lsb-base 9.20161125
Versions of packages smartmontools recommends:
ii bsd-mailx [mailx] 8.1.2-0.20160123cvs-4
Versions of packages smartmontools suggests:
pn gsmartcontrol <none>
pn smart-notifier <none>
-- Configuration Files:
/etc/smartd.conf changed:
/dev/sg2 -d areca,1 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sg2 -d areca,3 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sg2 -d areca,9 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
/dev/sg2 -d areca,11 -H -l selftest -l error -f -m root -M exec /usr/share/smartmontools/smartd-runner
DEVICESCAN -d removable -n standby -m root -M exec /usr/share/smartmontools/smartd-runner
-- no debconf information