Your message dated Mon, 29 Nov 2021 13:13:21 -0800
with message-id
<CAH_zEu5Aq_4+fepPBS8D05qqoNEw=nbqzx0gcudp9sk+aza...@mail.gmail.com>
and subject line fixed since v2_03_06
has caused the Debian Bug report #928278,
regarding dmeventd: raid1 fails to automatically rebuild when
raid_fault_policy="allocate"
to be marked as done.
This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.
(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact ow...@bugs.debian.org
immediately.)
--
928278: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=928278
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems
--- Begin Message ---
Package: dmeventd
Version: 2:1.02.155-2
Severity: critical
Tags: upstream
Justification: causes serious data loss
Dear Maintainer,
I have set up a raid1 in LVM. It's a simple configuration, used to verify the
long-term stability of LVM as my storage system.
In /etc/lvm/lvm.conf, I have set activation.raid_fault_policy="allocate",
which, from what I can tell, is intended to work like a hot spare: to rebuild
the raid automatically on failure.
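For reference, the relevant lvm.conf fragment looks roughly like this (a
minimal excerpt of the one setting named above, not my full configuration):

```
# /etc/lvm/lvm.conf (excerpt)
activation {
    # "allocate" tells dmeventd to replace a failed raid leg
    # with free extents from another PV in the volume group.
    raid_fault_policy = "allocate"
}
```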
Here's my LVM configuration:
root@abacus:~# pvs /dev/sd[abc]; vgs raid; lvs raid/raid
  PV       VG   Fmt  Attr PSize  PFree
  /dev/sda raid lvm2 a--  <3.64t <3.64t
  /dev/sdb raid lvm2 a--  <3.64t  3.63t
  /dev/sdc raid lvm2 a--  <3.64t  3.63t
  VG   #PV #LV #SN Attr   VSize   VFree
  raid   3   1   0 wz--n- <10.92t <10.91t
  LV   VG   Attr       LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  raid raid rwi-aor-r- 5.00g                                   100.00
Current devices in the raid:
root@abacus:~# lvs -a -o name,raid_sync_action,sync_percent,copy_percent,devices raid
  LV              SyncAction Cpy%Sync Cpy%Sync Devices
  raid            idle         100.00   100.00 raid_rimage_0(0),raid_rimage_1(0)
  [raid_rimage_0]                              /dev/sdc(1)
  [raid_rimage_1]                              /dev/sdb(1)
  [raid_rmeta_0]                               /dev/sdc(0)
  [raid_rmeta_1]                               /dev/sdb(0)
This raid was made with `lvcreate --type raid1`.
To simulate a raid failure, I run something like:
echo offline > /sys/class/block/sdc/device/state
Eventually the raid fails, and dmeventd prints these messages to syslog:
WARNING: Device #1 of raid1 array, raid-raid, has failed.
WARNING: waiting for resynchronization to finish before initiating repair on RAID device raid-raid.
Having looked at the code in daemons/dmeventd/plugins/raid/dmeventd_raid.c,
this code path is supposed to trigger `lvconvert --repair`, but it does not,
apparently because it considers the raid to be out of sync.
If I run `lvconvert --repair` by hand, the raid is repaired using the
remaining PV.
If I look at the output of `dmsetup status`, I see this for the raid:
root@abacus:~# dmsetup status raid-raid
0 10485760 raid raid1 2 AD 10485760/10485760 idle 0 0 -
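The status params above can be decomposed with a small parser sketch. This is
an illustrative model, not the actual lvm2 code; the field order follows the
kernel dm-raid status format, where each health char is 'A' for a healthy leg
and 'D' for a dead one:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative parser -- NOT the real lvm2 dm_get_status_raid() -- for the
 * dm-raid status params shown above, e.g.
 *   "raid1 2 AD 10485760/10485760 idle 0 0 -"
 * Kernel field order: <raid_type> <#devices> <health_chars>
 *   <insync_regions>/<total_regions> <sync_action> ...
 * Health chars: 'A' = alive leg, 'D' = dead/failed leg. */
struct raid_status {
    char raid_type[16];
    int dev_count;
    char health[32];
    uint64_t insync_regions;
    uint64_t total_regions;
    char sync_action[16];
};

/* Returns 0 on success, -1 if the line does not match the expected layout.
 * Trailing fields (mismatch count, data offset) are deliberately ignored. */
static int parse_raid_params(const char *params, struct raid_status *s)
{
    int n = sscanf(params, "%15s %d %31s %" SCNu64 "/%" SCNu64 " %15s",
                   s->raid_type, &s->dev_count, s->health,
                   &s->insync_regions, &s->total_regions, s->sync_action);
    return n == 6 ? 0 : -1;
}
```

Fed the line above (minus the leading "0 10485760 raid " target prefix), this
yields health "AD" with insync_regions equal to total_regions, i.e. a failed
leg but a fully synced mirror.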
I have patched my local dmeventd to print more information when it parses the
device-mapper event. Here's a log of it during a simulated raid failure:
Apr 30 22:39:18 abacus lvm[22973]: dm_get_status_raid: params=raid1 2 AD 0/10485760 recover 0 0 -
Apr 30 22:39:18 abacus lvm[22973]: dm_get_status_raid: num_fields=8
Apr 30 22:39:18 abacus lvm[22973]: dm_get_status_raid: insync_regions=0 total_regions=10485760
Apr 30 22:39:18 abacus lvm[22973]: WARNING: Device #1 of raid1 array, raid-raid, has failed.
Apr 30 22:39:18 abacus lvm[22973]: WARNING: waiting for resynchronization to finish before initiating repair on RAID device raid-raid.
Apr 30 22:39:18 abacus lvm[22973]: WARNING: insync_regions=0 total_regions=10485760
Here we see that the status params device-mapper actually parses in the
dm_get_status_raid() function report 0 for the insync_regions variable.
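The logged behaviour can be modelled with a small sketch. To be clear, this is
my assumption about the guard, mirroring what the log shows, not code copied
from dmeventd_raid.c: repair only proceeds when a leg has failed AND the array
reports itself fully in sync. Since a kernel-driven "recover" reports
insync_regions=0, the condition never becomes true and `lvconvert --repair`
is never run:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model -- NOT the actual lvm2 source -- of dmeventd's guard:
 * repair only proceeds once the array reports itself fully in sync. */
static int would_run_repair(const char *health_chars,
                            uint64_t insync_regions,
                            uint64_t total_regions)
{
    int has_failed_leg = strchr(health_chars, 'D') != NULL;
    int fully_in_sync = (insync_regions == total_regions);
    /* The problematic condition: during "recover" insync_regions is 0,
     * so this stays false and repair is deferred indefinitely. */
    return has_failed_leg && fully_in_sync;
}
```

Fed the failure-time params from the log (health "AD", 0/10485760), this
returns 0, matching the "waiting for resynchronization" warning; fed the
healthy-looking `dmsetup status` output (10485760/10485760), it returns 1.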
I don't have time to investigate further at the moment, but the outstanding
question is simply: given these circumstances, why does dmeventd not rebuild
the raid automatically?
I've marked this bug as critical because the issue amounts to data loss for
anyone whose raid was not automatically repaired when they expected it to be.
-- System Information:
Debian Release: buster/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 4.19.0-4-amd64 (SMP w/16 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8),
LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages dmeventd depends on:
ii libblkid1 2.33.1-0.1
ii libc6 2.28-8
ii libdevmapper-event1.02.1 2:1.02.155-2
ii libdevmapper1.02.1 2:1.02.155-2
ii liblvm2cmd2.03 2.03.02-2
ii libselinux1 2.8-1+b1
ii libudev1 241-3
dmeventd recommends no packages.
dmeventd suggests no packages.
-- no debconf information
--- End Message ---
--- Begin Message ---
Upstream lvm2 release tag v2_03_06 contains the fix patch from Red Hat.
--- End Message ---