small update: i wrote a patch (attached) to ignore the apparent unsync'd condition, and now dmeventd will trigger lvconvert --repair.
it would be great to fix this properly though.

On Tue, Apr 30, 2019 at 10:51 PM Nick Owens <[email protected]> wrote:
>
> Package: dmeventd
> Version: 2:1.02.155-2
> Severity: critical
> Tags: upstream
> Justification: causes serious data loss
>
> Dear Maintainer,
>
> i have set up a raid1 in lvm. it's a simple configuration to verify the
> long term stability and use of lvm as my storage system.
>
> in /etc/lvm/lvm.conf, i've set activation.raid_fault_policy="allocate"
> which FWICT is intended to do something akin to hot-spares - to rebuild
> the raid automatically on failure.
>
> here's my lvm configuration:
>
> root@abacus:~# pvs /dev/sd[abc]; vgs raid; lvs raid/raid
>   PV         VG   Fmt  Attr PSize  PFree
>   /dev/sda   raid lvm2 a--  <3.64t <3.64t
>   /dev/sdb   raid lvm2 a--  <3.64t  3.63t
>   /dev/sdc   raid lvm2 a--  <3.64t  3.63t
>   VG   #PV #LV #SN Attr   VSize   VFree
>   raid   3   1   0 wz--n- <10.92t <10.91t
>   LV   VG   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   raid raid rwi-aor-r- 5.00g                                    100.00
>
> current devices in the raid:
>
> root@abacus:~# lvs -a -o name,raid_sync_action,sync_percent,copy_percent,devices raid
>   LV              SyncAction Cpy%Sync Cpy%Sync Devices
>   raid            idle       100.00   100.00   raid_rimage_0(0),raid_rimage_1(0)
>   [raid_rimage_0]                              /dev/sdc(1)
>   [raid_rimage_1]                              /dev/sdb(1)
>   [raid_rmeta_0]                               /dev/sdc(0)
>   [raid_rmeta_1]                               /dev/sdb(0)
>
> this raid was made with `lvcreate --type raid1`.
>
> to simulate a raid failure, i run something like:
>
> echo offline > /sys/class/block/sdc/device/state
>
> eventually the raid will fail, but dmeventd will print this message in syslog:
>
> WARNING: Device #1 of raid1 array, raid-raid, has failed.
> WARNING: waiting for resynchronization to finish before initiating repair on RAID device raid-raid.
>
> having looked at the code, daemons/dmeventd/plugins/raid/dmeventd_raid.c, this
> piece of code is SUPPOSED to trigger `lvconvert --repair`, but it does not
> because it thinks that the raid is out of sync or something.
>
> if i run `lvconvert --repair` by hand, the raid will be repaired using
> the remaining PV.
>
> if i look at the output of `dmsetup status` i see this for the raid:
>
> root@abacus:~# dmsetup status raid-raid
> 0 10485760 raid raid1 2 AD 10485760/10485760 idle 0 0 -
>
> i have patched my local dmeventd to print more information when it
> parses the device mapper event. here's a log of it during simulated raid
> failure:
>
> Apr 30 22:39:18 abacus lvm[22973]: dm_get_status_raid: params=raid1 2 AD 0/10485760 recover 0 0 -
> Apr 30 22:39:18 abacus lvm[22973]: dm_get_status_raid: num_fields=8
> Apr 30 22:39:18 abacus lvm[22973]: dm_get_status_raid: insync_regions=0 total_regions=10485760
> Apr 30 22:39:18 abacus lvm[22973]: WARNING: Device #1 of raid1 array, raid-raid, has failed.
> Apr 30 22:39:18 abacus lvm[22973]: WARNING: waiting for resynchronization to finish before initiating repair on RAID device raid-raid.
> Apr 30 22:39:18 abacus lvm[22973]: WARNING: insync_regions=0 total_regions=10485760
>
> here we see that the event that devicemapper actually parses in
> the dm_get_status_raid() function does report 0 for the insync_regions
> variable.
>
> i don't have time to investigate further currently, but the outstanding
> question is simply: given these circumstances, why does dmeventd not
> rebuild the raid automatically?
>
> i've marked this bug as critical because this issue represents data loss for
> anyone whose raid was not automatically repaired if they expected it to be.
>
> -- System Information:
> Debian Release: buster/sid
>   APT prefers testing
>   APT policy: (500, 'testing')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 4.19.0-4-amd64 (SMP w/16 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled
>
> Versions of packages dmeventd depends on:
> ii  libblkid1                 2.33.1-0.1
> ii  libc6                     2.28-8
> ii  libdevmapper-event1.02.1  2:1.02.155-2
> ii  libdevmapper1.02.1        2:1.02.155-2
> ii  liblvm2cmd2.03            2.03.02-2
> ii  libselinux1               2.8-1+b1
> ii  libudev1                  241-3
>
> dmeventd recommends no packages.
>
> dmeventd suggests no packages.
>
> -- no debconf information
Index: lvm2-2.03.02/daemons/dmeventd/plugins/raid/dmeventd_raid.c
===================================================================
--- lvm2-2.03.02.orig/daemons/dmeventd/plugins/raid/dmeventd_raid.c
+++ lvm2-2.03.02/daemons/dmeventd/plugins/raid/dmeventd_raid.c
@@ -83,7 +83,7 @@ static int _process_raid_event(struct ds
 				 "before initiating repair on RAID device %s.", device);
 		}
 
-		goto out; /* Not yet done syncing with accessible devices */
+		//goto out; /* Not yet done syncing with accessible devices */
 	}
 
 	if (state->failed)

