small update: i wrote a patch (attached) that ignores the apparent
out-of-sync condition, and with it dmeventd now triggers `lvconvert --repair`.

it would be great to fix this properly though.

On Tue, Apr 30, 2019 at 10:51 PM Nick Owens <[email protected]> wrote:
>
> Package: dmeventd
> Version: 2:1.02.155-2
> Severity: critical
> Tags: upstream
> Justification: causes serious data loss
>
> Dear Maintainer,
>
> i have set up a raid1 in lvm. it's a simple configuration to verify the
> long-term stability of lvm and its suitability as my storage system.
>
> in /etc/lvm/lvm.conf, i've set activation.raid_fault_policy="allocate",
> which FWICT is intended to work like hot spares: the raid is rebuilt
> automatically on failure.
>
> here's my lvm configuration:
>
> root@abacus:~# pvs /dev/sd[abc]; vgs raid; lvs raid/raid
>   PV         VG   Fmt  Attr PSize  PFree
>   /dev/sda   raid lvm2 a--  <3.64t <3.64t
>   /dev/sdb   raid lvm2 a--  <3.64t  3.63t
>   /dev/sdc   raid lvm2 a--  <3.64t  3.63t
>   VG   #PV #LV #SN Attr   VSize   VFree
>   raid   3   1   0 wz--n- <10.92t <10.91t
>   LV   VG   Attr       LSize Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
>   raid raid rwi-aor-r- 5.00g                                    100.00
>
> current devices in the raid:
>
> root@abacus:~# lvs -a -o name,raid_sync_action,sync_percent,copy_percent,devices raid
>   LV              SyncAction Cpy%Sync Cpy%Sync Devices
>   raid            idle       100.00   100.00   raid_rimage_0(0),raid_rimage_1(0)
>   [raid_rimage_0]                              /dev/sdc(1)
>   [raid_rimage_1]                              /dev/sdb(1)
>   [raid_rmeta_0]                               /dev/sdc(0)
>   [raid_rmeta_1]                               /dev/sdb(0)
>
> this raid was made with `lvcreate --type raid1`.
>
> to simulate a raid failure, i run something like:
>
> echo offline > /sys/class/block/sdc/device/state
>
> eventually the failure is detected, but dmeventd only prints this message in syslog:
>
> WARNING: Device #1 of raid1 array, raid-raid, has failed.
> WARNING: waiting for resynchronization to finish before initiating repair on RAID device raid-raid.
>
> i looked at the code in daemons/dmeventd/plugins/raid/dmeventd_raid.c. it is
> SUPPOSED to trigger `lvconvert --repair` here, but it does not, because it
> considers the raid to be out of sync.
>
> if i run `lvconvert --repair` by hand, the raid will be repaired using
> the remaining PV.
>
> if i look at the output of `dmsetup status` i see this for the raid:
>
> root@abacus:~# dmsetup status raid-raid
> 0 10485760 raid raid1 2 AD 10485760/10485760 idle 0 0 -
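The part of that status line after "raid" can be decoded field by field. Below is a minimal C sketch, assuming the field order documented for the kernel's dm-raid target (`<raid_type> <#raid_devices> <health_chars> <sync_ratio> <sync_action> <mismatch_cnt>`, with newer kernels appending a data offset and a journal character). The `raid_status` struct and the helper names are hypothetical and do not reproduce libdevmapper's own `struct dm_status_raid` or `dm_get_status_raid()`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct for illustration only; not libdevmapper's type. */
struct raid_status {
	char raid_type[16];                /* e.g. "raid1" */
	unsigned dev_count;                /* number of raid legs */
	char dev_health[64];               /* one char per leg: 'A', 'a', 'D', ... */
	unsigned long long insync_regions; /* numerator of the sync ratio */
	unsigned long long total_regions;  /* denominator of the sync ratio */
	char sync_action[16];              /* e.g. "idle", "recover" */
	unsigned long long mismatch_count;
};

/* Parse the first six fields of a dm-raid status string.
 * Returns 1 on success, 0 if the string does not match. */
static int parse_raid_status(const char *params, struct raid_status *s)
{
	assert(params != NULL && s != NULL);
	memset(s, 0, sizeof(*s));

	int n = sscanf(params, "%15s %u %63s %llu/%llu %15s %llu",
		       s->raid_type, &s->dev_count, s->dev_health,
		       &s->insync_regions, &s->total_regions,
		       s->sync_action, &s->mismatch_count);
	if (n != 7)
		return 0;

	/* There should be exactly one health character per leg. */
	return strlen(s->dev_health) == s->dev_count;
}

/* Index of the first leg reported dead ('D'), or -1 if none. */
static int first_dead_device(const struct raid_status *s)
{
	const char *d = strchr(s->dev_health, 'D');
	return d ? (int)(d - s->dev_health) : -1;
}
```

With health "AD", the dead leg is index 1, which matches "Device #1" in the warning. Note that the kernel's dm-raid documentation describes the sync ratio as progress of the current sync action, so a 0 reported while the action is "recover" may describe recovery progress rather than how much of the surviving leg is in sync.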
>
> i have patched my local dmeventd to print more information when it
> parses the device mapper event. here's a log of it during simulated raid
> failure:
>
> Apr 30 22:39:18 abacus lvm[22973]: dm_get_status_raid: params=raid1 2 AD 0/10485760 recover 0 0 -
> Apr 30 22:39:18 abacus lvm[22973]: dm_get_status_raid: num_fields=8
> Apr 30 22:39:18 abacus lvm[22973]: dm_get_status_raid: insync_regions=0 total_regions=10485760
> Apr 30 22:39:18 abacus lvm[22973]: WARNING: Device #1 of raid1 array, raid-raid, has failed.
> Apr 30 22:39:18 abacus lvm[22973]: WARNING: waiting for resynchronization to finish before initiating repair on RAID device raid-raid.
> Apr 30 22:39:18 abacus lvm[22973]: WARNING: insync_regions=0 total_regions=10485760
>
> here we see that the status string parsed by the dm_get_status_raid()
> function really does report 0 for insync_regions (with the sync action
> "recover").
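To make the reported behavior concrete, here is a simplified, hypothetical model of the decision the plugin appears to make; it is not the code in dmeventd_raid.c, and the enum and function names are invented for illustration. Passing `wait_for_sync = 0` roughly corresponds to the attached patch, which comments out the `goto out`.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical outcomes of handling one raid event. */
enum raid_action { RAID_NOTHING, RAID_WAIT_FOR_SYNC, RAID_REPAIR };

/* Simplified model: a dead leg ('D') normally leads to a repair, but while
 * insync_regions < total_regions the event is only logged (the "waiting for
 * resynchronization" branch) when wait_for_sync is set. */
static enum raid_action raid_event_action(const char *dev_health,
					  unsigned long long insync_regions,
					  unsigned long long total_regions,
					  int wait_for_sync)
{
	assert(dev_health != NULL);

	if (strchr(dev_health, 'D') == NULL)
		return RAID_NOTHING;          /* no failed leg */

	if (wait_for_sync && insync_regions < total_regions)
		return RAID_WAIT_FOR_SYNC;    /* the branch seen in the log above */

	return RAID_REPAIR;                   /* would run lvconvert --repair */
}
```

With the values from the log ("AD", 0 of 10485760 regions), the unpatched model waits forever, because no further resync can happen on a leg that is gone; the patched model proceeds to the repair.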
>
> i don't have time to investigate further currently, but the outstanding
> question is simply: given these circumstances, why does dmeventd not
> rebuild the raid automatically?
>
> i've marked this bug as critical because it can cause data loss for anyone
> who expects their raid to be repaired automatically: the array is left
> degraded, so a second disk failure loses the data.
>
> -- System Information:
> Debian Release: buster/sid
>   APT prefers testing
>   APT policy: (500, 'testing')
> Architecture: amd64 (x86_64)
>
> Kernel: Linux 4.19.0-4-amd64 (SMP w/16 CPU cores)
> Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE=en_US.UTF-8 (charmap=UTF-8)
> Shell: /bin/sh linked to /bin/dash
> Init: systemd (via /run/systemd/system)
> LSM: AppArmor: enabled
>
> Versions of packages dmeventd depends on:
> ii  libblkid1                 2.33.1-0.1
> ii  libc6                     2.28-8
> ii  libdevmapper-event1.02.1  2:1.02.155-2
> ii  libdevmapper1.02.1        2:1.02.155-2
> ii  liblvm2cmd2.03            2.03.02-2
> ii  libselinux1               2.8-1+b1
> ii  libudev1                  241-3
>
> dmeventd recommends no packages.
>
> dmeventd suggests no packages.
>
> -- no debconf information
Index: lvm2-2.03.02/daemons/dmeventd/plugins/raid/dmeventd_raid.c
===================================================================
--- lvm2-2.03.02.orig/daemons/dmeventd/plugins/raid/dmeventd_raid.c
+++ lvm2-2.03.02/daemons/dmeventd/plugins/raid/dmeventd_raid.c
@@ -83,7 +83,7 @@ static int _process_raid_event(struct ds
 					 "before initiating repair on RAID device %s.", device);
 			}
 
-			goto out; /* Not yet done syncing with accessible devices */
+			//goto out; /* Not yet done syncing with accessible devices */
 		}
 
 		if (state->failed)