Hi,
  I'm unsure about the details of a warning in the lvmraid(7) man page
for the 'repair' scrub operation, and about the behaviour I'm seeing
before I run it.

I've got my / on an LVM RAID1 on top of a pair of NVMe devices;
nvme0n1p3 and nvme1n1p1 are the PVs.
I noticed an NVMe read error in my logs:
Apr 21 14:58:52 dalek kernel: nvme1n1: I/O Cmd(0x2) @ LBA 416591488, 1024 blocks, I/O Error (sct 0x2 / sc 0x81) MORE
Apr 21 14:58:52 dalek kernel: critical medium error, dev nvme1n1, sector 416591488 op 0x0:(READ) flags 0x80700 phys_seg 71 prio class 0

but there have been no complaints from LVM;
root@dalek:/home/dg# lvs -a
  LV                    VG         Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
...
  root                  nvmeroot21 rwi-aor--- 150.00g                                    100.00
  [root_rimage_0]       nvmeroot21 iwi-aor--- 150.00g
  [root_rimage_1]       nvmeroot21 iwi-aor--- 150.00g
  [root_rmeta_0]        nvmeroot21 ewi-aor---   4.00m
  [root_rmeta_1]        nvmeroot21 ewi-aor---   4.00m

 (1) If the NVMe is giving an I/O error - why am I not seeing something angry 
from LVM?
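
(For what it's worth, my understanding is that anything LVM had noticed
would show up in the lv_health_status field or in the device-mapper raid
status line; I believe these can be queried with something like the
following, using the LV/dm names from the lsblk output below:

  lvs -o name,lv_health_status,raid_sync_action,raid_mismatch_count nvmeroot21/root
  dmsetup status nvmeroot21-root
)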

Having seen the read error I tried a
  lvchange --syncaction check
on the LV.  That completed, but reported a whole bunch of mismatches
on the LV (although Cpy%Sync was still 100.00%),
and produced a bunch more read errors in the logs - mostly from
nvme1 but a couple from nvme0 - ohh.
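
For reference, what I ran was roughly the following - the mismatch count
comes from the raid_mismatch_count field described in lvmraid(7):

  lvchange --syncaction check nvmeroot21/root
  lvs -o name,raid_sync_action,raid_mismatch_count nvmeroot21/root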

  Given that these are all read errors, and the SMART
data says the drive has plenty of available spare, I'm wondering
if the read errors are just from blocks that haven't been read
for a few years.
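
The spare figure I'm going by is the available_spare value from the NVMe
SMART log, i.e. something along the lines of:

  nvme smart-log /dev/nvme1n1 | grep -i spare
  smartctl -a /dev/nvme1n1 | grep -i spare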

  I want to try a 'repair', but there's a warning in the man page:
'When two different blocks of data must be made consistent, it chooses
the block from the device that would be used during RAID initialization.'
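
To be clear, the repair I'm considering is the scrubbing variant from
lvmraid(7), i.e. something like:

  lvchange --syncaction repair nvmeroot21/root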

  (2) How do I know which is the default device?
  (3) Given that the log shows I/O errors on the NVMe, if the error propagates
up to LVM during the repair, will the repair read the block from the other 
device?

(The host is Fedora 42 with 6.14.2-300.fc42.x86_64 kernel)

Thanks in advance,

Dave
(See lsblk below)

nvme0n1                      259:0    0 465.8G  0 disk 
├─nvme0n1p1                  259:2    0   600M  0 part /boot/efi
├─nvme0n1p2                  259:3    0     1G  0 part /boot
└─nvme0n1p3                  259:4    0 464.2G  0 part 
  ├─nvmeroot21-root_rmeta_1  252:2    0     4M  0 lvm  
  │ └─nvmeroot21-root        252:4    0   150G  0 lvm  /
  ├─nvmeroot21-root_rimage_1 252:3    0   150G  0 lvm  
  │ └─nvmeroot21-root        252:4    0   150G  0 lvm  /
  ├─nvmeroot21-swap_rmeta_1  252:7    0     4M  0 lvm  
  │ └─nvmeroot21-swap        252:9    0    32G  0 lvm  [SWAP]
  ├─nvmeroot21-swap_rimage_1 252:8    0    32G  0 lvm  
  │ └─nvmeroot21-swap        252:9    0    32G  0 lvm  [SWAP]
  └─nvmeroot21-fast          252:13   0   250G  0 lvm  /discs/fast
nvme1n1                      259:1    0 465.8G  0 disk 
└─nvme1n1p1                  259:5    0 465.8G  0 part 
  ├─nvmeroot21-root_rmeta_0  252:0    0     4M  0 lvm  
  │ └─nvmeroot21-root        252:4    0   150G  0 lvm  /
  ├─nvmeroot21-root_rimage_0 252:1    0   150G  0 lvm  
  │ └─nvmeroot21-root        252:4    0   150G  0 lvm  /
  ├─nvmeroot21-swap_rmeta_0  252:5    0     4M  0 lvm  
  │ └─nvmeroot21-swap        252:9    0    32G  0 lvm  [SWAP]
  ├─nvmeroot21-swap_rimage_0 252:6    0    32G  0 lvm  
  │ └─nvmeroot21-swap        252:9    0    32G  0 lvm  [SWAP]
  └─nvmeroot21-fast          252:13   0   250G  0 lvm  /discs/fast

-- 
 -----Open up your eyes, open up your mind, open up your code -------   
/ Dr. David Alan Gilbert    |       Running GNU/Linux       | Happy  \ 
\        dave @ treblig.org |                               | In Hex /
 \ _________________________|_____ http://www.treblig.org   |_______/
