Hi LVM Team! A very good day to you all.
[I hope this is the right list this time.]

I recently experienced an outage in which thin pool activation failed;
details follow.
The good news is that I was able to recover the pool with thin_repair.
Thank goodness!

There was no infrastructure-induced failure, i.e. no network, disk,
memory, or compute fault, and no resource used over its limit.
The node had been running healthily for 13 days when it suddenly hit
this issue. Until then the pool was handling I/O load (including
discards), thin volume creation/deletion, and other regular activity.

I tried to find a matching known issue but could not. It looks like it
may be a known problem, but I cannot find a report with the same
signature. My questions are:

a) How can I induce thin pool failures at will, such that the pool does
not activate but repair succeeds, so that I can test this recovery in a
controlled way?
b) To the best of your knowledge, is this a known issue that was fixed
in a later release? I searched both the kernel Bugzilla and RHEL, and I
am hoping you can help me find it. Internet searches point to errata
pages, but I cannot find the exact ticket or commit that addresses this.
The OCP platform was running a recent RHEL release:
Linux kernel 5.14.0-427.109.1.el9_4 (RHEL 9.4). RHEL 9.4 itself is likely
about 2 years old, though.

c) After spending some time reviewing the thin code and the upstream
(kernel.org) commits made since this kernel, I suspect this could be a
race between discards and either I/O or device creation/deletion on the
same pool.
Could the authors here please confirm or correct my code reading below?
```
*** phase 1: userspace issues blkdiscard on thin volumes ***
  dm-thin.c : thin_bio_map()
    → detects REQ_OP_DISCARD
    → thin_defer_bio_with_throttle(tc, bio)
      → adds bio to tc->deferred_bio_list        // QUEUED, not processed
      → wakes pool worker thread

  dm-thin.c : do_worker()                         // runs ASYNCHRONOUSLY
    → process_deferred_bios()
      → process_thin_deferred_bios()
        → process_discard_bio()
          → creates mapping, adds to pool->prepared_discards
    → process_prepared(pool->prepared_discards)
      → process_prepared_discard_no_passdown(m):
        → dm_thin_remove_range(tc->td, begin, end)
            [dm-thin-metadata.c]
          → dm_btree_remove_leaves()
              [dm-btree-remove.c]
            → data_block_dec()                    // for each data block
                [dm-thin-metadata.c]
              → dm_sm_dec_blocks()                // DECREMENTS refcount
                  [dm-space-map-common.c]

*** phase 2: these steps may still be IN PROGRESS or QUEUED when
userspace deletes the thin volume ***

  dm-thin.c : thin_dtr()                          // dmsetup remove
    → list_del_rcu(&tc->list)                     // removes from
                                                  //   pool->active_thins
    → synchronize_rcu()
    → dm_pool_close_thin_device(tc->td)           // open_count--
    → kfree(tc)                                   // tc FREED

    *** does NOT flush pool workqueue ***          ← GAP 1
    *** does NOT drain prepared_discards ***       ← GAP 2

  dm-thin.c : process_delete_mesg()               // dmsetup message
    → dm_pool_delete_thin_device(pool->pmd, dev_id)
        [dm-thin-metadata.c : __delete_device()]
      → dm_btree_remove(&pmd->tl_info, ...)       // remove from top-level
          [dm-btree-remove.c]                      //   btree
        → subtree_dec()                            // cascades into:
            [dm-thin-metadata.c]
          → dm_btree_del()                         // walks ALL leaves
              [dm-btree.c]
            → data_block_dec() for EVERY remaining block
                [dm-thin-metadata.c]
              → dm_sm_dec_blocks()                 // DECREMENTS refcount
                  [dm-space-map-common.c]          //   for ALL blocks

*** phase 3: KERNEL (worker thread, still running from phase 1) ***
  dm-thin.c : do_worker()                         // ASYNC, still running
    → process_prepared(pool->prepared_discards)
      → process_prepared_discard_no_passdown(m):
        → m->tc points to FREED tc                // ← use-after-free risk
        → dm_thin_remove_range(tc->td, begin, end)
            [dm-thin-metadata.c]
          → dm_btree_remove_leaves()
              [dm-btree-remove.c]
            → data_block_dec()                    // SAME blocks already
                [dm-thin-metadata.c]              //   decremented in
              → dm_sm_dec_blocks()                //   Phase 2!
                  [dm-space-map-common.c]

                ┌──────────────────────────────────────────────────┐
                  sm_ll_dec_bitmap():
                    old = sm_lookup_bitmap(ic->bitmap, bit);
                    switch (old) {
                    case 0:  // ← refcount ALREADY 0
                      DMERR("unable to decrement block");
                      return -EINVAL;  // -22
                    }
                                 [dm-space-map-common.c]
                └──────────────────────────────────────────────────┘

                          ▼
                dm_tm_shadow_block() fails (corrupted space map)
                    [dm-transaction-manager.c]

                          ▼
                dm_pool_inc_data_range() fails with -EINVAL (-22)
                    [dm-thin-metadata.c]

                          ▼
                metadata_operation_failed(pool, "dm_pool_inc_data_range")
                    [dm-thin.c]

                          ▼
                set_pool_mode(pool, PM_READ_ONLY)
                    [dm-thin.c]

                *** POOL IS NOW DEAD ***
```



As always, many thanks for your help.


# Issue: unable to activate thin pool
```
[Wed Apr  8 17:05:14 2026] device-mapper: space map common: unable to decrement block
[Wed Apr  8 17:08:11 2026] device-mapper: space map common: unable to decrement block
[Wed Apr  8 17:08:11 2026] device-mapper: space map common: dm_tm_shadow_block() failed
[Wed Apr  8 17:08:11 2026] device-mapper: space map common: unable to decrement block
[Wed Apr  8 17:08:11 2026] device-mapper: space map common: dm_tm_shadow_block() failed
[Wed Apr  8 17:08:11 2026] device-mapper: space map common: unable to decrement block
[Wed Apr  8 17:08:11 2026] device-mapper: space map common: dm_tm_shadow_block() failed
[Wed Apr  8 17:08:31 2026] device-mapper: space map common: unable to decrement block
[Wed Apr  8 17:08:31 2026] device-mapper: space map common: dm_tm_shadow_block() failed
[Wed Apr  8 17:08:31 2026] device-mapper: space map common: unable to decrement block
[Wed Apr  8 17:08:31 2026] device-mapper: space map common: dm_tm_shadow_block() failed
```

# Host and LVM tools version
```
uname -a
Linux kernel: 5.14.0-427.109.1.el9_4
RHEL 9.4

lvm version
2.03.23(2) (2023-11-21)
library: 1.02.197 (2023-11-21)
driver: 4.48.1
```

Below is the node's block-layer configuration for reference.
The workload included I/O (including discards) and thin volume creation and deletion.
```
[root@root core]# lvs -a pwx1
  Please remove the lvm.conf global_filter, it is ignored with the devices file.
  LV                  VG   Attr       LSize   Pool   Origin              Data%  Meta%  Move Log Cpy%Sync Convert
  1004123733318649769 pwx1 Vwi-a-t---  50.00g pxpool 660563940592999863  0.25
  103699400925372609  pwx1 Vwi-a-t--- 750.00g pxpool 1115712468847455249 59.75
  1072608604746349133 pwx1 Vwi-a-t---  50.00g pxpool 941788757364603035  0.25
  1115712468847455249 pwx1 Vwi-aot--- 750.00g pxpool                     59.75
  1138695541641144166 pwx1 Vwi-a-t---  50.00g pxpool 941788757364603035  0.25
  136169780918964477  pwx1 Vwi-aot---  30.00g pxpool                     33.33
  218651423266852202  pwx1 Vwi-aot---   5.00g pxpool                     3.49
  404947242154831849  pwx1 Vwi-aot---   5.00g pxpool                     4.20
  440731835552948333  pwx1 Vwi-aot---  50.00g pxpool                     5.59
  462681831690737818  pwx1 Vwi-a-t---  50.00g pxpool 73089959772282964   0.25
  519898065353250833  pwx1 Vwi-a-t---  50.00g pxpool 660563940592999863  0.25
  527922274169222783  pwx1 Vwi-aot--- 200.00g pxpool                     28.64
  537994915504805835  pwx1 Vwi-aot---  50.00g pxpool                     10.88
  569690966828279529  pwx1 Vwi-a-t--- 750.00g pxpool 1115712468847455249 59.75
  594992999737145586  pwx1 Vwi-aot--- 200.00g pxpool                     28.91
  660563940592999863  pwx1 Vwi-aot---  50.00g pxpool                     0.25
  702358223003836192  pwx1 Vwi-aot--- 200.00g pxpool                     28.64
  73089959772282964   pwx1 Vwi-aot---  50.00g pxpool                     0.25
  793515512579595979  pwx1 Vwi-aot---  30.00g pxpool                     33.33
  79731196567060146   pwx1 Vwi-aot---  50.00g pxpool                     10.90
  865397616123963982  pwx1 Vwi-aot---  50.00g pxpool                     9.39
  866802183893693297  pwx1 Vwi-aot--- 200.00g pxpool                     28.91
  941788757364603035  pwx1 Vwi-aot---  50.00g pxpool                     0.25
  960350716126095496  pwx1 Vwi-a-t---  50.00g pxpool 73089959772282964   0.25
  [lvol0_pmspare]     pwx1 ewi-------   2.00g
  pxMetaFS            pwx1 Vwi-aot---  64.00g pxpool                     0.05
  pxpool              pwx1 twi-aot---   1.54t                            43.59  5.06  <<< very low tmeta util.
  [pxpool_tdata]      pwx1 Twi-ao----   1.54t
  [pxpool_tmeta]      pwx1 ewi-ao----   4.00g
  pxreserve           pwx1 -wi------k  15.00g
[root@root core]#
[root@root core]# vgs pwx1
  Please remove the lvm.conf global_filter, it is ignored with the devices file.
  VG   #PV #LV #SN Attr   VSize VFree
  pwx1   1  27   0 wz--n- 1.56t    0
[root@root core]# lsblk -s /dev/pwx1/1004123733318649769
NAME                                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
pwx1-1004123733318649769                     253:107  0   50G  0 lvm
└─pwx1-pxpool-tpool                          253:14   0  1.5T  0 lvm
  ├─pwx1-pxpool_tmeta                        253:12   0    4G  0 lvm
  │ └─md126                                    9:126  0  1.6T  0 raid0
  │   └─eui.00806e28521374ac24a93718000982be 253:10   0  1.6T  0 mpath
  │     ├─nvme4n2                            259:5    0  1.6T  0 disk
  │     ├─nvme5n2                            259:8    0  1.6T  0 disk
  │     └─nvme6n2                            259:11   0  1.6T  0 disk
  └─pwx1-pxpool_tdata                        253:13   0  1.5T  0 lvm
    └─md126                                    9:126  0  1.6T  0 raid0
      └─eui.00806e28521374ac24a93718000982be 253:10   0  1.6T  0 mpath
        ├─nvme4n2                            259:5    0  1.6T  0 disk
        ├─nvme5n2                            259:8    0  1.6T  0 disk
        └─nvme6n2                            259:11   0  1.6T  0 disk
[root@root core]# ls -al /dev/md/pwx1
lrwxrwxrwx. 1 root root 8 Apr 11 11:48 /dev/md/pwx1 -> ../md126
[root@root core]# dmsetup table /dev/mapper/pwx1-pxpool-tpool
0 3311132672 thin-pool 253:12 253:13 128 0 2 skip_block_zeroing
[root@root core]# dmsetup table /dev/mapper/pwx1-pxpool_tdata
0 1008459776 linear 9:126 35653632
1008459776 629145600 linear 9:126 1048307712
1637605376 1673527296 linear 9:126 1681647616
[root@root core]# dmsetup table /dev/mapper/pwx1-pxpool_tmeta
0 4194304 linear 9:126 1044113408
4194304 4194304 linear 9:126 1677453312
[root@root core]#
[root@root core]# dmsetup table --target multipath
3500a07513c1e23c4: 0 3750748848 multipath 0 0 1 1 service-time 0 1 2 8:16 1 1
3500a07513c1e2ade: 0 3750748848 multipath 0 0 1 1 service-time 0 1 2 8:64 1 1
3500a07513c1e2ca8: 0 3750748848 multipath 0 0 1 1 service-time 0 1 2 8:80 1 1
3500a07513c1e2cf3: 0 3750748848 multipath 0 0 1 1 service-time 0 1 2 8:0 1 1
3500a07513c1e3afc: 0 3750748848 multipath 0 0 1 1 service-time 0 1 2 8:48 1 1
eui.000000000000000100a075223f0c4773: 0 3125627568 multipath 0 0 1 1 service-time 0 1 2 259:2 1 1
eui.000000000000000100a075223f0c47a6: 0 3125627568 multipath 0 0 1 1 service-time 0 1 2 259:0 1 1
eui.000000000000000100a075233fc94da6: 0 3125627568 multipath 0 0 1 1 service-time 0 1 2 259:3 1 1
eui.000000000000000100a075233fc94de4: 0 3125627568 multipath 0 0 1 1 service-time 0 1 2 259:1 1 1
eui.00806e28521374ac24a93718000982bd: 0 14680064000 multipath 3 retain_attached_hw_handler queue_mode bio 0 1 1 queue-length 0 3 1 259:4 1 259:7 1 259:10 1
eui.00806e28521374ac24a93718000982be: 0 3355443200 multipath 3 retain_attached_hw_handler queue_mode bio 0 1 1 queue-length 0 3 1 259:5 1 259:8 1 259:11 1
eui.00806e28521374ac24a93718000982bf: 0 134217728 multipath 3 retain_attached_hw_handler queue_mode bio 0 1 1 queue-length 0 3 1 259:6 1 259:9 1 259:12 1
[root@root core]# dmsetup status --target multipath
3500a07513c1e23c4: 0 3750748848 multipath 2 0 0 0 1 1 A 0 1 2 8:16 A 0 0 1
3500a07513c1e2ade: 0 3750748848 multipath 2 0 0 0 1 1 A 0 1 2 8:64 A 0 0 1
3500a07513c1e2ca8: 0 3750748848 multipath 2 0 0 0 1 1 A 0 1 2 8:80 A 0 0 1
3500a07513c1e2cf3: 0 3750748848 multipath 2 0 0 0 1 1 A 0 1 2 8:0 A 0 0 1
3500a07513c1e3afc: 0 3750748848 multipath 2 0 0 0 1 1 A 0 1 2 8:48 A 0 0 1
eui.000000000000000100a075223f0c4773: 0 3125627568 multipath 2 0 0 0 1 1 A 0 1 2 259:2 A 0 0 1
eui.000000000000000100a075223f0c47a6: 0 3125627568 multipath 2 0 0 0 1 1 A 0 1 2 259:0 A 0 0 1
eui.000000000000000100a075233fc94da6: 0 3125627568 multipath 2 0 0 0 1 1 A 0 1 2 259:3 A 0 0 1
eui.000000000000000100a075233fc94de4: 0 3125627568 multipath 2 0 0 0 1 1 A 0 1 2 259:1 A 0 0 1
eui.00806e28521374ac24a93718000982bd: 0 14680064000 multipath 2 0 0 0 1 1 A 0 3 1 259:4 A 1 22 259:7 A 1 21 259:10 A 1 22
eui.00806e28521374ac24a93718000982be: 0 3355443200 multipath 2 0 0 0 1 1 A 0 3 1 259:5 A 1 4 259:8 A 1 6 259:11 A 1 11
eui.00806e28521374ac24a93718000982bf: 0 134217728 multipath 2 0 0 0 1 1 A 0 3 1 259:6 A 1 0 259:9 A 1 0 259:12 A 1 0
[root@root core]#
[root@root core]# mdadm -D /dev/md/pwx1
/dev/md/pwx1:
           Version : 1.2
     Creation Time : Tue Mar 17 15:29:31 2026
        Raid Level : raid0
        Array Size : 1677589504 (1599.87 GiB 1717.85 GB)
      Raid Devices : 1
     Total Devices : 1
       Persistence : Superblock is persistent

       Update Time : Mon Mar 23 20:52:51 2026
             State : clean
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

        Chunk Size : 1024K

Consistency Policy : none

              Name : any:pwx1
              UUID : 1716a351:ed3e53e7:0ce83ccd:8d3a3021
            Events : 16

    Number   Major   Minor   RaidDevice State
       0     253       10        0      active sync   /dev/dm-10
[root@root core]#
```
