Hello,
I’m looking for advice on an LVM-thin metadata corruption case on a
Proxmox host. I have stopped further repair attempts and preserved the
current state for analysis.
Environment
* Proxmox VE host
* Thin pool: `VMDATA0/VMDATA0`
* Backing storage had RAID-5 issues involving two disks
* After reseating the disks, the RAID came back online, but the thin
pool would no longer activate
Original Proxmox/LVM error
`activating LV 'VMDATA0/VMDATA0' failed: Check of pool VMDATA0/VMDATA0
failed (status:1). Manual repair required!`
What I tried
1. `vgcfgbackup VMDATA0`
2. `lvconvert --repair VMDATA0/VMDATA0`
The `lvconvert --repair` step failed with:
`value size mismatch: expected 8, but got 24 (block 13182)`
At that point I added temporary VG space on a USB disk so I could try
manual metadata recovery.
Manual recovery steps attempted
* Created temporary metadata LVs
* Used `lvconvert --swapmetadata` to extract the pool metadata
* Activated the extracted metadata LV
* Preserved a raw image of the extracted metadata
* Ran `thin_check`
* Ran `thin_repair`
* Ran `thin_dump --repair`
Current results
`thin_check /dev/VMDATA0/meta_extract` reports:
* `missing devices: [0, -]`
* `bad checksum in btree node (block 79511)`
* `missing all mappings for devices: [0, -]`
* `bad checksum in btree node (block 79506)`
`thin_repair -i /dev/VMDATA0/meta_extract -o /dev/VMDATA0/repair_meta`
fails with:
`value size mismatch: expected 8, but got 24 (block 13182)`
`thin_dump --repair -o /root/VMDATA0_repaired.xml /dev/VMDATA0/meta_extract`
fails with the same error:
`value size mismatch: expected 8, but got 24 (block 13182)`
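The errors above name specific metadata blocks (13182, 79506, 79511). If hex dumps of those blocks would help, I can pull them from the preserved raw image along these lines. This is only a sketch: it assumes the usual 4096-byte dm-thin metadata block size, and `dump_block` is a helper name I made up for illustration.

```shell
# Sketch: hex-dump a single metadata block from a raw image file.
# Assumption: metadata blocks are 4096 bytes (the usual dm-thin size).
# dump_block is a hypothetical helper, not part of thin-provisioning-tools.
dump_block() {
    image=$1
    block=$2
    # -j skips to the block's byte offset, -N limits output to one block,
    # -A x prints offsets in hex, -v disables collapsing of repeated lines.
    od -A x -t x1 -v -j "$((block * 4096))" -N 4096 "$image"
}

# Example (read-only, against the preserved copy rather than the live LV):
# dump_block VMDATA0_meta_extract.raw 13182
```

The offsets printed by `od` are byte offsets into the image, so block 13182 would start at 13182 * 4096 = 53993472 (0x337e000).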
The relevant LVs at the time of extraction were:
* thin pool `VMDATA0`
* extracted metadata LV `meta_extract`
* fresh target metadata LV `repair_meta`
What I have preserved
* Raw metadata image from the extracted metadata LV:
`VMDATA0_meta_extract.raw`
* A second raw copy from the extracted LV device
* `vgcfgbackup` output and LVM archive files
* Diagnostics bundle with:
    * `pvs`, `vgs`, `lvs`
    * `thin_check` output
    * `dmesg`
    * kernel journal
    * `lvm version`
    * checksums
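Anyone who receives the images can confirm that two raw copies are identical with a check along these lines. It is a minimal sketch: `same_image` is a hypothetical helper name, and `second_copy.raw` in the usage line is a placeholder rather than the real file name.

```shell
# Sketch: succeed only if two files have the same SHA-256 digest.
# same_image is a hypothetical helper name used for illustration.
same_image() {
    a=$(sha256sum "$1" | cut -d ' ' -f 1)
    b=$(sha256sum "$2" | cut -d ' ' -f 1)
    # An empty digest means sha256sum could not read the file; treat as mismatch.
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example (second_copy.raw is a placeholder name):
# same_image VMDATA0_meta_extract.raw second_copy.raw && echo "copies match"
```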
I can provide access to the full case directory or a tarball over HTTP
if someone is willing to look at it. I would prefer to share the link
privately with anyone interested.
My main questions are:
1. Is there any remaining offline recovery path worth trying with the
standard dm-thin/LVM tools?
2. Does this failure pattern usually indicate that the mapping metadata
is beyond recovery by `thin_repair`/`thin_dump`?
3. Would it be useful to inspect the raw metadata image further, and if
so, what specific tools or commands would you recommend next?
I can provide any command output that would be useful.
Thanks very much for any guidance,
Ray