Hi,

first of all, a happy new year to everyone.

I'm currently considering using dm-cache with a ramdisk/volatile PV for a 
small project and have noticed some usability issues that make this setup 
less appealing.
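
To make it concrete, the setup I have in mind currently looks roughly like 
this (a sketch only; the names vg0 and vg0/data, the 4 GiB ramdisk size and 
the exact cache syntax are just examples and may need adjusting for your LVM 
version):

  # create a ramdisk (rd_size is in KiB, so this is 4 GiB)
  modprobe brd rd_nr=1 rd_size=4194304
  # turn it into a PV and add it to the existing VG
  pvcreate /dev/ram0
  vgextend vg0 /dev/ram0
  # carve a cache pool out of the ramdisk and attach it in write-through mode
  lvcreate --type cache-pool -n cache0 -L 3G vg0 /dev/ram0
  lvconvert --type cache --cachepool vg0/cache0 --cachemode writethrough vg0/data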


The issues I ran into are the following:
1. Adding a cache to a VG causes the entire VG to depend on the cache PV. If 
one of the cache devices fails or is missing, the VG cannot be activated, and 
even worse, if it is the VG containing the root filesystem, the entire system 
fails to boot, even though we may already know that there is no data loss, 
only degraded access times.
2. Activating the VG and handling potentially missing/failed cache PVs 
requires manual scripting (a rough sketch of what this currently looks like 
for me follows after this list).
3. LVM has no way to clearly indicate that a physical volume is volatile and 
that data loss on it is expected, possibly even within the PV header itself. 
Alternatively, there is no way to say "if something is wrong with the cache, 
just forget about it (if possible)".
4. Simply recreating the PV with 'pvcreate --zero --pvmetadatacopies 0 
--norestorefile --uuid' appears to be enough to get a write-through cache, 
and thereby the associated volume, working again. So LVM doesn't seem to care 
about the cached data being lost, only about the PV itself being present. 
Refusing to activate the VG therefore appears to be a bit too conservative, 
and the error handling here could probably be improved (see above).
5. As there is currently no place within the LVM metadata to label a 
PV/VG/LV as "volatile", it is not clear, either to LVM or to admins looking 
at the output of tools like lvdisplay, that a specific LV is volatile. 
Consequently there are also no safeguards or warnings against actions that 
would cause data loss (like adding a ramdisk to a raid0, or simply adding a 
write-back instead of a write-through cache).
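
For reference, the manual recovery from points 2 and 4 currently looks 
roughly like this for me (a sketch only; vg0, vg0/data and the UUID variable 
are placeholders, the UUID being the one recorded for the cache PV in the VG 
metadata, e.g. in /etc/lvm/backup/vg0):

  # the ramdisk is gone after a reboot, so recreate it first
  modprobe brd rd_nr=1 rd_size=4194304
  # re-initialize it as "the same" PV, reusing the UUID from the VG metadata
  pvcreate --zero y --pvmetadatacopies 0 --norestorefile \
           --uuid "$CACHE_PV_UUID" /dev/ram0
  # with the PV present again, the VG activates normally
  vgchange -ay vg0
  # (alternatively the cache could be dropped for good with
  #  'lvconvert --uncache vg0/data')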


Therefore I'd like to ask if it would be possible to make two small 
improvements:
1. Add a "volatile" flag to PVs, LVs, and VGs to allow to clearly indicate that 
they are non-persistent and that dataloss is expected.
2. And one of:
 a. Change the error handling and automatically recover from missing PVs if 
the LV or VG has the volatile flag, e.g. by automatically `--uncache`-ing the 
volume and activating it without the cache whose PV is missing. This is even 
more important for boot volumes, where such a configuration would otherwise 
prevent the system from booting at all.
 b. Alternatively, add native support for ramdisks. This would mainly require 
extending the VG metadata with an 'is-RAMdisk' flag that causes the lookup 
for the PV to be skipped and a new ramdisk to be allocated instead while the 
VG is being activated (we know its size from the VG metadata, since we know 
how much we allocate/use); a hypothetical sketch of such a metadata entry 
follows after this list. This could also help with unit tests and CI/CD usage 
(where the PV is currently created manually with brd before 
creating/activating the VG), including LVM's own test/lib/aux.sh, 
test/shell/devicesfile-misc.sh, test/shell/devicesfile-refresh.sh and 
test/shell/devicesfile-serial.sh.
 c. Same as 2a, but instead of automatically uncaching the volume, add a flag 
to the VG metadata that allows LVM to use the hints file to find the PV and 
automatically re-initialize it regardless of its header. Maybe combine this 
with an additional configuration option requiring the block device to be 
zeroed (only the first 4 sectors, to avoid reading the entire block device) 
as a safeguard against the accidental data loss that looking for the correct 
PV header normally protects against.
 d. Same as 2b, but limited to caches only. Considering how caching is 
currently implemented, restricting ramdisks to caches may cause unnecessary 
additional work and be less useful than adding them as a new kind of PV. It 
also wouldn't help with the additional use case of unit tests and CI/CD 
pipelines, and general ramdisk PVs as in 2b would additionally simplify 
"playing with" and learning about LVM.
 e. Add an option to have lvconvert enable caching but WITHOUT saving it in 
the VG's metadata, causing LVM to forget about the cache, i.e. the next time 
the system boots LVM would activate the VG normally without the cache. For 
write-through caches this should always be safe, and for write-back it only 
causes data loss when the system crashes without flushing the cache.
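
To illustrate 2b: the PV section of the VG metadata could carry something 
roughly like the following (purely illustrative; the "VOLATILE"/"RAMDISK" 
flags do not exist today and all values are placeholders), so that activation 
allocates a fresh ramdisk of dev_size instead of scanning for the device:

  physical_volumes {
          pv1 {
                  id = "placeholder-uuid"
                  device = "/dev/ram0"   # hint only; may not exist yet
                  status = ["ALLOCATABLE"]
                  # hypothetical new flags:
                  flags = ["VOLATILE", "RAMDISK"]
                  dev_size = 8388608     # 4 GiB in 512-byte sectors
                  pe_start = 2048
                  pe_count = 1023
          }
  }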

My personal favourite is 2b, followed by 2e.
2b basically realizes my entire use case natively within LVM, and 2e at least 
avoids having to automate the LVM recovery just to be able to reboot the 
system, allowing me to write a systemd service that adds the cache at runtime 
(roughly as sketched below).
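
For completeness, the runtime-attachment service I have in mind would look 
roughly like this (a sketch only; the ordering, the vg0/data names and the 
helper script /usr/local/sbin/attach-ramdisk-cache.sh, which would run the 
modprobe/pvcreate/vgextend/lvconvert steps from the first example, are all 
placeholders):

  [Unit]
  Description=Attach volatile ramdisk cache to vg0/data
  After=local-fs.target

  [Service]
  Type=oneshot
  RemainAfterExit=yes
  ExecStart=/usr/local/sbin/attach-ramdisk-cache.sh

  [Install]
  WantedBy=multi-user.target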

Best regards
