On 12. 01. 24 at 19:19, lists.linux....@frank.fyi wrote:
Hi,
first of all, a happy new year to everyone.
I'm currently considering using dm-cache with a ramdisk/volatile PV for a
small project and noticed some usability issues that make this setup less
appealing.
Currently this means:
1. Adding a cache to a VG causes the entire VG to depend on the cache. If one
of the cache drives fails or is missing, the VG cannot be accessed, and even
worse, if it is the VG containing the root filesystem, the entire system fails
to boot, even though we may already know that there is no data loss, only
degraded access times.
2. Activating the VG and handling potentially missing/failing cache PVs
requires manual scripting.
3. LVM has no way to clearly indicate that a physical volume is volatile and
that data loss on it is expected, perhaps even within the PV header itself. Nor
is there a way to indicate "if something is wrong with the cache, just forget
about it (if possible)".
4. Simply recreating the cache PV with 'pvcreate --zero --pvmetadatacopies 0
--norestorefile --uuid' appears to be enough to get a write-through cache, and
thereby the associated volume, working again (see the sketch after this list).
So LVM apparently does not care about the cache data being lost, only about the
PV itself being present. Failing to activate the VG therefore seems a bit too
conservative, and the error handling here could probably be improved (see above).
5. As there is currently no place within the LVM metadata to label a PV/VG/LV
as "volatile", it is not clear, either to LVM or to admins looking at the output
of tools like lvdisplay, that a specific LV is volatile. Consequently there are
also no safeguards or warnings against actions that would cause data loss (like
adding a ramdisk to a raid0, or even just adding a write-back instead of a
write-through cache).
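For illustration, the recovery from point 4 looks roughly like the following
sketch (the ramdisk size, the VG name 'vg0' and the need to record the PV UUID
beforehand are my assumptions, not something LVM prescribes):

  # re-create the volatile cache PV after a reboot; the original PV UUID
  # must have been saved earlier, e.g. from 'pvs -o+pv_uuid'
  modprobe brd rd_nr=1 rd_size=1048576      # 1 GiB ramdisk (size assumed)
  pvcreate --zero y --pvmetadatacopies 0 --norestorefile \
           --uuid <saved-cache-pv-uuid> /dev/ram0
  vgchange -ay vg0                          # VG activates with the cache again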
Therefore I'd like to ask whether it would be possible to make two small
improvements:
1. Add a "volatile" flag to PVs, LVs, and VGs to clearly indicate that they are
non-persistent and that data loss is expected.
2. And one of:
a. Change the error handling and automatic recovery for missing PVs when the LV
or VG has the volatile flag, e.g. by automatically `--uncache`-ing the volume
and mounting it without the cache whose PV is missing. This is especially
important for boot volumes, where such a configuration would otherwise prevent
the system from booting at all.
b. Alternatively, add native support for ramdisks. This would mainly require
extending the VG metadata with an 'is-RAMdisk' flag that causes the lookup for
the PV to be skipped and a new ramdisk to be allocated instead while the VG is
being activated (we know its size from the VG metadata, since we know how much
we allocate/use). This could also help with unit tests and CI/CD usage, where
the PV is currently created manually with brd before activating/creating the
VG, including our own test/lib/aux.sh, test/shell/devicesfile-misc.sh,
test/shell/devicesfile-refresh.sh, and test/shell/devicesfile-serial.sh.
c. Same as 2a, but instead of automatically uncaching the volume, add a flag to
the VG metadata that allows LVM to use the hints file to find the PV and
automatically re-initialize it regardless of its header. Maybe combined with an
additional configuration option requiring the block device to be zeroed (i.e.
only the first 4 sectors, to avoid reading the entire block device) as a
safeguard against the accidental data loss that looking for the correct PV
header normally protects against.
d. Same as 2b, but limited to caches only. Considering how caching is currently
implemented, restricting ramdisks to caches may cause unnecessary additional
work and be less useful than adding them as a new kind of PV. It also wouldn't
help the additional use case of unit tests and CI/CD pipelines, and general
ramdisk support would additionally simplify "playing with" and learning about
LVM.
e. Add an option to have lvconvert enable caching but WITHOUT saving it within
the VG's metadata, causing LVM to forget about the cache, i.e. the next time the
system boots LVM would activate the VG normally, without the cache. For
write-through caches this should always be safe, and for write-back it only
causes data loss if the system crashes without flushing the cache.
My personal favourite is 2b, followed by 2e.
2b basically realizes my entire use case natively within LVM, and 2e at least
avoids having to automate the LVM recovery just to be able to reboot the system,
allowing me to write a systemd service that adds the cache at runtime. For
reference, the manual setup I currently have in mind is sketched below.
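For context, the manual setup I have in mind today looks roughly like this (a
sketch only; the device, the sizes and the names 'vg0', 'root' and 'ramcache'
are placeholders):

  modprobe brd rd_nr=1 rd_size=1048576               # 1 GiB ramdisk
  pvcreate --pvmetadatacopies 0 /dev/ram0
  vgextend vg0 /dev/ram0
  lvcreate -n ramcache -L 900M vg0 /dev/ram0         # cache LV on the ramdisk
  lvconvert --type cache --cachevol ramcache \
            --cachemode writethrough vg0/root

After a reboot the ramdisk, and with it the cache PV, is gone, which is exactly
the situation points 1-4 above describe.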
Hi
We do have several such things in our TODO plans - but it's actually way more
complicated than you might think. It's also not completely true that a
'writethrough' cache cannot have dirty blocks (i.e. blocks only present in the
cache because writes to the origin failed).
Another important note here is - the dm-cache target is not intended to be a
'bigger page-cache' - it has a different purpose and different usage.
So using a 'ramdisk' for dm-cache is kind of pointless when the same RAM can
likely be used more effectively by the system's page cache logic.
To extend the dirty cached pages - there is the 'dm-writecache' target, which
in a way extends the amount of page cache by the size of your fast NVMe/SSD
device - but it does not accelerate 'reads' from hotspots.
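For completeness, a rough sketch of the dm-writecache setup (the device and the
names 'vg0', 'fast' and 'main' are placeholders; see lvmcache(7) for details):

  lvcreate -n fast -L 10G vg0 /dev/nvme0n1    # fast LV on the NVMe/SSD device
  lvconvert --type writecache --cachevol fast vg0/main
  # dm-writecache only absorbs writes; reads still go to the origin LV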
Lvm should cope (eventually with the --force option) with the removal of
missing devices holding cached blocks - however there can still be some dead
spots. But ATM we are not seeing this as a major problem. A hotspot cache is
simply not supposed to be randomly removed from your system - as it's not easy
to rebuild.
But it might be possible to more easily automate the bootup process when a PV
with a cache is missing (something like what is done for 'raidLVs' with missing
legs).
Regards
Zdenek