On 12. 01. 24 at 19:19, lists.linux....@frank.fyi wrote:
Hi,
first of all, a happy new year to everyone.
I'm currently considering using dm-cache with a ramdisk/volatile PV for a
small project and noticed some usability issues that make this setup less
appealing.
Currently this means:
1. Adding a cache to a VG causes the entire VG to depend on the cache. If one
of the cache drives fails or is missing, the VG cannot be accessed, and even
worse, if it is the VG containing the root filesystem, the entire system fails
to boot, even though we may already know that there is no data loss, only
degraded access times.
2. Activating the VG and handling potentially missing/failing cache PVs
requires manual scripting.
3. LVM has no way to clearly indicate that a physical volume is volatile and
that data loss on it is expected, perhaps even within the PV header itself. Nor
is there a way to indicate "if something is wrong with the cache, just forget
about it (if possible)".
4. Simply recreating the cache PV with 'pvcreate --zero --pvmetadatacopies 0
--norestorefile --uuid' appears to be enough to get a write-through cache, and
thereby the associated volume, working again (see the sketch after this list).
So LVM apparently does not care about the cache data being lost, only about the
PV itself being present. Failing to activate the VG therefore seems a bit too
conservative, and the error handling here could probably be improved (see above).
5. As there is currently no place within the LVM metadata to label a PV/VG/LV
as "volatile", it is not clear, either to LVM or to admins looking at the output
of tools like lvdisplay, that a specific LV is volatile. Consequently there are
also no safeguards or warnings against actions that would cause data loss (like
adding a ramdisk to a raid0, or even just adding a write-back instead of a
write-through cache).
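For illustration, the recovery from point 4 looks roughly like the following
sketch (the ramdisk size, the VG name 'vg0' and the need to record the PV UUID
beforehand are my assumptions, not something LVM prescribes):

  # re-create the volatile cache PV after a reboot; the original PV UUID
  # must have been saved earlier, e.g. from 'pvs -o+pv_uuid'
  modprobe brd rd_nr=1 rd_size=1048576      # 1 GiB ramdisk (size assumed)
  pvcreate --zero y --pvmetadatacopies 0 --norestorefile \
           --uuid <saved-cache-pv-uuid> /dev/ram0
  vgchange -ay vg0                          # VG activates with the cache again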
Therefore I'd like to ask whether it would be possible to make two small
improvements:
1. Add a "volatile" flag to PVs, LVs, and VGs to clearly indicate that they are
non-persistent and that data loss is expected.
2. And one of:
a. Change the error handling and automatic recovery for missing PVs when the LV
or VG has the volatile flag, e.g. by automatically `--uncache`-ing the volume
and mounting it without the cache whose PV is missing. This is especially
important for boot volumes, where such a configuration would otherwise prevent
the system from booting at all.
b. Alternatively, add native support for ramdisks. This would mainly require
extending the VG metadata with an 'is-RAMdisk' flag that causes the lookup for
the PV to be skipped and a new ramdisk to be allocated instead while the VG is
being activated (we know its size from the VG metadata, since we know how much
we allocate/use). This could also help with unit tests and CI/CD usage, where
the PV is currently created manually with brd before activating/creating the
VG, including our own test/lib/aux.sh, test/shell/devicesfile-misc.sh,
test/shell/devicesfile-refresh.sh, and test/shell/devicesfile-serial.sh.
c. Same as 2a, but instead of automatically uncaching the volume, add a flag to
the VG metadata that allows LVM to use the hints file to find the PV and
automatically re-initialize it regardless of its header. Maybe combined with an
additional configuration option requiring the block device to be zeroed (i.e.
only the first 4 sectors, to avoid reading the entire block device) as a
safeguard against the accidental data loss that looking for the correct PV
header normally protects against.
d. Same as 2b, but limited to caches only. Considering how caching is currently
implemented, restricting ramdisks to caches may cause unnecessary additional
work and be less useful than adding them as a new kind of PV. It also wouldn't
help the additional use case of unit tests and CI/CD pipelines, and general
ramdisk support would additionally simplify "playing with" and learning about
LVM.
e. Add an option to have lvconvert enable caching but WITHOUT saving it within
the VG's metadata, causing LVM to forget about the cache, i.e. the next time the
system boots LVM would activate the VG normally, without the cache. For
write-through caches this should always be safe, and for write-back it only
causes data loss if the system crashes without flushing the cache.
My personal favourite is 2b, followed by 2e.
2b basically realizes my entire use case natively within LVM, and 2e at least
avoids having to automate the LVM recovery just to be able to reboot the system,
allowing me to write a systemd service that adds the cache at runtime. For
reference, the manual setup I currently have in mind is sketched below.
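For context, the manual setup I have in mind today looks roughly like this (a
sketch only; the device, the sizes and the names 'vg0', 'root' and 'ramcache'
are placeholders):

  modprobe brd rd_nr=1 rd_size=1048576               # 1 GiB ramdisk
  pvcreate --pvmetadatacopies 0 /dev/ram0
  vgextend vg0 /dev/ram0
  lvcreate -n ramcache -L 900M vg0 /dev/ram0         # cache LV on the ramdisk
  lvconvert --type cache --cachevol ramcache \
            --cachemode writethrough vg0/root

After a reboot the ramdisk, and with it the cache PV, is gone, which is exactly
the situation points 1-4 above describe.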
Hi
We do have several such things in our TODO plans - but it's actually way more
complicated than you might think. It's also not completely true that a
'writethrough' cache cannot have dirty blocks (i.e. blocks only present in the
cache because writes to the origin failed).
Another important note here is - the dm-cache target is not intended to be a
'bigger page-cache' - it has a different purpose and different usage.
So using a 'ramdisk' for dm-cache is kind of pointless when the same RAM can
likely be used more effectively by the system's page cache logic.
To extend the dirty cached pages - there is the 'dm-writecache' target, which
in a way extends the amount of page cache by the size of your fast NVMe/SSD
device - but it does not accelerate 'reads' from hotspots.
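For completeness, a rough sketch of the dm-writecache setup (the device and the
names 'vg0', 'fast' and 'main' are placeholders; see lvmcache(7) for details):

  lvcreate -n fast -L 10G vg0 /dev/nvme0n1    # fast LV on the NVMe/SSD device
  lvconvert --type writecache --cachevol fast vg0/main
  # dm-writecache only absorbs writes; reads still go to the origin LV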
Lvm should cope (eventually with the --force option) with the removal of
missing devices holding cached blocks - however there can still be some dead
spots. But ATM we are not seeing this as a major problem. A hotspot cache is
simply not supposed to be randomly removed from your system - as it's not easy
to rebuild.
But it might be possible to more easily automate the bootup process when a PV
with a cache is missing (something like what is done for 'raidLVs' with missing
legs).
Regards
Zdenek