Re: [gentoo-user] New system, systemd, and dm-integrity

Rich Freeman Fri, 15 May 2020 06:50:27 -0700

On Fri, May 15, 2020 at 9:18 AM antlists <[email protected]> wrote:
>
> On 15/05/2020 12:30, Rich Freeman wrote:
> > The actual problem that this module solves is no-doubt long solved
> > upstream, but here is the blog post on dracut modules (which is fairly
> > well-documented in the official docs as well):
> > https://rich0gentoo.wordpress.com/2012/01/21/a-quick-dracut-module/
>
> I don't think it is ... certainly I'm not aware of anything other than
> LUKS that uses dm-integrity, and LUKS sets it up itself.


I was referring to my specific problem in the blog article with mdadm
not actually detecting drives for whatever reason.

I no longer use md-raid on any of my systems so I can't vouch for
whether it is still an issue, but I'd expect a problem like that has
been fixed upstream by now.

> > If your module is reasonably generic you could probably get upstream
> > to merge it as well.
>
> No. Like LUKS, I intend to merge the code into mdadm and let the raid
> side handle it. If mdadm detects a dm-integrity/raid setup, it'll set up
> dm-integrity and then recurse to set up raid.

Seems reasonable enough, though you could argue on
separation-of-concerns grounds for doing it in dracut instead.  In any
case, I suspect the dracut folks would consider this a valid use case
for inclusion in the default package if you do want to write a module
for it.

> openSUSE is my only experience of btrfs. And it hasn't been nice. When
> it goes wrong it's nasty. Plus only raid 1 really works - I've heard
> that 5 and 6 have design flaws which means it will be very hard to get
> them to work properly.

Yeah, I moved away from btrfs as well for the same reasons.  I got
into it years ago thinking that it was still a bit unpolished but
seemed to be rapidly gaining traction.  For whatever reason they never
got regressions under control and I got burned more than once by it.
I did keep backups but restoration is of course painful.

> I've never met zfs.

So, compared to what you're doing I could see the following advantages
for zfs:

1.  All the filesystem-layer stuff which obviously isn't in-scope for
the lower layers, including snapshots (obviously those can be done
with lvm but it is a bit cleaner at the filesystem level).  I'd argue
that some of this stuff isn't as flexible as with btrfs but it will be
far superior to something like ext4 on top of what you're doing.

2.  No RAID write-hole.  I'd think that your solution with the
integrity layer would detect corruption resulting from the write hole,
but I don't think it could prevent it, since a RAID stripe is still
overwritten in place.  But, I've never had a conversation with an
md-raid developer so perhaps you have a more educated view on the
matter.

3.  COW offers some of the data-integrity benefits of full data
journaling without its performance cost.  On the other hand it
probably is not going to perform as well as overwriting in place
without any data journaling.  In theory this is more of a
filesystem-level feature though.

4.  In the future COW with zfs could probably enable better
performance on SSD/SMR with TRIM by structuring writes to consolidate
free blocks into erase zones.  However, as far as I'm aware that is a
theoretical future benefit and not anything available, and I have no
idea if anybody is working on that.  This sort of benefit would
require the vertical integration that zfs uses.
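To make the write hole in point 2 concrete, here is a rough sketch
using a toy stripe of two data blocks plus XOR parity.  It is only an
illustration of the failure mode, not how md-raid or dm-integrity is
actually implemented, and all of the names in it are made up:

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks (RAID5-style parity)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Toy stripe: two data blocks and one parity block (hypothetical layout).
stripe = {"d0": b"AAAA", "d1": b"BBBB"}
stripe["p"] = xor_blocks(stripe["d0"], stripe["d1"])


def stripe_consistent(s: dict) -> bool:
    """True when the parity block matches the data blocks."""
    return xor_blocks(s["d0"], s["d1"]) == s["p"]


consistent_before = stripe_consistent(stripe)

# In-place update of d0.  Pretend the machine loses power right after
# the data block is written but before the parity block is updated.
stripe["d0"] = b"CCCC"
consistent_after_crash = stripe_consistent(stripe)

# Later the disk holding d1 dies, so d1 is rebuilt from d0 and parity.
# The stale parity silently yields the wrong contents for d1.
rebuilt_d1 = xor_blocks(stripe["d0"], stripe["p"])

print(consistent_before, consistent_after_crash, rebuilt_d1 == b"BBBB")
```

The point is that d1 was never written to during the interrupted
update, yet it comes back wrong after the rebuild.  A COW design avoids
this by writing the new stripe elsewhere and only then switching over
to it.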

In general zfs is much more stable than btrfs and far less likely to
eat your data.  And FWIW I did once (many years ago) have
ext4+lvm+mdadm eat my data.  I think it was due to some kind of lvm
metadata corruption, because an fsck on one ext4 partition scrambled a
different ext4 partition, which obviously should not be possible if
lvm is working right.  I have no idea what the root cause was.  It
could have been bad RAM, which of course can mess up anything short of
a distributed filesystem with integrity checking above the host level
(which, IMO, most of the solutions don't do as well as they could).

One big disadvantage with zfs is that it is far less flexible at the
physical layer.  You can add the equivalent of LVM PVs, and you can
expand a PV, but you can't remove a PV in anything but the latest
version of zfs, and I think there are some limitations around how this
works.  You can't reshape the equivalent of an mdadm array, but you
can replace a drive in an array and grow an array if all the
underlying devices have enough space.  You can add/remove mirrors from
the equivalent of a raid1 to go freely from no redundancy to any
multiplicity you wish.  Striped arrays are basically fixed in layout
once created.

> As the linux raid wiki says (I wrote it :-) do you want the complexity
> of a "do it all" filesystem, or the abstraction of dedicated layers?

Yeah, it is a well-established argument and has some merit.

I'm not sure I'd go this route for my regular hosts since zfs works
reasonably well (though your solution is more flexible than zfs).

However, I might evaluate how dm-integrity plus ext4 (maybe with LVM
in between) works on my lizardfs chunkservers.  These have redundancy
above the host level, but I do want integrity checking to catch silent
corruption of data at rest, and I'm not sure that lizardfs provides
any guarantees here (plus having it at the host level would probably
perform better anyway).  If the integrity layer returned an I/O error,
lizardfs would most likely just overwrite the impacted files in place,
so there would be no reads from the impacted block until it was
rewritten, which presumably would clear the integrity error.
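The behaviour I'm expecting from the integrity layer can be sketched
roughly like this.  It is a toy model, not dm-integrity's real
interface: the class and method names are invented, and zlib.crc32 is
only a stand-in for whatever checksum the real layer is configured to
use:

```python
import errno
import zlib


class ToyIntegrityStore:
    """Toy block store keeping a checksum tag next to each block."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}
        self.tags: dict[int, int] = {}

    def write(self, n: int, payload: bytes) -> None:
        # A full rewrite stores fresh data and a fresh tag together.
        self.blocks[n] = payload
        self.tags[n] = zlib.crc32(payload)

    def read(self, n: int) -> bytes:
        payload = self.blocks[n]
        if zlib.crc32(payload) != self.tags[n]:
            # Surface the mismatch as an I/O error instead of bad data.
            raise OSError(errno.EIO, f"checksum mismatch on block {n}")
        return payload

    def rot(self, n: int, payload: bytes) -> None:
        # Simulate silent corruption: the data changes, the tag does not.
        self.blocks[n] = payload


store = ToyIntegrityStore()
store.write(0, b"good")
store.rot(0, b"g0od")

try:
    store.read(0)
    read_failed = False
except OSError as exc:
    read_failed = exc.errno == errno.EIO

# The layer above (lizardfs in my case) rewrites the block from a good
# replica, which replaces the stale tag and clears the error.
store.write(0, b"good")
recovered = store.read(0)
print(read_failed, recovered)
```

Whether lizardfs really reacts to the error by rewriting in place is
the part I'd have to test.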

That said, I'm not sure that lizardfs even overwrites anything
in-place in normal use so it might not make any difference vs zfs.  It
breaks all data into "chunks" and I'd think that if data were
overwritten in place at the filesystem level it probably would end up
in a new chunk, with the old one garbage collected if it were not
snapshotted.

-- 
Rich
