On 15/05/20 14:49, Rich Freeman wrote:
> On Fri, May 15, 2020 at 9:18 AM antlists <[email protected]> wrote:
>>
>> On 15/05/2020 12:30, Rich Freeman wrote:
I've snipped it, but I can't imagine dracut/mdadm having the problems
you describe today - there are too many systems out there that boot from
lvm/mdadm. My problem is I'm adding dm-integrity to the mix ...
>
> So, compared to what you're doing I could see the following advantages:
>
> 1. All the filesystem-layer stuff which obviously isn't in-scope for
> the lower layers, including snapshots (obviously those can be done
> with lvm but it is a bit cleaner at the filesystem level). I'd argue
> that some of this stuff isn't as flexible as with btrfs but it will be
> far superior to something like ext4 on top of what you're doing.
>
> 2. No RAID write-hole. I'd think that your solution with the
> integrity layer would detect corruption resulting from the write hole,
> but I don't think it could prevent it, since a RAID stripe is still
> overwritten in place. But, I've never had a conversation with an
> md-raid developer so perhaps you have a more educated view on the
> matter.

I don't know as it would. The write hole is where all the blocks are
intact, but not all of them make it to disk. That said, the write hole
has been pretty much fixed now - I think new raids use journalling which
deals with it. That's certainly been discussed on the list.

>
> 3. COW offers some of the data-integrity benefits of full data
> journaling without the performance costs of this. On the other hand
> it probably is not going to perform as well as overwriting in place
> without any data journaling. In theory this is more of a
> filesystem-level feature though.
>
> 4. In the future COW with zfs could probably enable better
> performance on SSD/SMR with TRIM by structuring writes to consolidate
> free blocks into erase zones. However, as far as I'm aware that is a
> theoretical future benefit and not anything available, and I have no
> idea if anybody is working on that. This sort of benefit would
> require the vertical integration that zfs uses.
>
> In general zfs is much more stable than btrfs and far less likely to
> eat your data. And FWIW I did once (many years ago) have
> ext4+lvm+mdadm eat my data - I think it was due to some kind of lvm
> metadata corruption or something like that, because basically an fsck
> on one ext4 partition scrambled a different ext4 partition, which
> obviously should not be possible if lvm is working right. I have no
> idea what the root cause of that was - could have been bad RAM or
> something which of course can mess up anything short of a distributed
> filesystem with integrity checking above the host level (which, IMO,
> most of the solutions don't do as well as they could).
>
> One big disadvantage with zfs is that it is far less flexible at the
> physical layer. You can add the equivalent of LVM PVs, and you can
> expand a PV, but you can't remove a PV in anything but the latest
> version of zfs, and I think there are some limitations around how this
> works. You can't reshape the equivalent of an mdadm array, but you
> can replace a drive in an array and grow an array if all the
> underlying devices have enough space. You can add/remove mirrors from
> the equivalent of a raid1 to freely go between no-redundancy to any
> multiplicity you wish. Striped arrays are basically fixed in layout
> once created.
>
>> As the linux raid wiki says (I wrote it :-) do you want the complexity
>> of a "do it all" filesystem, or the abstraction of dedicated layers?
>
> Yeah, it is a well-established argument and has some merit.
>
> I'm not sure I'd go this route for my regular hosts since zfs works
> reasonably well (though your solution is more flexible than zfs).
>
> However, I might evaluate how dm-integrity plus ext4 (maybe with LVM
> in-between) works on my lizardfs chunkservers.
> These have redundancy above the host level, but I do want integrity
> checking for static data issues, and I'm not sure that lizardfs
> provides any guarantees here (plus having it at the host level would
> probably perform better anyway). If the integrity layer returned an io
> error lizardfs would just overwrite the impacted files in-place most
> likely, so there would be no reads from the impacted block until it
> was rewritten which presumably would clear the integrity error.
>
> That said, I'm not sure that lizardfs even overwrites anything
> in-place in normal use so it might not make any difference vs zfs. It
> breaks all data into "chunks" and I'd think that if data were
> overwritten in place at the filesystem level it probably would end up
> in a new chunk, with the old one garbage collected if it were not
> snapshotted.
>

The crucial point here is that dm-integrity protects against something
*outside* your stack trashing part of the disk. If something came along
and wrote randomly to /dev/sda, then when my filesystem tried to
retrieve a file, dm-integrity would cause sda to return a read error,
raid would say "oops", read it from sdb, and rewrite sda.

It won't protect against corruption in the stack itself, I don't think,
because if the data is corrupt when it hits dm-integrity's write path,
of course all the crc's etc will be correct.

Anyways, for a bit of info and a cookbook on dm-integrity, take a look at

https://raid.wiki.kernel.org/index.php/System2020

Cheers,
Wol
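
For anyone who wants to see the read-error-then-repair behaviour described above in miniature, here is a rough toy model in Python. It is only a sketch under loud assumptions: IntegrityDisk, Mirror and scribble are made-up names for illustration, the checksum is a plain CRC32, and none of this is how the kernel's dm-integrity or md code is actually structured. It just mimics the two cases: damage from outside the stack gets caught and repaired from the other leg, while data that was already bad when it was written gets a perfectly valid checksum.

```python
import zlib


class IntegrityDisk:
    """Toy stand-in for one disk with an integrity layer (hypothetical model)."""

    def __init__(self, nsectors):
        self.data = [b""] * nsectors
        self.sums = [zlib.crc32(b"")] * nsectors

    def write(self, n, payload):
        # Normal write path: the checksum is computed from whatever arrives,
        # so data that is already corrupt gets a "correct" checksum.
        self.data[n] = payload
        self.sums[n] = zlib.crc32(payload)

    def read(self, n):
        # A checksum mismatch is reported as a read error, not as bad data.
        if zlib.crc32(self.data[n]) != self.sums[n]:
            raise IOError(f"integrity mismatch on sector {n}")
        return self.data[n]

    def scribble(self, n, garbage):
        # Something outside the stack writes to the raw device,
        # bypassing the checksum update.
        self.data[n] = garbage


class Mirror:
    """Toy two-leg mirror sitting on top of the integrity disks."""

    def __init__(self, a, b):
        self.legs = [a, b]

    def write(self, n, payload):
        for leg in self.legs:
            leg.write(n, payload)

    def read(self, n):
        for i, leg in enumerate(self.legs):
            try:
                good = leg.read(n)
            except IOError:
                continue  # "oops" - try the next leg
            # Every leg before this one failed, so rewrite it with good data.
            for failed in self.legs[:i]:
                failed.write(n, good)
            return good
        raise IOError(f"sector {n} unreadable on all legs")


if __name__ == "__main__":
    sda, sdb = IntegrityDisk(8), IntegrityDisk(8)
    md = Mirror(sda, sdb)
    md.write(3, b"important data")
    sda.scribble(3, b"random rubbish")   # damage from outside the stack
    print(md.read(3))                    # served from sdb, sda rewritten
    print(sda.read(3))                   # sda reads cleanly again
```

The second case needs no extra code: call md.write() with data that is already wrong and every later read succeeds without complaint, which is the limitation mentioned above.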

