On Fri, May 15, 2020 at 9:18 AM antlists <[email protected]> wrote:
>
> On 15/05/2020 12:30, Rich Freeman wrote:
> > The actual problem that this module solves is no-doubt long solved
> > upstream, but here is the blog post on dracut modules (which is fairly
> > well-documented in the official docs as well):
> > https://rich0gentoo.wordpress.com/2012/01/21/a-quick-dracut-module/
>
> I don't think it is ... certainly I'm not aware of anything other than
> LUKS that uses dm-integrity, and LUKS sets it up itself.

I was referring to my specific problem in the blog article with mdadm
not actually detecting drives for whatever reason. I no longer use
md-raid on any of my systems so I can't vouch for whether it is still
an issue, but something like that was probably fixed somewhere.

> > If your module is reasonably generic you could probably get upstream
> > to merge it as well.
>
> No. Like LUKS, I intend to merge the code into mdadm and let the raid
> side handle it. If mdadm detects a dm-integrity/raid setup, it'll set up
> dm-integrity and then recurse to set up raid.

Seems reasonable enough, though you could probably argue for
separation of concerns to do it in dracut. In any case, I do suspect
the dracut folks would consider such a use case valid for inclusion in
the default package if you do want to have a module for it.

> openSUSE is my only experience of btrfs. And it hasn't been nice. When
> it goes wrong it's nasty. Plus only raid 1 really works - I've heard
> that 5 and 6 have design flaws which means it will be very hard to get
> them to work properly.

Yeah, I moved away from btrfs as well for the same reasons. I got
into it years ago thinking that it was still a bit unpolished but
seemed to be rapidly gaining traction. For whatever reason they never
got regressions under control and I got burned more than once by it.
I did keep backups but restoration is of course painful.

> I've never met zfs.

So, compared to what you're doing I could see the following advantages:

1. All the filesystem-layer stuff which obviously isn't in-scope for
the lower layers, including snapshots (obviously those can be done
with lvm but it is a bit cleaner at the filesystem level). I'd argue
that some of this stuff isn't as flexible as with btrfs but it will be
far superior to something like ext4 on top of what you're doing.

2. No RAID write-hole.
I'd think that your solution with the integrity layer would detect
corruption resulting from the write hole, but I don't think it could
prevent it, since a RAID stripe is still overwritten in place. But,
I've never had a conversation with an md-raid developer so perhaps you
have a more educated view on the matter.

3. COW offers some of the data-integrity benefits of full data
journaling without the performance costs of this. On the other hand
it probably is not going to perform as well as overwriting in place
without any data journaling. In theory this is more of a
filesystem-level feature though.

4. In the future COW with zfs could probably enable better
performance on SSD/SMR with TRIM by structuring writes to consolidate
free blocks into erase zones. However, as far as I'm aware that is a
theoretical future benefit and not anything available, and I have no
idea if anybody is working on that. This sort of benefit would
require the vertical integration that zfs uses.

In general zfs is much more stable than btrfs and far less likely to
eat your data. And FWIW I did once (many years ago) have
ext4+lvm+mdadm eat my data - I think it was due to some kind of lvm
metadata corruption or something like that, because basically an fsck
on one ext4 partition scrambled a different ext4 partition, which
obviously should not be possible if lvm is working right. I have no
idea what the root cause of that was - could have been bad RAM or
something which of course can mess up anything short of a distributed
filesystem with integrity checking above the host level (which, IMO,
most of the solutions don't do as well as they could).

One big disadvantage with zfs is that it is far less flexible at the
physical layer. You can add the equivalent of LVM PVs, and you can
expand a PV, but you can't remove a PV in anything but the latest
version of zfs, and I think there are some limitations around how this
works.
You can't reshape the equivalent of an mdadm array, but you can
replace a drive in an array and grow an array if all the underlying
devices have enough space. You can add/remove mirrors from the
equivalent of a raid1 to freely go between no-redundancy to any
multiplicity you wish. Striped arrays are basically fixed in layout
once created.

> As the linux raid wiki says (I wrote it :-) do you want the complexity
> of a "do it all" filesystem, or the abstraction of dedicated layers?

Yeah, it is a well-established argument and has some merit.

I'm not sure I'd go this route for my regular hosts since zfs works
reasonably well (though your solution is more flexible than zfs).
However, I might evaluate how dm-integrity plus ext4 (maybe with LVM
in-between) works on my lizardfs chunkservers. These have redundancy
above the host level, but I do want integrity checking for static data
issues, and I'm not sure that lizardfs provides any guarantees here
(plus having it at the host level would probably perform better
anyway). If the integrity layer returned an io error lizardfs would
just overwrite the impacted files in-place most likely, so there would
be no reads from the impacted block until it was rewritten which
presumably would clear the integrity error.

That said, I'm not sure that lizardfs even overwrites anything
in-place in normal use so it might not make any difference vs zfs. It
breaks all data into "chunks" and I'd think that if data were
overwritten in place at the filesystem level it probably would end up
in a new chunk, with the old one garbage collected if it were not
snapshotted.

--
Rich
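
For anyone following along, the write hole from item 2 above can be sketched with a toy model. This is only an illustration under simplifying assumptions: a single stripe with two data blocks and one XOR parity block, held in memory. The names (xor_blocks, simulate_write_hole) are made up for the example and have nothing to do with how md-raid, dm-integrity or zfs are actually implemented.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks byte by byte (single-parity, RAID-5 style)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def simulate_write_hole() -> dict:
    """Toy stripe: update one data block in place, 'crash' before parity."""
    d0, d1 = b"AAAA", b"BBBB"
    parity = xor_blocks(d0, d1)

    # With a consistent stripe, a lost d1 can be rebuilt from d0 and parity.
    rebuilt_before = xor_blocks(d0, parity)

    # In-place update of d0. The simulated crash happens here, so the
    # parity block on "disk" still describes the old d0.
    d0 = b"CCCC"

    # Later the disk holding d1 fails and d1 is rebuilt from the new d0
    # and the stale parity. The result is neither old nor new data.
    rebuilt_after = xor_blocks(d0, parity)

    return {
        "original_d1": d1,
        "rebuilt_before_crash": rebuilt_before,
        "rebuilt_after_crash": rebuilt_after,
        "stripe_consistent": xor_blocks(d0, d1) == parity,
    }


if __name__ == "__main__":
    for key, value in simulate_write_hole().items():
        print(f"{key}: {value!r}")
```

The point of the sketch is only that the damage shows up in a block that was never written (d1), which is why overwriting a stripe in place is the root of the problem. A copy-on-write layout avoids it by writing the new stripe elsewhere before switching over.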

