Hi all,

I've come to the conclusion that I should actively recommend people not use ZFS native encryption for the moment, because "NULL dereference in the ARC sometimes if you receive a send -w" is not really an acceptable risk, to me, and I've not been making great progress in debugging it. If anyone could take a look and see if I'm overlooking something obvious, that'd be great.

https://github.com/openzfs/zfs/issues/11679 is the issue in question, though there's a bunch of child issues I know or suspect will go away at the same time.

tl;dr: It's possible in some edge case to reach arc_buf_untransform_in_place(buf, hash_lock) in arc_buf_fill with HDR_EMPTY(hdr) == 1, evidently only when encryption is involved, and then we try to copy NULL.

* I think it's under memory pressure, since every time I've reproduced it the system seemed to be under significant memory pressure, but who knows at this point.
* Happens on 0.8.x up to git tip from a week or two ago, at least in my experiments.
* Just erroring out on detecting that condition made one testbed running --enable-debug panic later on detecting an arc_cksum_verify violation, but has not yet done so on other testbeds, so... unclear, but something seems very rotten.
* It takes me days of running things in a loop to trigger it failing again, so someone who knows this code well would be a significant aid.
* I'm not personally hitting this outside of trying to reproduce other people's issues, but additional people have been trickling into the issue tracker, either in that bug or in new bugs about this.

I've got a load of debugging information and rabbit holes I've gone down that I can share if anyone is interested.

Thanks!
- Rich

------------------------------------------
openzfs: openzfs-developer
Permalink: https://openzfs.topicbox.com/groups/developer/T86620b1082ed947a-M1791d2f189849f68a249e126
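
For readers who don't have the ARC code in their heads, here is a toy model of the failure shape described in the tl;dr and of the "just error out on detecting that condition" experiment. None of this is OpenZFS code: toy_hdr_t, TOY_HDR_EMPTY, and toy_untransform_in_place are made-up names, the struct is nowhere near the real header layout, the copy direction is an assumption, and EIO is an arbitrary choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for an ARC buffer header; NOT the real arc_buf_hdr_t. */
typedef struct toy_hdr {
	uint64_t dva_word[2];	/* identity; all zero models an "empty" header */
	void *pabd;		/* backing data; NULL when nothing is attached */
	size_t lsize;		/* size of the backing data in bytes */
} toy_hdr_t;

/* Toy analogue of HDR_EMPTY(): the header carries no identity. */
#define	TOY_HDR_EMPTY(hdr) \
	((hdr)->dva_word[0] == 0 && (hdr)->dva_word[1] == 0)

/*
 * Toy analogue of the in-place untransform step. Assumption: the step
 * copies from the header's backing data into the caller's buffer, so a
 * header with no backing data means copying from NULL.
 *
 * Instead of dereferencing NULL, return an error when the header is
 * empty or has no backing data. The caller's buffer must hold at least
 * hdr->lsize bytes.
 */
static int
toy_untransform_in_place(const toy_hdr_t *hdr, void *buf)
{
	assert(hdr != NULL && buf != NULL);

	if (TOY_HDR_EMPTY(hdr) || hdr->pabd == NULL)
		return (EIO);	/* arbitrary error code for this sketch */

	memcpy(buf, hdr->pabd, hdr->lsize);
	return (0);
}
```

Calling toy_untransform_in_place() on a header whose dva_word is all zero returns EIO rather than crashing. As noted in the list above, the equivalent guard in the real code only moved the failure: one --enable-debug testbed later panicked on an arc_cksum_verify violation, so this shows the symptom, not a fix.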