Hi all,
I've come to the conclusion that I should actively recommend that people
not use ZFS native encryption for the moment, because "NULL
dereference in the ARC sometimes if you receive a send -w" is not
really an acceptable risk to me, and I haven't been making great
progress in debugging it.
If anyone could take a look and see if I'm overlooking something
obvious, that'd be great.

https://github.com/openzfs/zfs/issues/11679 is the issue in question,
though there are a bunch of child issues that I know or suspect will go
away at the same time.

tl;dr: in some edge case, arc_buf_fill can reach
arc_buf_untransform_in_place(buf, hash_lock) with
HDR_EMPTY(hdr) == 1, evidently only when encryption is involved, and
we then try to copy from NULL.
* I think it's triggered under memory pressure, since every time I've
reproduced it, the system seemed to be under significant memory
pressure. But who knows, at this point.
* It happens on everything from 0.8.x up to git tip as of a week or two
ago, at least in my experiments.
* Simply erroring out when that condition is detected made one testbed
built with --enable-debug panic later on an arc_cksum_verify
violation, though it hasn't done so on the other testbeds yet. So...
unclear, but something seems very rotten.
* It takes me days of running things in a loop to trigger the failure
again, so help from someone who knows this code well would be a
significant aid.
* I'm not personally hitting this outside of trying to reproduce other
people's issues, but more people keep trickling into the issue
tracker, either on that bug or with new bugs about this.
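For anyone who wants a concrete picture of what "just erroring out on
detecting that condition" means, here's a toy model in C. To be clear,
toy_hdr_t, toy_buf_t, and toy_buf_untransform are hypothetical
stand-ins, not the real arc.c structures or functions: "empty" plays
the role of HDR_EMPTY(hdr) and "pabd" plays the role of the header's
data pointer that ends up NULL. It only shows the shape of the guard
(return an error instead of copying from NULL); as noted above, that
is a band-aid, not a fix for whatever leaves the header empty.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model only: hypothetical stand-ins, NOT the real OpenZFS
 * arc_buf_hdr_t / arc_buf_t.
 */
typedef struct toy_hdr {
	int empty;		/* stands in for HDR_EMPTY(hdr) */
	const char *pabd;	/* stands in for the header's data; may be NULL */
	size_t size;		/* bytes to copy into the buffer */
} toy_hdr_t;

typedef struct toy_buf {
	toy_hdr_t *hdr;
	char data[64];
} toy_buf_t;

/*
 * Hypothetical guarded "untransform": copy the header's data into the
 * buffer, but error out instead of dereferencing NULL when the header
 * is empty.
 */
static int
toy_buf_untransform(toy_buf_t *buf)
{
	assert(buf != NULL && buf->hdr != NULL);
	toy_hdr_t *hdr = buf->hdr;

	/* The edge case from the tl;dr: empty header, nothing to copy from. */
	if (hdr->empty || hdr->pabd == NULL)
		return (EIO);

	/* Keep the toy model memory-safe. */
	if (hdr->size > sizeof (buf->data))
		return (EOVERFLOW);

	memcpy(buf->data, hdr->pabd, hdr->size);
	return (0);
}
```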

I've got a load of debugging information, plus notes on the rabbit holes
I've gone down, that I can share if anyone is interested.

Thanks!
- Rich

------------------------------------------
openzfs: openzfs-developer
Permalink: https://openzfs.topicbox.com/groups/developer/T86620b1082ed947a-M1791d2f189849f68a249e126