On 31.08.2023 08:45, Drew Gallatin wrote:
> On Wed, Aug 30, 2023, at 8:01 PM, Alexander Motin wrote:
>> It is the first time I see a panic like this. I'll think about it
>> tomorrow. But I'd appreciate any information on what is your workload
>> and what are you doing related to ZIL (O_SYNC, fsync(), sync=always,
>> etc) to trigger it? What is your pool configuration?
>
> I'm not Gleb, but this was something at $WORK, so I can perhaps help.
> I've included the output of zpool status, and all non-default settings
> in the zpool. Note that we don't use a ZIL device.
You don't use a SLOG device. The ZIL is always with you; in this case
it is just embedded in the main pool.
I thought about this for a couple of hours and still can't see how it
can happen. zil_sync() should not call zil_free_lwb() unless the lwb
is in LWB_STATE_FLUSH_DONE. To get into LWB_STATE_FLUSH_DONE, the lwb
must first delete all of its lwb_vdev_tree entries in
zil_lwb_write_done(). And no new entries should be added during or after
zil_lwb_write_done(), because of the zio dependencies that are set up.
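
To make the reasoning above concrete, here is a toy model of the
invariant as I understand it. This is only a sketch, not the OpenZFS
code: every toy_* name is made up for illustration, the real lwb has
more states, and a plain counter stands in for lwb_vdev_tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins only; these are NOT the OpenZFS definitions. */
typedef enum {
	TOY_LWB_STATE_ISSUED,		/* write zio issued, not yet done */
	TOY_LWB_STATE_WRITE_DONE,	/* write done, flush still pending */
	TOY_LWB_STATE_FLUSH_DONE	/* flush done, safe to free */
} toy_lwb_state_t;

typedef struct {
	toy_lwb_state_t state;
	size_t vdev_entries;	/* stands in for the lwb_vdev_tree size */
} toy_lwb_t;

/* Record a vdev to flush. Refused once the write has completed. */
static bool
toy_lwb_add_vdev(toy_lwb_t *lwb)
{
	if (lwb->state != TOY_LWB_STATE_ISSUED)
		return (false);
	lwb->vdev_entries++;
	return (true);
}

/* Write completion: drop every tree entry, then advance the state. */
static void
toy_lwb_write_done(toy_lwb_t *lwb)
{
	assert(lwb->state == TOY_LWB_STATE_ISSUED);
	lwb->vdev_entries = 0;
	lwb->state = TOY_LWB_STATE_WRITE_DONE;
}

/* Flush completion: only reachable with an already empty tree. */
static void
toy_lwb_flush_done(toy_lwb_t *lwb)
{
	assert(lwb->state == TOY_LWB_STATE_WRITE_DONE);
	assert(lwb->vdev_entries == 0);
	lwb->state = TOY_LWB_STATE_FLUSH_DONE;
}

/* The condition expected to hold whenever an lwb is freed. */
static bool
toy_lwb_can_free(const toy_lwb_t *lwb)
{
	return (lwb->state == TOY_LWB_STATE_FLUSH_DONE &&
	    lwb->vdev_entries == 0);
}
```

In this model, the only way to free an lwb with a non-empty tree is for
something to add an entry after the write has completed, which is
exactly what the zio dependencies are supposed to rule out.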
I've made a patch tuning some assertions for this context:
https://github.com/openzfs/zfs/pull/15227 . If the issue is
reproducible, could you please apply it and try again? Maybe it will
give us some more clues.
--
Alexander Motin