While running a mixed read/write `fio` workload against this change on debug
bits, I seem to have hit a hang. I see stacks like this:
```
ffffff2e7fd03800 SLEEP CV 128
swtch+0x18a
cv_wait+0x89
txg_wait_open+0xcb
dmu_tx_wait+0x1d8
dmu_tx_assign+0x8b
zfs_write+0x561
fop_write+0x5b
pwrite+0x193
```
and the txg sync thread shows this:
```
ffffff001fcc2c40 SLEEP CV 1
swtch+0x18a
cv_wait+0x89
zio_wait+0xbb
dsl_pool_sync_mos+0x4a
dsl_pool_sync+0x3ab
spa_sync+0x45e
txg_sync_thread+0x260
thread_start+8
```
and when I look at the zio state, I see:
```
> ::zio_state
ADDRESS                 TYPE  STAGE            WAITER           TIME_ELAPSED
ffffff229c838bd8        NULL  CHECKSUM_VERIFY  ffffff0729121840 -
ffffff070f868140        NULL  CHECKSUM_VERIFY  ffffff076fc14b40 -
mdb: unexpected value 3735928559 of enum type zio_type_t (member io_type of type struct zio)
> ::zio_state -r
mdb: unexpected value 3735928559 of enum type zio_type_t (member io_type of type struct zio)
> ::zio_state -r
ADDRESS                 TYPE  STAGE            WAITER           TIME_ELAPSED
ffffff26da80a168        NULL  CHECKSUM_VERIFY  ffffff07a5155460 -
ffffff641bc44298        READ  VDEV_IO_START    -                -
ffffff26db21ee68        READ  VDEV_IO_START    -                43ms
mdb: invalid or uninitialized list_t at 0xffffff26db21efa8
mdb: failed to walk zio_t children at ffffff26db21efa8
ffffff229a00ca78        NULL  DONE             ffffff0722689860 -
mdb: unexpected value 3735928559 of enum type zio_type_t (member io_type of type struct zio)
> ::zio_state -r
ADDRESS                 TYPE  STAGE            WAITER           TIME_ELAPSED
ffffff070f82d150        NULL  CHECKSUM_VERIFY  ffffff0729121480 -
ffffff229c88a0f0        READ  VDEV_IO_START    -                -
ffffff229c8b51f8        READ  VDEV_IO_START    -                7ms
ffffff213a0454a8        NULL  CHECKSUM_VERIFY  ffffff076fc0e7c0 -
ffffff26da7dfba0        READ  VDEV_IO_START    -                -
ffffff1c1db44b80        READ  VDEV_IO_START    -                136ms
ffffff229c84c8f0        NULL  CHECKSUM_VERIFY  ffffff076fc20020 -
ffffff26da8145a0        READ  VDEV_IO_START    -                -
ffffff076281ac58        READ  VDEV_IO_START    -                143ms
mdb: unexpected value 3735928559 of enum type zio_type_t (member io_type of type struct zio)
```
It looks like reads from `fio` are still being processed, but writes are not
(likely because the txg sync thread is stuck). The interesting part is the
`mdb:` error lines printed by `::zio_state -r`. I'm not quite sure what they
mean yet, but they're likely due to some sort of corruption of the in-memory
state/structures. I'll leave this VM alone and see if I can bug @ahrens or
@grwilson to help me understand this better next week.
https://github.com/openzfs/openzfs/pull/489#issuecomment-364672027