While running a mixed read/write `fio` workload with this change on debug 
bits, I seem to have hit a hang. The writers are blocked with stacks like this:
```
ffffff2e7fd03800 SLEEP    CV                    128
                 swtch+0x18a
                 cv_wait+0x89
                 txg_wait_open+0xcb
                 dmu_tx_wait+0x1d8
                 dmu_tx_assign+0x8b
                 zfs_write+0x561
                 fop_write+0x5b
                 pwrite+0x193
```
and then the sync thread shows this:
```
ffffff001fcc2c40 SLEEP    CV                      1
                 swtch+0x18a
                 cv_wait+0x89
                 zio_wait+0xbb
                 dsl_pool_sync_mos+0x4a
                 dsl_pool_sync+0x3ab
                 spa_sync+0x45e
                 txg_sync_thread+0x260
                 thread_start+8
```
and when I look at the zio state, I see:
```
> ::zio_state
ADDRESS                 TYPE  STAGE            WAITER           TIME_ELAPSED
ffffff229c838bd8        NULL  CHECKSUM_VERIFY  ffffff0729121840 -
ffffff070f868140        NULL  CHECKSUM_VERIFY  ffffff076fc14b40 -
mdb: unexpected value 3735928559 of enum type zio_type_t (member io_type of type struct zio)

> ::zio_state -r
mdb: unexpected value 3735928559 of enum type zio_type_t (member io_type of type struct zio)

> ::zio_state -r
ADDRESS                 TYPE  STAGE            WAITER           TIME_ELAPSED
ffffff26da80a168        NULL  CHECKSUM_VERIFY  ffffff07a5155460 -
 ffffff641bc44298       READ  VDEV_IO_START    -                -
  ffffff26db21ee68      READ  VDEV_IO_START    -                43ms
mdb: invalid or uninitialized list_t at 0xffffff26db21efa8
mdb: failed to walk zio_t children at ffffff26db21efa8
ffffff229a00ca78        NULL  DONE             ffffff0722689860 -
mdb: unexpected value 3735928559 of enum type zio_type_t (member io_type of type struct zio)

> ::zio_state -r
ADDRESS                 TYPE  STAGE            WAITER           TIME_ELAPSED
ffffff070f82d150        NULL  CHECKSUM_VERIFY  ffffff0729121480 -
 ffffff229c88a0f0       READ  VDEV_IO_START    -                -
  ffffff229c8b51f8      READ  VDEV_IO_START    -                7ms
ffffff213a0454a8        NULL  CHECKSUM_VERIFY  ffffff076fc0e7c0 -
 ffffff26da7dfba0       READ  VDEV_IO_START    -                -
  ffffff1c1db44b80      READ  VDEV_IO_START    -                136ms
ffffff229c84c8f0        NULL  CHECKSUM_VERIFY  ffffff076fc20020 -
 ffffff26da8145a0       READ  VDEV_IO_START    -                -
  ffffff076281ac58      READ  VDEV_IO_START    -                143ms
mdb: unexpected value 3735928559 of enum type zio_type_t (member io_type of type struct zio)
```

It looks like reads from `fio` are still being processed, but writes are not 
(likely because the txg sync thread is stuck). The interesting part is the 
`mdb:` error lines printed by `::zio_state -r`. I'm not quite sure what they 
mean yet, but they likely point to some sort of corruption of the in-memory 
state/structures.
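
One small hint about those `mdb:` errors, offered as a guess rather than a diagnosis: the decimal value mdb prints for `io_type` is easier to recognize in hex. The sketch below only does that conversion. The comparison constant is an assumption on my part, namely that these debug bits have kmem debugging enabled and that freed buffers are filled with the `0xdeadbeef` pattern; `KMEM_FREE_PATTERN_32` and `as_hex` are illustrative names, not identifiers from the kernel or mdb.

```python
# Decimal value reported by mdb for zio_t.io_type in the output above.
reported = 3735928559

# Assumption: kmem debugging fills freed buffers with this 32-bit pattern.
# The name is illustrative only.
KMEM_FREE_PATTERN_32 = 0xDEADBEEF


def as_hex(value: int) -> str:
    """Format a 32-bit value as zero-padded hexadecimal."""
    return f"{value:#010x}"


print(as_hex(reported))
print(reported == KMEM_FREE_PATTERN_32)
```

If that assumption holds, the walker may simply be racing with zios that are freed while it runs (reads are still completing), rather than seeing real corruption. The output above is not enough to tell which.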

I'll leave this VM alone, and see if I can bug @ahrens or @grwilson to help me 
understand this better next week. 

https://github.com/openzfs/openzfs/pull/489#issuecomment-364672027