Hi,

While testing device failure cases with failmode=wait, I found the following:
- The vdevs use iSCSI targets in the backend.
- I induced an IO failure by dropping all packets from the iSCSI target.
- This resulted in the zpool seeing IO errors on the vdevs.
- The spa_sync() thread was stuck waiting for IOs to complete.
After this, I fixed the iSCSI path by restoring the packet flow and then tried 'zpool clear':
- With failmode=continue, it fails to clear the pool even though the underlying iSCSI device has recovered.
- With failmode=wait, zfs_ioc_clear() does not report an error even though it failed to reopen the device. "zpool clear" therefore wrongly assumes that zfs_ioc_clear() succeeded and issues zfs_ioc_log_history(), which gets stuck in the kernel at dmu_tx_assign() waiting for spa_sync() to complete.

I have documented my findings here: https://github.com/zfsonlinux/zfs/issues/3256

Below is the stack where 'zpool clear' is stuck in the kernel:

-- snip --
PID: 21037  TASK: ffff8800ba96aa80  CPU: 2  COMMAND: "zpool"
 #0 [ffff880220323b30] __schedule at ffffffff816cb514
 #1 [ffff880220323be0] schedule at ffffffff816cbc10
 #2 [ffff880220323c00] cv_wait_common at ffffffffa08d9275 [spl]
 #3 [ffff880220323c80] __cv_wait at ffffffffa08d9305 [spl]
 #4 [ffff880220323c90] txg_wait_synced at ffffffffa0999c99 [zfs]
 #5 [ffff880220323ce0] dmu_tx_wait at ffffffffa095426e [zfs]
 #6 [ffff880220323d50] dmu_tx_assign at ffffffffa095436a [zfs]
 #7 [ffff880220323d80] spa_history_log_nvl at ffffffffa099106e [zfs]
 #8 [ffff880220323dd0] spa_history_log at ffffffffa0991194 [zfs]
 #9 [ffff880220323e00] zfs_ioc_log_history at ffffffffa09c380b [zfs]
#10 [ffff880220323e40] zfsdev_ioctl at ffffffffa09c4357 [zfs]
#11 [ffff880220323eb0] do_vfs_ioctl at ffffffff81216072
#12 [ffff880220323f00] sys_ioctl at ffffffff81216402
#13 [ffff880220323f50] entry_SYSCALL_64_fastpath at ffffffff816cf76e
    RIP: 00007f17bcc9ba77  RSP: 00007ffd708a7f98  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000000000  RCX: 00007f17bcc9ba77
    RDX: 00007ffd708a7fa0  RSI: 0000000000005a3f  RDI: 0000000000000003
    RBP: 0000000001157060  R8:  0000000001157f20  R9:  616332666162362d
    R10: 0000000039373030  R11: 0000000000000246  R12: 000000000061fce0
    R13: 000000000061e980  R14: 000000000061ead0  R15: 000000000000000e
    ORIG_RAX: 0000000000000010  CS: 0033  SS: 002b
-- snip --

Upon further analysis, I found that vdev_reopen() was failing because the IOs issued by vdev_probe() were failing: vdev_accessible(), called from zio_vdev_io_start(), was reporting the vdev as inaccessible because vdev->vdev_remove_wanted was still set. That flag was set when the vdev originally faulted, and it should have been cleared by spa_async_remove(), which should have been spawned via spa_sync()->spa_async_dispatch(). However, the spa_sync() thread for this pool is stuck waiting for sync IOs to complete.
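For reference, here is a simplified, paraphrased sketch of the check involved (my reading of module/zfs/vdev.c, not verbatim source): a vdev that still has vdev_remove_wanted set is treated as inaccessible, so the probe IO fails regardless of the actual state of the device.

-- snip --
/*
 * Paraphrased sketch of vdev_accessible() (module/zfs/vdev.c); simplified,
 * not verbatim.  A vdev that is dead, or that still has vdev_remove_wanted
 * set from the original fault, is reported as inaccessible, so the probe
 * IO issued from vdev_reopen()->vdev_open()->vdev_probe() fails even after
 * the iSCSI path has recovered.
 */
boolean_t
vdev_accessible(vdev_t *vd, zio_t *zio)
{
        if (vdev_is_dead(vd) || vd->vdev_remove_wanted)
                return (B_FALSE);

        if (vd->vdev_cant_read && zio->io_type == ZIO_TYPE_READ)
                return (B_FALSE);

        if (vd->vdev_cant_write && zio->io_type == ZIO_TYPE_WRITE)
                return (B_FALSE);

        return (B_TRUE);
}
-- snip --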
Below is the stack of the stuck txg_sync thread:

-- snip --
PID: 24311  TASK: ffff8802a03d6a40  CPU: 2  COMMAND: "txg_sync"
 #0 [ffff8802a03eb940] __schedule at ffffffff816cb514
 #1 [ffff8802a03eb9f0] schedule at ffffffff816cbc10
 #2 [ffff8802a03eba10] schedule_timeout at ffffffff816ce7ca
 #3 [ffff8802a03ebad0] io_schedule_timeout at ffffffff816cb1f4
 #4 [ffff8802a03ebb10] cv_wait_common at ffffffffa08d6072 [spl]
 #5 [ffff8802a03ebb90] __cv_wait_io at ffffffffa08d6228 [spl]
 #6 [ffff8802a03ebba0] zio_wait at ffffffffa0a43bdd [zfs]
 #7 [ffff8802a03ebbf0] dsl_pool_sync_mos at ffffffffa09993e7 [zfs]
 #8 [ffff8802a03ebc20] dsl_pool_sync at ffffffffa099a1e2 [zfs]
 #9 [ffff8802a03ebcb0] spa_sync at ffffffffa09be347 [zfs]
#10 [ffff8802a03ebd60] txg_sync_thread at ffffffffa09d8681 [zfs]
#11 [ffff8802a03ebe80] thread_generic_wrapper at ffffffffa08cfb11 [spl]
#12 [ffff8802a03ebec0] kthread at ffffffff810a263c
#13 [ffff8802a03ebf50] ret_from_fork at ffffffff816cfacf
-- snip --

So this is effectively a deadlock: the pool cannot be cleared until vdev_remove_wanted is handled, and spa_sync(), which is supposed to clear it, cannot get there until the outstanding IOs complete (a paraphrased sketch of that async-remove path is appended after the patch below).

Below is the fix we have been running with, which seems to address the issue. Should I file a separate bug for this, or use the above issue (3256)? Kindly suggest how I should proceed.

Thanks and regards,
Sanjeev

PS: The fix:

-- snip --
diff --git a/module/zfs/vdev.c b/module/zfs/vdev.c
index e741a69..2b9665a 100644
--- a/module/zfs/vdev.c
+++ b/module/zfs/vdev.c
@@ -2693,6 +2693,16 @@ vdev_clear(spa_t *spa, vdev_t *vd)
 	vd->vdev_stat.vs_write_errors = 0;
 	vd->vdev_stat.vs_checksum_errors = 0;
 
+	/*
+	 * Clear vdev_remove_wanted.  vdev_clear() reopens the device, which
+	 * issues a probe IO (vdev_reopen()->vdev_open()->vdev_probe()), and
+	 * that IO would fail in vdev_accessible() if vdev_remove_wanted were
+	 * still set.  Hence, clear it here to allow the probe IOs to proceed;
+	 * the fact that we are reopening ensures that the device is actually
+	 * back and available.
+	 */
+	vd->vdev_remove_wanted = B_FALSE;
+
 	for (c = 0; c < vd->vdev_children; c++)
 		vdev_clear(spa, vd->vdev_child[c]);

diff --git a/module/zfs/zfs_ioctl.c b/module/zfs/zfs_ioctl.c
index 8aa6923..318c82e 100644
--- a/module/zfs/zfs_ioctl.c
+++ b/module/zfs/zfs_ioctl.c
@@ -4798,6 +4798,23 @@ zfs_ioc_clear(zfs_cmd_t *zc)
 	if (zio_resume(spa) != 0)
 		error = SET_ERROR(EIO);
 
+	/*
+	 * For pools with failmode=wait, zio_resume()->zio_wait() may not
+	 * return the actual status of the IO, as the failed IO is simply
+	 * rescheduled and no failure is reported.  Hence, return the status
+	 * of the vdev on which the clear was issued.  Otherwise the userland
+	 * command would wrongly assume that the ioctl succeeded and try to
+	 * log the operation to the pool history, which would block.
+	 */
+	if (error == 0 && spa->spa_failmode == ZIO_FAILURE_MODE_WAIT) {
+		if (vd == NULL) {
+			vd = spa->spa_root_vdev;
+		}
+		if (vdev_is_dead(vd)) {
+			error = SET_ERROR(EIO);
+		}
+	}
+
 	spa_close(spa, FTAG);
 
 	return (error);
-- snip --
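PPS: For completeness, a paraphrased, simplified sketch of the path that would normally clear the flag (my reading of module/zfs/spa.c, not verbatim source). spa_async_remove() runs from the spa async thread, which is only kicked off via spa_sync()->spa_async_dispatch(); with spa_sync() blocked in zio_wait(), it never runs and vdev_remove_wanted stays set.

-- snip --
/*
 * Paraphrased sketch of spa_async_remove() (module/zfs/spa.c); simplified,
 * not verbatim.  This is the code that would clear vdev_remove_wanted on
 * the faulted vdev and its children, but it only runs once spa_sync()
 * dispatches the async tasks -- which never happens while spa_sync() is
 * stuck in zio_wait().
 */
static void
spa_async_remove(spa_t *spa, vdev_t *vd)
{
        int c;

        if (vd->vdev_remove_wanted) {
                vd->vdev_remove_wanted = B_FALSE;
                vdev_set_state(vd, B_FALSE, VDEV_STATE_REMOVED, VDEV_AUX_NONE);

                /* the error counters are also reset here (details omitted) */
                vd->vdev_stat.vs_read_errors = 0;
                vd->vdev_stat.vs_write_errors = 0;
                vd->vdev_stat.vs_checksum_errors = 0;

                vdev_state_dirty(vd->vdev_top);
        }

        for (c = 0; c < vd->vdev_children; c++)
                spa_async_remove(spa, vd->vdev_child[c]);
}
-- snip --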
