Hi,

While testing device-failure cases with failmode=wait, I found the
following:
- The vdevs are backed by iSCSI targets.
- I induced IO failures by dropping all packets from the iSCSI target.
- This resulted in the zpool seeing IO errors on the vdevs.
- The spa_sync() thread got stuck waiting for IOs to complete.

After that, I fixed the iSCSI path by restoring the packet flow.

When I then tried 'zpool clear':
 - With failmode=continue, it fails to clear the pool even though the
   underlying iSCSI device has recovered.
 - With failmode=wait, zfs_ioc_clear() fails to reopen the device but does
   not report an error. Hence "zpool clear" wrongly assumes that
   zfs_ioc_clear() succeeded and issues zfs_ioc_log_history(), which gets
   stuck in the kernel at dmu_tx_assign() waiting for spa_sync() to
   complete (see the sketch after this list).
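
Roughly, the userland side does the following. This is a simplified,
hypothetical sketch of how I read zpool_main.c/libzfs, not the actual code;
the pool name "tank" and the helper clear_and_log() are made up, and I have
left out all error handling, so treat it as illustration only:

-- snip --
#include <libzfs.h>
#include <sys/zfs_ioctl.h>
#include <string.h>

/* Hypothetical helper mirroring what 'zpool clear tank' ends up doing. */
static void
clear_and_log(libzfs_handle_t *hdl)
{
        zfs_cmd_t zc = {"\0"};

        (void) strlcpy(zc.zc_name, "tank", sizeof (zc.zc_name));

        /*
         * Step 1: with failmode=wait, ZFS_IOC_CLEAR returns 0 even though
         * the vdev reopen inside zfs_ioc_clear() failed.
         */
        if (zfs_ioctl(hdl, ZFS_IOC_CLEAR, &zc) != 0)
                return;

        /*
         * Step 2: believing the clear succeeded, the command logs the
         * operation. ZFS_IOC_LOG_HISTORY needs a txg to sync, so it blocks
         * in dmu_tx_assign()/txg_wait_synced() while spa_sync() is stuck.
         */
        (void) zpool_log_history(hdl, "zpool clear tank");
}
-- snip --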

I have updated my findings here :
https://github.com/zfsonlinux/zfs/issues/3256

Below is the stack where 'zpool clear' is stuck in the kernel:
-- snip --
PID: 21037 TASK: ffff8800ba96aa80 CPU: 2 COMMAND: "zpool"
#0 [ffff880220323b30] __schedule at ffffffff816cb514
#1 [ffff880220323be0] schedule at ffffffff816cbc10
#2 [ffff880220323c00] cv_wait_common at ffffffffa08d9275 [spl]
#3 [ffff880220323c80] __cv_wait at ffffffffa08d9305 [spl]
#4 [ffff880220323c90] txg_wait_synced at ffffffffa0999c99 [zfs]
#5 [ffff880220323ce0] dmu_tx_wait at ffffffffa095426e [zfs]
#6 [ffff880220323d50] dmu_tx_assign at ffffffffa095436a [zfs]
#7 [ffff880220323d80] spa_history_log_nvl at ffffffffa099106e [zfs]
#8 [ffff880220323dd0] spa_history_log at ffffffffa0991194 [zfs]
#9 [ffff880220323e00] zfs_ioc_log_history at ffffffffa09c380b [zfs]
#10 [ffff880220323e40] zfsdev_ioctl at ffffffffa09c4357 [zfs]
#11 [ffff880220323eb0] do_vfs_ioctl at ffffffff81216072
#12 [ffff880220323f00] sys_ioctl at ffffffff81216402
#13 [ffff880220323f50] entry_SYSCALL_64_fastpath at ffffffff816cf76e
RIP: 00007f17bcc9ba77 RSP: 00007ffd708a7f98 RFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f17bcc9ba77
RDX: 00007ffd708a7fa0 RSI: 0000000000005a3f RDI: 0000000000000003
RBP: 0000000001157060 R8: 0000000001157f20 R9: 616332666162362d
R10: 0000000039373030 R11: 0000000000000246 R12: 000000000061fce0
R13: 000000000061e980 R14: 000000000061ead0 R15: 000000000000000e
ORIG_RAX: 0000000000000010 CS: 0033 SS: 002b
-- snip --

Upon further analysis I found that vdev_reopen() was failing because the IOs
issued by vdev_probe() were failing: vdev_accessible(), called from
zio_vdev_io_start(), was failing because vdev->vdev_remove_wanted was set on
the vdev. It was set when the vdev originally faulted, and it should have
been cleared by spa_async_remove(), which should have been spawned by
spa_sync()->spa_async_dispatch(). However, the spa_sync() thread for this
pool is stuck waiting for sync IOs to complete:
-- snip --
PID: 24311 TASK: ffff8802a03d6a40 CPU: 2 COMMAND: "txg_sync"
#0 [ffff8802a03eb940] __schedule at ffffffff816cb514
#1 [ffff8802a03eb9f0] schedule at ffffffff816cbc10
#2 [ffff8802a03eba10] schedule_timeout at ffffffff816ce7ca
#3 [ffff8802a03ebad0] io_schedule_timeout at ffffffff816cb1f4
#4 [ffff8802a03ebb10] cv_wait_common at ffffffffa08d6072 [spl]
#5 [ffff8802a03ebb90] __cv_wait_io at ffffffffa08d6228 [spl]
#6 [ffff8802a03ebba0] zio_wait at ffffffffa0a43bdd [zfs]
#7 [ffff8802a03ebbf0] dsl_pool_sync_mos at ffffffffa09993e7 [zfs]
#8 [ffff8802a03ebc20] dsl_pool_sync at ffffffffa099a1e2 [zfs]
#9 [ffff8802a03ebcb0] spa_sync at ffffffffa09be347 [zfs]
#10 [ffff8802a03ebd60] txg_sync_thread at ffffffffa09d8681 [zfs]
#11 [ffff8802a03ebe80] thread_generic_wrapper at ffffffffa08cfb11 [spl]
#12 [ffff8802a03ebec0] kthread at ffffffff810a263c
#13 [ffff8802a03ebf50] ret_from_fork at ffffffff816cfacf
-- snip --

So this is effectively a deadlock: the pool cannot be cleared until
vdev_remove_wanted is handled, and spa_sync(), which is supposed to clear
it, won't get there until the IOs complete.
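
For reference, the check that rejects those probe IOs looks roughly like the
sketch below (paraphrased from my reading of vdev.c in the tree I tested, not
a verbatim copy); zio_vdev_io_start() fails the zio with ENXIO whenever this
returns B_FALSE:

-- snip --
boolean_t
vdev_accessible(vdev_t *vd, zio_t *zio)
{
        ASSERT(zio->io_vd == vd);

        /*
         * This is the test that fails in my setup: vdev_remove_wanted is
         * still set because spa_async_remove() never ran, so even the
         * probe IOs issued by vdev_probe() are rejected.
         */
        if (vdev_is_dead(vd) || vd->vdev_remove_wanted)
                return (B_FALSE);

        if (zio->io_type == ZIO_TYPE_READ)
                return (!vd->vdev_cant_read);

        if (zio->io_type == ZIO_TYPE_WRITE)
                return (!vd->vdev_cant_write);

        return (B_TRUE);
}
-- snip --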

Below is the fix we have been running with, which seems to address the issue.
I am wondering whether I should file a separate bug for this or use the above
issue (3256).

Kindly suggest how I should proceed.

Thanks and regards,
Sanjeev
PS: The fix:
-- snip --
diff --git a/module/zfs/vdev.c b/module/zfs/vdev.c
index e741a69..2b9665a 100644
--- a/module/zfs/vdev.c
+++ b/module/zfs/vdev.c
@@ -2693,6 +2693,16 @@ vdev_clear(spa_t *spa, vdev_t *vd)
        vd->vdev_stat.vs_write_errors = 0;
        vd->vdev_stat.vs_checksum_errors = 0;

+       /*
+        * Clear vdev_remove_wanted. vdev_clear() reopens the device, which
+        * issues a probe IO (vdev_reopen()->vdev_open()->vdev_probe()),
+        * and that IO would fail the vdev_accessible() check if
+        * vdev_remove_wanted were still set. Clear it here so the probe
+        * IOs can proceed; the reopen itself verifies that the device is
+        * available again.
+        */
+       vd->vdev_remove_wanted = B_FALSE;
+
        for (c = 0; c < vd->vdev_children; c++)
                vdev_clear(spa, vd->vdev_child[c]);

diff --git a/module/zfs/zfs_ioctl.c b/module/zfs/zfs_ioctl.c
index 8aa6923..318c82e 100644
--- a/module/zfs/zfs_ioctl.c
+++ b/module/zfs/zfs_ioctl.c
@@ -4798,6 +4798,23 @@ zfs_ioc_clear(zfs_cmd_t *zc)
        if (zio_resume(spa) != 0)
                error = SET_ERROR(EIO);

+       /*
+        * For pools with failmode=wait, zio_resume()->zio_wait() may not
+        * return the actual status of the IO: a failed IO is simply
+        * rescheduled and no error comes back. Instead, return the state
+        * of the vdev on which the clear was issued. Otherwise the
+        * userland command would wrongly assume that the ioctl succeeded,
+        * try to log the operation to the pool history, and block there.
+        */
+       if (error == 0 && spa->spa_failmode == ZIO_FAILURE_MODE_WAIT) {
+               if (vd == NULL) {
+                       vd = spa->spa_root_vdev;
+               }
+               if (vdev_is_dead(vd)) {
+                       error = SET_ERROR(EIO);
+               }
+       }
+
        spa_close(spa, FTAG);

        return (error);
-- snip --


