----- Original Message ----- From: "Steven Hartland" <[email protected]>
I've been working on adding IO priority support for TRIM back into FreeBSD after the import of the new IO scheduling from illumos.

Based on avg's initial work, and having got my head around the requirements of the new scheduler, I came up with the attached zz-zfs-trim-priority.patch. Most of the time this worked fine, but as soon as bio_delete requests were disabled using the following, I started getting panics:

sysctl vfs.zfs.vdev.bio_delete_disable=1

A simple dd is enough to trigger the panic, e.g.:

dd if=/dev/zero of=/data/random.dd bs=1m count=10240

The wide selection of panics all seemed to indicate queue corruption, with the main one erroring in vdev_queue_io_to_issue on the line:

zio = avl_first(&vqc->vqc_queued_tree);

After a day of debugging and adding lots of additional validation checks, it became apparent that after removing a zio from vq_active_tree, both vq_active_tree and the associated vqc_queued_tree become corrupt. By corrupt I mean that avl_numnodes is no longer in sync with a manual count of the nodes using a tree walk. In each case vq_active_tree.avl_numnodes is one less than its actual number of nodes, and vqc_queued_tree.avl_numnodes is one greater than its actual number of nodes.

After adding queue tracking to zio's, it turned out that vdev_queue_pending_remove was trying to remove a zio from vq_active_tree which wasn't in that tree, but was in the write vqc_queued_tree. As avl_remove doesn't do any validation that the node is actually present in the tree, it merrily tried to remove it, resulting in nastiness in both trees.

The cause of this is in zio_vdev_io_start, specifically:

if ((zio = vdev_queue_io(zio)) == NULL)
        return (ZIO_PIPELINE_STOP);

This can result in a different zio reaching:

return (vd->vdev_ops->vdev_op_io_start(zio));

When this happens and vdev_op_io_start returns ZIO_PIPELINE_CONTINUE, e.g. for TRIM requests when bio_delete_disable=1 is set, the calling zio_execute continues the pipeline for the zio it called zio_vdev_io_start with; but that zio hasn't been processed, and hence isn't in vq_active_tree but in one of the vqc_queued_trees.

It's not clear if any other paths can have their vdev_op_io_start return ZIO_PIPELINE_CONTINUE, but it definitely looks that way, which may well explain other panics I've seen in this area when, for example, disks dropped.

I'm unsure if there's a more elegant fix, but allowing pipeline stages to change the processing zio, by passing in a zio_t **ziop instead of a zio_t *zio as in the attached zfs-zio-queue-reorder.patch, fixes the issue.

Note: Patches are based on FreeBSD 10-RELEASE + some backports from 10-STABLE, mainly r260763: 4045 zfs write throttle & i/o scheduler, so they should apply to 10-STABLE and 11-CURRENT.

Given this could possibly lead to data loss and corruption, although it looks like only in the case of prior IO errors, I've committed the patch for this to the FreeBSD current and stable trees. If a different fix is agreed upon, we'll rework to match.
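To make the pipeline change concrete, here's a minimal self-contained model of the zio_t **ziop idea. This is illustrative only, not the actual diff in zfs-zio-queue-reorder.patch: queue_io is a stand-in for vdev_queue_io, and the structs are toys.

#include <stdio.h>

struct zio { int id; };

enum { PIPELINE_STOP, PIPELINE_CONTINUE };

static struct zio other = { .id = 2 };

/* Stand-in for vdev_queue_io(): pretend the incoming zio was queued
 * and a different, ready-to-issue zio was handed back. */
static struct zio *
queue_io(struct zio *zio)
{
        (void)zio;
        return (&other);
}

/* The changed stage signature: zio_t **ziop instead of zio_t *zio. */
static int
vdev_io_start(struct zio **ziop)
{
        struct zio *zio = *ziop;

        if ((zio = queue_io(zio)) == NULL)
                return (PIPELINE_STOP);
        *ziop = zio;    /* tell the executor which zio to continue */
        return (PIPELINE_CONTINUE);
}

int
main(void)
{
        struct zio first = { .id = 1 };
        struct zio *zio = &first;

        if (vdev_io_start(&zio) == PIPELINE_CONTINUE)
                printf("executor continues with zio %d\n", zio->id);
        return (0);
}

With the original zio_t * signature the executor would carry on with zio 1, which is still sitting in a queued tree; that is exactly the mismatch that corrupts the trees.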
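For completeness, the avl_numnodes skew is also easy to reproduce in miniature. The standalone sketch below (plain C, not the libavl code) mimics the relevant property of avl_remove: the node is spliced out of whatever container its embedded links actually point into, while the count of the container the caller named is decremented regardless.

#include <stdio.h>

struct node { struct node *prev, *next; };
struct tree { struct node head; unsigned long numnodes; };

static void
tree_init(struct tree *t)
{
        t->head.prev = t->head.next = &t->head;
        t->numnodes = 0;
}

static void
tree_add(struct tree *t, struct node *n)
{
        n->next = t->head.next;
        n->prev = &t->head;
        n->next->prev = n;
        t->head.next = n;
        t->numnodes++;
}

static void
tree_remove(struct tree *t, struct node *n)
{
        /* No membership check, as with avl_remove(): n is unlinked
         * from the container it actually lives in... */
        n->prev->next = n->next;
        n->next->prev = n->prev;
        /* ...but the count of the container the caller named drops. */
        t->numnodes--;
}

static unsigned long
tree_walk_count(struct tree *t)
{
        unsigned long c = 0;
        for (struct node *n = t->head.next; n != &t->head; n = n->next)
                c++;
        return (c);
}

int
main(void)
{
        struct tree active, queued;
        struct node a, b;

        tree_init(&active);
        tree_init(&queued);
        tree_add(&active, &a);
        tree_add(&queued, &b);

        /* b is in "queued" but is removed from "active", the same
         * mistake vdev_queue_pending_remove was making: */
        tree_remove(&active, &b);

        printf("active: numnodes=%lu walk=%lu\n",
            active.numnodes, tree_walk_count(&active));
        printf("queued: numnodes=%lu walk=%lu\n",
            queued.numnodes, tree_walk_count(&queued));
        return (0);
}

It prints numnodes=0/walk=1 for active and numnodes=1/walk=0 for queued, i.e. the same one-off skews seen in vq_active_tree and vqc_queued_tree above.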
We've also come across another issue in the same area, which causes a stack overflow due to the following recursion:

zio_execute -> zio_vdev_io_done -> vdev_queue_io_done -> zio_execute

This occurs for IO's which return ZIO_PIPELINE_CONTINUE from the zio_vdev_io_start stage and hence don't suspend and complete in a different thread.

I've attached the patch which I've committed to FreeBSD's current branch to prevent this issue, which was triggering a double fault panic in combination with queued FREE IO's on volumes which return EOPNOTSUPP and hence return ZIO_PIPELINE_CONTINUE. It is however likely this issue could be triggered by other paths too, for example a failing disk.
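For illustration only (this sketches the general technique, not the committed zz-zfs-io-queue-recursion.patch): the usual cure for this class of bug is to queue the follow-up work and drain it with a loop, rather than re-entering the executor on the current stack, so a chain of N inline completions costs O(1) stack instead of O(N).

#include <stdio.h>
#include <stdlib.h>

struct work { struct work *next; int id; };

static struct work *head, **tail = &head;

static void
enqueue(int id)
{
        struct work *w = malloc(sizeof (*w));

        if (w == NULL)
                abort();
        w->id = id;
        w->next = NULL;
        *tail = w;
        tail = &w->next;
}

/* "done" stage: finishing one IO may make the next queued IO runnable,
 * as vdev_queue_io_done does when it issues the next zio. */
static void
io_done(int id)
{
        if (id > 0)
                enqueue(id - 1);        /* queue it, don't recurse */
}

static void
execute_all(void)
{
        /* Iterative drain replacing the recursive call chain. */
        while (head != NULL) {
                struct work *w = head;

                head = w->next;
                if (head == NULL)
                        tail = &head;
                io_done(w->id);
                free(w);
        }
}

int
main(void)
{
        enqueue(1000000);       /* deep enough that recursion would fault */
        execute_all();
        printf("drained without growing the stack\n");
        return (0);
}

A recursive execute_all would blow the stack long before reaching a million items; the iterative drain keeps stack depth constant however long the completion chain gets.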
Regards
Steve

zz-zfs-io-queue-recursion.patch
Description: Binary data
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
