----- Original Message ----- From: "Steven Hartland" <[email protected]>
I've been working on adding IO priority support for TRIM back into FreeBSD after the import of the new IO scheduling from illumos.

Based on avg's initial work, and having got my head around the requirements of the new scheduler, I came up with the attached zz-zfs-trim-priority.patch. Most of the time this worked fine, but as soon as bio_delete requests were disabled using the following, I started getting panics:

sysctl vfs.zfs.vdev.bio_delete_disable=1

A simple dd is enough to trigger the panic, e.g.:

dd if=/dev/zero of=/data/random.dd bs=1m count=10240

The wide selection of panics all seemed to indicate queue corruption, with the main one erroring in vdev_queue_io_to_issue on the line:

zio = avl_first(&vqc->vqc_queued_tree);

After a day of debugging and adding lots of additional validation checks, it became apparent that after removing a zio from vq_active_tree, both vq_active_tree and the associated vqc_queued_tree become corrupt. By corrupt I mean that avl_numnodes is no longer in sync with a manual count of the nodes using a tree walk. In each case vq_active_tree.avl_numnodes is one less than its actual number of nodes, and vqc_queued_tree.avl_numnodes is one greater than its actual number of nodes.

After adding queue tracking to zio's, it turned out that vdev_queue_pending_remove was trying to remove a zio from vq_active_tree which wasn't in that tree, but was in the write vqc_queued_tree. As avl_remove doesn't do any validation that the node is actually present in the tree, it merrily tried to remove it, resulting in nastiness in both trees.

The cause of this is in zio_vdev_io_start, specifically:

if ((zio = vdev_queue_io(zio)) == NULL)
        return (ZIO_PIPELINE_STOP);

This can result in a different zio reaching:

return (vd->vdev_ops->vdev_op_io_start(zio));

When this happens and vdev_op_io_start returns ZIO_PIPELINE_CONTINUE, e.g. for TRIM requests when bio_delete_disable=1 is set, the calling zio_execute continues the pipeline for the zio it called zio_vdev_io_start with; but that zio hasn't been processed, and hence isn't in vq_active_tree but in one of the vqc_queued_trees.

It's not clear if any other paths can have their vdev_op_io_start return ZIO_PIPELINE_CONTINUE, but it definitely looks that way, which may well explain other panics I've seen in this area when, for example, disks dropped.

I'm unsure if there's a more elegant fix, but allowing pipeline stages to change the processing zio, by passing in a zio_t **ziop instead of a zio_t *zio as in the attached zfs-zio-queue-reorder.patch, fixes the issue.

Note: Patches are based on FreeBSD 10-RELEASE + some backports from 10-STABLE, mainly r260763: 4045 zfs write throttle & i/o scheduler, so they should apply to 10-STABLE and 11-CURRENT.

Given this could possibly lead to data loss and corruption, although it looks like only in the case of prior IO errors, I've committed the patch for this to the FreeBSD current and stable trees. If a different fix is agreed upon, we'll rework to match.
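To make the pipeline change concrete, here's a minimal self-contained model of the zio_t **ziop idea. This is illustrative only, not the actual diff in zfs-zio-queue-reorder.patch: queue_io is a stand-in for vdev_queue_io, and the structs are toys.

#include <stdio.h>

struct zio { int id; };

enum { PIPELINE_STOP, PIPELINE_CONTINUE };

static struct zio other = { .id = 2 };

/* Stand-in for vdev_queue_io(): pretend the incoming zio was queued
 * and a different, ready-to-issue zio was handed back. */
static struct zio *
queue_io(struct zio *zio)
{
        (void)zio;
        return (&other);
}

/* The changed stage signature: zio_t **ziop instead of zio_t *zio. */
static int
vdev_io_start(struct zio **ziop)
{
        struct zio *zio = *ziop;

        if ((zio = queue_io(zio)) == NULL)
                return (PIPELINE_STOP);
        *ziop = zio;    /* tell the executor which zio to continue */
        return (PIPELINE_CONTINUE);
}

int
main(void)
{
        struct zio first = { .id = 1 };
        struct zio *zio = &first;

        if (vdev_io_start(&zio) == PIPELINE_CONTINUE)
                printf("executor continues with zio %d\n", zio->id);
        return (0);
}

With the original zio_t * signature the executor would carry on with zio 1, which is still sitting in a queued tree; that is exactly the mismatch that corrupts the trees.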
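For completeness, the avl_numnodes skew is also easy to reproduce in miniature. The standalone sketch below (plain C, not the libavl code) mimics the relevant property of avl_remove: the node is spliced out of whatever container its embedded links actually point into, while the count of the container the caller named is decremented regardless.

#include <stdio.h>

struct node { struct node *prev, *next; };
struct tree { struct node head; unsigned long numnodes; };

static void
tree_init(struct tree *t)
{
        t->head.prev = t->head.next = &t->head;
        t->numnodes = 0;
}

static void
tree_add(struct tree *t, struct node *n)
{
        n->next = t->head.next;
        n->prev = &t->head;
        n->next->prev = n;
        t->head.next = n;
        t->numnodes++;
}

static void
tree_remove(struct tree *t, struct node *n)
{
        /* No membership check, as with avl_remove(): n is unlinked
         * from the container it actually lives in... */
        n->prev->next = n->next;
        n->next->prev = n->prev;
        /* ...but the count of the container the caller named drops. */
        t->numnodes--;
}

static unsigned long
tree_walk_count(struct tree *t)
{
        unsigned long c = 0;
        for (struct node *n = t->head.next; n != &t->head; n = n->next)
                c++;
        return (c);
}

int
main(void)
{
        struct tree active, queued;
        struct node a, b;

        tree_init(&active);
        tree_init(&queued);
        tree_add(&active, &a);
        tree_add(&queued, &b);

        /* b is in "queued" but is removed from "active", the same
         * mistake vdev_queue_pending_remove was making: */
        tree_remove(&active, &b);

        printf("active: numnodes=%lu walk=%lu\n",
            active.numnodes, tree_walk_count(&active));
        printf("queued: numnodes=%lu walk=%lu\n",
            queued.numnodes, tree_walk_count(&queued));
        return (0);
}

It prints numnodes=0/walk=1 for active and numnodes=1/walk=0 for queued, i.e. the same one-off skews seen in vq_active_tree and vqc_queued_tree above.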
We've also come across another issue in the same area, which causes a stack overflow due to the following recursion:

zio_execute -> zio_vdev_io_done -> vdev_queue_io_done -> zio_execute

This occurs for IO's which return ZIO_PIPELINE_CONTINUE from the zio_vdev_io_start stage and hence don't suspend and complete in a different thread.

I've attached the patch which I've committed to FreeBSD's current branch to prevent this issue, which was triggering a double fault panic in combination with queued FREE IO's on volumes which return EOPNOTSUPP and hence return ZIO_PIPELINE_CONTINUE. It is however likely this issue could be triggered by other paths too, for example a failing disk.
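For illustration only (this sketches the general technique, not the committed zz-zfs-io-queue-recursion.patch): the usual cure for this class of bug is to queue the follow-up work and drain it with a loop, rather than re-entering the executor on the current stack, so a chain of N inline completions costs O(1) stack instead of O(N).

#include <stdio.h>
#include <stdlib.h>

struct work { struct work *next; int id; };

static struct work *head, **tail = &head;

static void
enqueue(int id)
{
        struct work *w = malloc(sizeof (*w));

        if (w == NULL)
                abort();
        w->id = id;
        w->next = NULL;
        *tail = w;
        tail = &w->next;
}

/* "done" stage: finishing one IO may make the next queued IO runnable,
 * as vdev_queue_io_done does when it issues the next zio. */
static void
io_done(int id)
{
        if (id > 0)
                enqueue(id - 1);        /* queue it, don't recurse */
}

static void
execute_all(void)
{
        /* Iterative drain replacing the recursive call chain. */
        while (head != NULL) {
                struct work *w = head;

                head = w->next;
                if (head == NULL)
                        tail = &head;
                io_done(w->id);
                free(w);
        }
}

int
main(void)
{
        enqueue(1000000);       /* deep enough that recursion would fault */
        execute_all();
        printf("drained without growing the stack\n");
        return (0);
}

A recursive execute_all would blow the stack long before reaching a million items; the iterative drain keeps stack depth constant however long the completion chain gets.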
Regards
Steve

zz-zfs-io-queue-recursion.patch
Description: Binary data
_______________________________________________
developer mailing list
[email protected]
http://lists.open-zfs.org/mailman/listinfo/developer
