Fix looks good to me. We made this change internally at Delphix a few months back and we'll push it to illumos within the next few months. We are nearing the end of a release so we haven't had as much time to push stuff to illumos. If someone wants to push this before we get to it, please feel free.
(Chris Siden: Delphix bug is 28702.)

Here is my analysis:

assertion failed: zb->zb_object <= td->td_resume->zb_object
(0xffffffffffffffff <= 0xfffffffffffffffe),
file: ../../common/fs/zfs/dmu_traverse.c, line: 166

ffffff0008f32440 genunix:strlog+0 ()
ffffff0008f32490 zfs:resume_skip_check+d8 ()
ffffff0008f32570 zfs:traverse_visitbp+57 ()
ffffff0008f32600 zfs:traverse_dnode+8b ()
ffffff0008f326e0 zfs:traverse_visitbp+805 ()
ffffff0008f32820 zfs:traverse_impl+1bf ()
ffffff0008f32880 zfs:traverse_dataset_destroyed+49 ()
ffffff0008f32a00 zfs:bptree_iterate+1b9 ()
ffffff0008f32a70 zfs:dsl_scan_sync+401 ()
ffffff0008f32b50 zfs:spa_sync+344 ()
ffffff0008f32c20 zfs:txg_sync_thread+260 ()
ffffff0008f32c30 unix:thread_start+8 ()

The problem is that traverse_visitbp() visits the USERUSED and GROUPUSED
objects in the wrong order; they should be reversed, because
GROUPUSED < USERUSED.

Typically there will be just one block in the GROUPUSED object, so it's
pretty unlikely to pause on exactly this block. But I think you'd hit this
every time we do.

The problem can be reproduced by setting zfs_free_max_blocks to 1 and then
destroying a filesystem.

On Thu, Jan 16, 2014 at 4:36 AM, Andriy Gapon <[email protected]> wrote:
>
> How does the following change look to you?
> If it's good, is anyone up to integrating it into illumos?
>
> Thanks!
>
> commit 380a6d6c42ff89f623d44fc78e67370012eb960e
> Author: Andriy Gapon <[email protected]>
> Date:   Mon Oct 14 09:39:15 2013 +0300
>
>     traverse_visitbp: visit DMU_GROUPUSED_OBJECT before DMU_USERUSED_OBJECT
>
>     This is done to ensure that visited object IDs are always increasing.
>     Also, pass correct object ID to prefetch_dnode_metadata for
>     os_groupused_dnode.
>
> diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c
> index 1ff47c8..a0ca63f 100644
> --- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c
> +++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/dmu_traverse.c
> @@ -329,9 +329,9 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp,
>  		prefetch_dnode_metadata(td, dnp, zb->zb_objset,
>  		    DMU_META_DNODE_OBJECT);
>  		if (arc_buf_size(buf) >= sizeof (objset_phys_t)) {
> -			prefetch_dnode_metadata(td, &osp->os_userused_dnode,
> -			    zb->zb_objset, DMU_USERUSED_OBJECT);
>  			prefetch_dnode_metadata(td, &osp->os_groupused_dnode,
> +			    zb->zb_objset, DMU_GROUPUSED_OBJECT);
> +			prefetch_dnode_metadata(td, &osp->os_userused_dnode,
>  			    zb->zb_objset, DMU_USERUSED_OBJECT);
>  		}
>
> @@ -342,18 +342,18 @@ traverse_visitbp(traverse_data_t *td, const dnode_phys_t *dnp,
>  			err = 0;
>  		}
>  		if (err == 0 && arc_buf_size(buf) >= sizeof (objset_phys_t)) {
> -			dnp = &osp->os_userused_dnode;
> +			dnp = &osp->os_groupused_dnode;
>  			err = traverse_dnode(td, dnp, zb->zb_objset,
> -			    DMU_USERUSED_OBJECT);
> +			    DMU_GROUPUSED_OBJECT);
>  		}
>  		if (err && hard) {
>  			lasterr = err;
>  			err = 0;
>  		}
>  		if (err == 0 && arc_buf_size(buf) >= sizeof (objset_phys_t)) {
> -			dnp = &osp->os_groupused_dnode;
> +			dnp = &osp->os_userused_dnode;
>  			err = traverse_dnode(td, dnp, zb->zb_objset,
> -			    DMU_GROUPUSED_OBJECT);
> +			    DMU_USERUSED_OBJECT);
>  		}
>  	}
>
>
> --
> Andriy Gapon
>
> _______________________________________________
> developer mailing list
> [email protected]
> http://lists.open-zfs.org/mailman/listinfo/developer
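
For anyone following along, here is a minimal standalone sketch (not the
kernel code) of why the old visit order trips the assertion and the
reordered one does not. It assumes the object ID values from sys/dmu.h
(DMU_USERUSED_OBJECT is -1ULL and DMU_GROUPUSED_OBJECT is -2ULL), which
match the two values printed in the panic message. resume_order_ok(),
old_order and new_order are hypothetical names made up for illustration,
and everything in the meta dnode is collapsed into a single object 0.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Object IDs; values assumed to match illumos sys/dmu.h. */
#define	DMU_META_DNODE_OBJECT	0ULL
#define	DMU_USERUSED_OBJECT	(-1ULL)	/* 0xffffffffffffffff */
#define	DMU_GROUPUSED_OBJECT	(-2ULL)	/* 0xfffffffffffffffe */

/* Visit order before the fix: USERUSED, then GROUPUSED. */
static const uint64_t old_order[] = {
	DMU_META_DNODE_OBJECT, DMU_USERUSED_OBJECT, DMU_GROUPUSED_OBJECT
};

/* Visit order after the fix: GROUPUSED, then USERUSED. */
static const uint64_t new_order[] = {
	DMU_META_DNODE_OBJECT, DMU_GROUPUSED_OBJECT, DMU_USERUSED_OBJECT
};

/*
 * Hypothetical, simplified model of the check in resume_skip_check():
 * until the traversal reaches the object it paused in, every object it
 * visits must have an ID <= the resume object's ID. Returns false where
 * the real code would fail its assertion.
 */
static bool
resume_order_ok(const uint64_t *order, size_t n, uint64_t resume_object)
{
	for (size_t i = 0; i < n; i++) {
		if (order[i] > resume_object)
			return (false);	/* assertion would fire here */
		if (order[i] == resume_object)
			return (true);	/* resumed; check no longer applies */
	}
	return (true);
}
```

With a pause inside the GROUPUSED object,
resume_order_ok(old_order, 3, DMU_GROUPUSED_OBJECT) is false, because
USERUSED (the larger ID) is visited first, while
resume_order_ok(new_order, 3, DMU_GROUPUSED_OBJECT) is true.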
