Dave, this definitely sounds like a bug. Pavel (cc'd) has done some work in this area that we unfortunately haven't upstreamed yet, but it would fix your issue. In the new method, we always trust the MOS config as the source of truth w.r.t. the vdev tree layout. So (I think) with the new code it would actually import correctly.
We're working on upstreaming those changes to illumos, but it's taking a bit longer than anticipated.

--matt

On Mon, Aug 28, 2017 at 10:46 AM, Dave Baukus <[email protected]> wrote:

> I have a contrived scenario which generated the following vdev trees:
>
> Generated from the labels:
> rvd->vdev_children = 1
> rvd->vdev_state = VDEV_STATE_DEGRADED
> rvd->vdev_child[0]->vdev_children = 5
> rvd->vdev_child[0]->vdev_state = VDEV_STATE_DEGRADED
> rvd->vdev_child[0]->vdev_child[0]->vdev_children = 2
> rvd->vdev_child[0]->vdev_child[0]->vdev_state = VDEV_STATE_CANT_OPEN
> rvd->vdev_child[0]->vdev_child[0]->vdev_child[0]->vdev_children = 0
> rvd->vdev_child[0]->vdev_child[0]->vdev_child[0]->vdev_state = VDEV_STATE_HEALTHY
> rvd->vdev_child[0]->vdev_child[0]->vdev_child[1]->vdev_children = 0
> rvd->vdev_child[0]->vdev_child[0]->vdev_child[1]->vdev_state = VDEV_STATE_CANT_OPEN
> rvd->vdev_child[0]->vdev_child[1]->vdev_children = 0
> rvd->vdev_child[0]->vdev_child[1]->vdev_state = VDEV_STATE_HEALTHY
> rvd->vdev_child[0]->vdev_child[2]->vdev_children = 0
> rvd->vdev_child[0]->vdev_child[2]->vdev_state = VDEV_STATE_HEALTHY
> rvd->vdev_child[0]->vdev_child[3]->vdev_children = 0
> rvd->vdev_child[0]->vdev_child[3]->vdev_state = VDEV_STATE_HEALTHY
> rvd->vdev_child[0]->vdev_child[4]->vdev_children = 0
> rvd->vdev_child[0]->vdev_child[4]->vdev_state = VDEV_STATE_HEALTHY
>
> Generated from the MOS:
> mrvd->vdev_children = 1
> mrvd->vdev_state = VDEV_STATE_CLOSED
> mrvd->vdev_child[0]->vdev_children = 5
> mrvd->vdev_child[0]->vdev_state = VDEV_STATE_CLOSED
> mrvd->vdev_child[0]->vdev_child[0]->vdev_children = 0
> mrvd->vdev_child[0]->vdev_child[0]->vdev_state = VDEV_STATE_CLOSED
> mrvd->vdev_child[0]->vdev_child[1]->vdev_children = 0
> mrvd->vdev_child[0]->vdev_child[1]->vdev_state = VDEV_STATE_CLOSED
> mrvd->vdev_child[0]->vdev_child[2]->vdev_children = 0
> mrvd->vdev_child[0]->vdev_child[2]->vdev_state = VDEV_STATE_CLOSED
> mrvd->vdev_child[0]->vdev_child[3]->vdev_children = 0
> mrvd->vdev_child[0]->vdev_child[3]->vdev_state = VDEV_STATE_CLOSED
> mrvd->vdev_child[0]->vdev_child[4]->vdev_children = 0
> mrvd->vdev_child[0]->vdev_child[4]->vdev_state = VDEV_STATE_CLOSED
>
> When this pool is imported, the discrepancy between these two trees causes
> a panic in spa_config_valid_zaps().
> My question is: shouldn't / couldn't spa_config_valid_zaps() fail the
> import?
>
> I believe that this scenario was created as follows:
>
> - Create a 5 disk, raidz1 pool on boxA.
> - Move the 5 disks to boxB and import the pool.
> - Something went wrong with one of the five disks on boxB, and our
>   sparing code replaced the bad disk.
> - The 5 original disks were then moved back to boxA (possibly before
>   the resilver was complete).
>
> ZDB shows that only 1 disk of the 5 has a set of labels with the extra
> children.
>
> --
> Dave Baukus

------------------------------------------
openzfs-developer
Archives: https://openzfs.topicbox.com/groups/developer/discussions/Tbbf302c79ceef50e-M70582e19ae7bb7b277e69895
Powered by Topicbox: https://topicbox.com
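
To make the question in the quoted message concrete, here is a minimal sketch of comparing only the shape of two vdev trees and returning an error on a mismatch instead of panicking. It assumes a hypothetical, heavily simplified node type (toy_vdev_t, not the real vdev_t); none of these names exist in ZFS, and this is not what spa_config_valid_zaps() actually does. The demo function only rebuilds the two child-count layouts from the dump in the quoted message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a vdev node; NOT the real ZFS vdev_t. */
typedef struct toy_vdev {
	size_t tv_children;		/* number of entries in tv_child */
	struct toy_vdev **tv_child;	/* child array, NULL for a leaf */
} toy_vdev_t;

/*
 * Compare only the layout (child counts) of two trees.
 * Returns 0 if the shapes match, EINVAL otherwise, so a caller
 * can fail gracefully rather than trip an assertion.
 */
static int
toy_vdev_layout_cmp(const toy_vdev_t *label, const toy_vdev_t *mos)
{
	if (label == NULL || mos == NULL)
		return (label == mos ? 0 : EINVAL);
	if (label->tv_children != mos->tv_children)
		return (EINVAL);
	for (size_t c = 0; c < label->tv_children; c++) {
		int err = toy_vdev_layout_cmp(label->tv_child[c],
		    mos->tv_child[c]);
		if (err != 0)
			return (err);
	}
	return (0);
}

/* Rebuild the two layouts from the quoted dump and compare them. */
static int
toy_example_from_thread(void)
{
	/* One leaf node is reused everywhere; only the shape matters. */
	toy_vdev_t leaf = { 0, NULL };

	/* Label view: child 0 of the top-level vdev has 2 children. */
	toy_vdev_t *pair_kids[] = { &leaf, &leaf };
	toy_vdev_t pair = { 2, pair_kids };
	toy_vdev_t *label_kids[] = { &pair, &leaf, &leaf, &leaf, &leaf };
	toy_vdev_t label_top = { 5, label_kids };
	toy_vdev_t *label_root_kids[] = { &label_top };
	toy_vdev_t label_root = { 1, label_root_kids };

	/* MOS view: the top-level vdev has 5 plain leaves. */
	toy_vdev_t *mos_kids[] = { &leaf, &leaf, &leaf, &leaf, &leaf };
	toy_vdev_t mos_top = { 5, mos_kids };
	toy_vdev_t *mos_root_kids[] = { &mos_top };
	toy_vdev_t mos_root = { 1, mos_root_kids };

	/* Sanity check: a tree always matches itself. */
	assert(toy_vdev_layout_cmp(&mos_root, &mos_root) == 0);

	return (toy_vdev_layout_cmp(&label_root, &mos_root));
}
```

With these layouts the comparison stops at child 0 of the top-level vdev (2 children in the label view, 0 in the MOS view) and returns EINVAL, which a caller could turn into a failed import.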
