Hi Josh,

My gut says you don't want to change labels / GUIDs. I can't imagine things going well if you trick it into life with a GUID change. Whilst you won't see silent data corruption, I could absolutely see every block coming back with a checksum error...

Over time, it's become apparent to me that ZFS's handling of device path changes is imperfect. I haven't yet worked out why, but it has caught me out a few times, especially across versions of Solaris...
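One trick that has worked for me when paths are suspect: point the import scan at a directory containing symlinks to only the devices you expect, so stale paths can't confuse it. Rough sketch below - untested against your pool, and the device names are just lifted from your zdb output:

mkdir /tmp/bucket-devs
for d in c7t5d0s0 c7t6d0s0 c7t7d0s0 c7t8d0s0; do
    ln -s /dev/dsk/$d /tmp/bucket-devs/$d
done
zpool import -d /tmp/bucket-devs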

One thing I'd suggest is that you physically disconnect the disk that's reporting as corrupted and try the import again to see what happens. I've found that if I get the system as close as it can be to something that 'should' work, ZFS / zpool operations are much more likely to succeed.
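If it still won't come in with that disk unplugged, the least-risky next thing I'd try is a forced read-only import - it shouldn't write anything back to the labels. Something like (again, untested here; the number is the pool_guid from your zdb output):

zpool import -o readonly=on -f 7650914121155923652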

Also - I can't imagine a situation in which a controller failure would cause a new GUID to be written to a disk. What else have you done to the pool trying to recover it?

Might also be worth taking a poke at the devlinks themselves: ls -l /dev/rdsk and make sure c7t5 and c7t7 actually point to something that looks right.
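For example, checking the two suspects side by side - each should resolve to a ../../devices path on the HBA you expect:

ls -l /dev/rdsk/c7t5d0s0 /dev/rdsk/c7t7d0s0

If anything looks stale, devfsadm -Cv will clean up dangling links before you retry the import.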

Also - the entry for c9t4d0 looks weird, and you didn't include it in your zdb output. Is it real?
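If it is real, run the same zdb against it so we can see its label too:

zdb -l /dev/dsk/c9t4d0s0 | egrep "(children|guid)"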

Hm. Very weird.

Nathan.


On 04/13/14 10:53 PM, Joshua Edmonds wrote:
Hey guys,

I've managed to end up with a corrupted zpool that I can no longer import on Solaris 11.1. Luckily I do have a backup of the most important stuff, but I'd still like to recover the remainder if possible.

The pool was a 4-disk raidz1 with a ZIL (unmirrored).

From what I can tell, it looks like one of the devices caused the SAS HBA (LSI1068E) to reset, which offlined the pool. Upon reboot the pool wouldn't import, and "zpool status" showed all disks except one as unavailable. However, I think this was a result of the HBA reset causing Solaris to re-enumerate the devices. At this point I exported the pool and attempted to reimport - I probably shouldn't have, because it has now led to a condition where two disks have the same GUID.


root@solaris:~# zdb -l /dev/dsk/c7t5d0s0 |egrep "(children|guid)"
    pool_guid: 7650914121155923652
    top_guid: 4244192714700669945
    guid: 2113359054019808692
    vdev_children: 2
        guid: 4244192714700669945
        children[0]:
            guid: 16334042155336894037
        children[1]:
            guid: 2113359054019808692
        children[2]:
            guid: 11196011208380299867
        children[3]:
            guid: 15149956586209127431

root@solaris:~# zdb -l /dev/dsk/c7t6d0s0 |egrep "(children|guid)"
    pool_guid: 7650914121155923652
    top_guid: 4244192714700669945
    guid: 11196011208380299867
    vdev_children: 2
        guid: 4244192714700669945
        children[0]:
            guid: 16334042155336894037
        children[1]:
            guid: 2113359054019808692
        children[2]:
            guid: 11196011208380299867
        children[3]:
            guid: 15149956586209127431

root@solaris:~# zdb -l /dev/dsk/c7t7d0s0 |egrep "(children|guid)"
    pool_guid: 7650914121155923652
    top_guid: 4244192714700669945
    guid: 2113359054019808692
    vdev_children: 2
        guid: 4244192714700669945
        children[0]:
            guid: 16334042155336894037
        children[1]:
            guid: 2113359054019808692
        children[2]:
            guid: 11196011208380299867
        children[3]:
            guid: 15149956586209127431

root@solaris:~# zdb -l /dev/dsk/c7t8d0s0 |egrep "(children|guid)"
    pool_guid: 7650914121155923652
    top_guid: 4244192714700669945
    guid: 15149956586209127431
    vdev_children: 2
        guid: 4244192714700669945
        children[0]:
            guid: 16334042155336894037
        children[1]:
            guid: 2113359054019808692
        children[2]:
            guid: 11196011208380299867
        children[3]:
            guid: 15149956586209127431


From the above, c7t5d0 and c7t7d0 both have a guid of 2113359054019808692. Looks like one of these should be 16334042155336894037 instead.

zpool import gives me the following output (I'm attempting to import this on another system, which is why there's a reference to c9t4d0, aka guid 16334042155336894037).

root@solaris:~# zpool import
  pool: bucket
    id: 7650914121155923652
 state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to unavailable devices or data.
   see: http://support.oracle.com/msg/ZFS-8000-EY
config:

        bucket       UNAVAIL  insufficient replicas
          raidz1-0   DEGRADED
            c9t4d0   UNAVAIL  corrupted data
            c7t7d0   ONLINE
            c7t6d0   ONLINE
            c7t8d0   ONLINE

device details:

        c9t4d0     UNAVAIL        corrupted data
        status: ZFS detected errors on this device.
                The device has bad label or disk contents.

Being raidz1, I thought I would be able to import the pool with one device missing, but no matter what I try it simply won't import, failing with "insufficient replicas".

Is anyone aware of a method to modify the on-disk metadata and change the device GUID?

Any help is greatly appreciated!

Cheers,
Josh


_______________________________________________
msosug mailing list
[email protected]
http://mexico.purplecow.org/m/listinfo/msosug

