Hi Josh,
My gut says that you don't want to change labels / GUIDs. I can't
imagine things going well if you trick the pool back into life with a
GUID change. Whilst you won't see silent data corruption, I could
absolutely see every block having a checksum issue...
Over time, it's become apparent to me that ZFS's handling of device
path changes is imperfect. I haven't yet worked out why, but I have been
caught out by it a few times, especially across versions of Solaris...
One thing I'd suggest is that you physically disconnect the disk that's
reporting as busted and try the import again to see what happens. I have
found that if I get the system as close as it can be to something
that 'should' work, ZFS / zpool operations are much more likely to succeed.
Also - I can't imagine a situation in which a controller failure would
cause a new GUID to be written to a disk. What else have you done to the
pool in trying to recover it?
Might also be worth taking a poke at the devlinks themselves: run ls -l
/dev/rdsk and make sure c7t7 and c7t5 actually point to something that
looks right (i.e. two distinct physical devices).
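A minimal sketch of that devlink check, assuming slice s0 (as in the zdb commands further down) and that the /dev/rdsk entries are symlinks into /devices; the helper names link_target and show_devlinks are hypothetical. It just prints the last field of ls -l, which for a symlink is its target, so the two disks' physical paths can be compared side by side.

```shell
#!/bin/sh
# Hypothetical helpers: show where each /dev/rdsk link resolves, so you can
# confirm c7t5d0 and c7t7d0 are really two different physical devices.
# Assumption: slice s0, and link targets contain no spaces.

# Print the target of one symlink (the last field of `ls -l` output).
link_target() {
    ls -l "$1" | awk '{ print $NF }'
}

# Print "<link> -> <target>" for each disk name given as an argument.
show_devlinks() {
    for d in "$@"; do
        printf '%s -> %s\n' "/dev/rdsk/${d}s0" "$(link_target "/dev/rdsk/${d}s0")"
    done
}

# Usage on the affected host:
#   show_devlinks c7t5d0 c7t7d0
```

If both lines show the same /devices path, the duplicate GUID may be a naming problem rather than an on-disk one.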
Also - the entry for c9tx looks weird, and you didn't include it in your
zdb output. Is it real?
Hm. Very weird.
Nathan.
On 04/13/14 10:53 PM, Joshua Edmonds wrote:
Hey guys,
I've managed to end up with a corrupted zpool that I can no longer
import on Solaris 11.1. Luckily I do have a backup of the most
important stuff, but I'd still like to recover the remainder if possible.
The pool was a 4-disk raidz1 with a ZIL (unmirrored).
From what I can tell, it looks like one of the devices caused the SAS
HBA (LSI1068E) to reset and offline the pool. Upon reboot the pool
wouldn't import and "zpool status" showed that all disks except one
were unavailable. However, I think this was because the HBA reset
caused Solaris to re-enumerate the devices. At this point I
exported the pool and attempted to reimport it. I probably shouldn't
have, because it has now led to a condition where two disks have the
same GUID.
root@solaris:~# zdb -l /dev/dsk/c7t5d0s0 |egrep "(children|guid)"
pool_guid: 7650914121155923652
top_guid: 4244192714700669945
guid: 2113359054019808692
vdev_children: 2
guid: 4244192714700669945
children[0]:
guid: 16334042155336894037
children[1]:
guid: 2113359054019808692
children[2]:
guid: 11196011208380299867
children[3]:
guid: 15149956586209127431
root@solaris:~# zdb -l /dev/dsk/c7t6d0s0 |egrep "(children|guid)"
pool_guid: 7650914121155923652
top_guid: 4244192714700669945
guid: 11196011208380299867
vdev_children: 2
guid: 4244192714700669945
children[0]:
guid: 16334042155336894037
children[1]:
guid: 2113359054019808692
children[2]:
guid: 11196011208380299867
children[3]:
guid: 15149956586209127431
root@solaris:~# zdb -l /dev/dsk/c7t7d0s0 |egrep "(children|guid)"
pool_guid: 7650914121155923652
top_guid: 4244192714700669945
guid: 2113359054019808692
vdev_children: 2
guid: 4244192714700669945
children[0]:
guid: 16334042155336894037
children[1]:
guid: 2113359054019808692
children[2]:
guid: 11196011208380299867
children[3]:
guid: 15149956586209127431
root@solaris:~# zdb -l /dev/dsk/c7t8d0s0 |egrep "(children|guid)"
pool_guid: 7650914121155923652
top_guid: 4244192714700669945
guid: 15149956586209127431
vdev_children: 2
guid: 4244192714700669945
children[0]:
guid: 16334042155336894037
children[1]:
guid: 2113359054019808692
children[2]:
guid: 11196011208380299867
children[3]:
guid: 15149956586209127431
From the above, c7t5d0 and c7t7d0 both have a guid of
2113359054019808692. Looks like one of these should be
16334042155336894037 instead.
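The comparison above can be automated with a rough sketch like the one below. It assumes slice s0 and the four disk names from this thread, and relies on the fact that in `zdb -l` output the first line whose field is exactly "guid:" (not pool_guid or top_guid) is the device's own GUID. The helper names leaf_guid, report_guids and dup_guids are hypothetical; this only reads labels and changes nothing on disk.

```shell
#!/bin/sh
# Hypothetical helpers: list each disk's own (leaf) GUID from its ZFS label
# and report any GUID claimed by more than one disk. Read-only.
# Assumptions: slice s0 and the disk names used in this thread.

# Read `zdb -l` output on stdin and print the first bare "guid:" value.
# pool_guid/top_guid lines don't match, so this is the device's own GUID.
leaf_guid() {
    awk '$1 == "guid:" { print $2; exit }'
}

# Print "<disk> <guid>" for each disk name given as an argument.
report_guids() {
    for d in "$@"; do
        printf '%s %s\n' "$d" "$(zdb -l "/dev/dsk/${d}s0" | leaf_guid)"
    done
}

# Read "<disk> <guid>" lines on stdin; print any GUID seen more than once.
dup_guids() {
    awk '{ print $2 }' | sort | uniq -d
}

# Usage on the affected host:
#   report_guids c7t5d0 c7t6d0 c7t7d0 c7t8d0 > /tmp/guids.txt
#   dup_guids < /tmp/guids.txt
```

Against the labels shown above, dup_guids would be expected to print 2113359054019808692, the GUID shared by c7t5d0 and c7t7d0.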
zpool import gives me the following output (I'm attempting to import
this on another system which is why there's a reference to c9t4d0, aka
guid: 16334042155336894037).
root@solaris:~# zpool import
pool: bucket
id: 7650914121155923652
state: UNAVAIL
status: The pool was last accessed by another system.
action: The pool cannot be imported due to unavailable devices or data.
see: http://support.oracle.com/msg/ZFS-8000-EY
config:
        bucket          UNAVAIL  insufficient replicas
          raidz1-0      DEGRADED
            c9t4d0      UNAVAIL  corrupted data
            c7t7d0      ONLINE
            c7t6d0      ONLINE
            c7t8d0      ONLINE
device details:
        c9t4d0          UNAVAIL  corrupted data
        status: ZFS detected errors on this device.
                The device has bad label or disk contents.
Since the pool is raidz1, I thought I would be able to import it with one
device missing, but no matter what I try it simply won't import due to
"insufficient replicas".
Is anyone aware of a method to modify the on-disk metadata and change
the device guid?
Any help is greatly appreciated!
Cheers,
Josh
_______________________________________________
msosug mailing list
[email protected]
http://mexico.purplecow.org/m/listinfo/msosug