Dan Price wrote:
> On Thu 14 Aug 2008 at 03:37PM, Evan Layton wrote:
>> This error is coming from ZFS. Did you change out one of your disks in
>> the mirror recently? If so you may want to run format on that disk and
>> see if it has an EFI label on it. If it does you'll have to break the
>> mirror and remove that disk from the mirror, re-label it and add it
>> back into the mirror.
> 
> Evan, I would not recommend this procedure.  Doing so will likely
> (though not certainly) result in an unbootable system.

I guess I just got lucky when this worked for me. This, in a nutshell, is what 
Lori had indicated I should try, but I can see from what you've written here 
that either I misunderstood her or missed something.

Thanks for the added information and for letting us know that what I had 
suggested was a problem!

-evan

> 
> Yesterday I discovered, by accident, that I had an EFI-labelled disk in
> my root pool, and so I set out to fix the issue.  I did what you'd
> expect: detached the device, re-ran fdisk on it, then repartitioned it
> with format -e and applied an SMI label.
> 
> The end result of my fiddling was a machine which would not boot
> build 95.  As I tried various remedies (running installgrub, booting
> from the CD and massaging the pool, etc.), the problem got worse until
> the system could not boot any of my BEs anymore.
> 
> Today I was lucky enough to have Lin, George and Erik from the ZFS team
> all in my office helping me to debug this.  They were awesome and we
> quickly got to a root cause.
> 
> The heart of the problem is that /etc/zfs/zpool.cache in the boot
> archive and the pool configuration stored in the disks themselves can
> get out of sync with each other.  That's bad, because when ZFS tries to
> reconcile them at boot time, it will get upset and panic, thinking that
> the pool is damaged.  This can happen when you do a mirror attach or
> detach because apparently disk GUIDs in the pool can change as the
> pool topology changes and mirror vdevs come and go.  We stepped
> through the problem with kmdb and watched ZFS load up a healthy pool,
> then shoot it down as broken due to this reconciliation problem.
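The two copies of the configuration that can drift apart can both be dumped with zdb for comparison. A hedged sketch, not part of the original procedure: the pool name `rpool` and the device path are placeholder assumptions, and the snippet only prints the commands rather than executing them, so it is safe to paste anywhere and review first.

```shell
# The two places the pool configuration lives; comparing their guid
# fields shows whether they have drifted apart.  "rpool" and the
# device path are placeholders -- substitute your own pool and disk.
cmds='zdb -C rpool                  # cached config (what zpool.cache describes)
zdb -l /dev/rdsk/c0t0d0s0           # on-disk vdev labels (what the disk claims)'

# Print the commands for review rather than executing them.
printf '%s\n' "$cmds"
```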
> 
> If you want to remove an EFI labelled disk from your root pool, my advice
> to you would be to do the following.  Note that I have not tested this
> particular sequence, but I think it will work.  Hah.
> 
> 0) Backup your data and settings.
> 
> 1) 'zpool detach' the EFI labelled disk from your pool.  After you do this
>    YOU MUST NOT REBOOT.  Your system is now in a fragile state.
> 
> 2) Run 'zpool status' to ensure that your pool now has one disk.
> 
> 3) Edit /etc/boot/solaris/filelist.ramdisk.  Remove the only line in the
>    file:
> 
>       etc/zfs/zpool.cache
> 
> 4) Delete /platform/i86pc/boot_archive and /platform/i86pc/amd64/boot_archive
> 
> 5) Run 'bootadm update-archive' -- This rebuilds the boot archive,
>    omitting the zpool.cache file.
> 
> It may also be necessary to do installgrub at this point.  Probably, and
> it wouldn't hurt.
> 
> 6) Reboot your system, to ensure that you have a working configuration.
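The sequence above can be sketched as a script. Treat it the same way as the steps it mirrors: untested. The pool name `rpool` and disk `c0t1d0s0` are placeholder assumptions, `sed -i` is a GNU-ism (on stock Solaris, edit filelist.ramdisk by hand as in step 3), and by default the script only prints each command; set DRYRUN=0 on a backed-up system to actually run them.

```shell
#!/bin/sh
# Hedged sketch of the removal procedure above.  "rpool" and DISK are
# placeholders -- substitute your own pool and EFI-labelled device.
# With DRYRUN=1 (the default) every command is printed, not executed.

DISK=${DISK:-c0t1d0s0}
DRYRUN=${DRYRUN:-1}

run() {
    if [ "$DRYRUN" -eq 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 1: detach the EFI-labelled disk.  Do NOT reboot after this.
run zpool detach rpool "$DISK"

# Step 2: confirm the pool is now down to one disk.
run zpool status rpool

# Step 3: drop zpool.cache from the ramdisk file list so the rebuilt
# boot archive cannot disagree with the on-disk pool configuration.
run sed -i.bak '/etc\/zfs\/zpool.cache/d' /etc/boot/solaris/filelist.ramdisk

# Step 4: remove the stale boot archives.
run rm /platform/i86pc/boot_archive /platform/i86pc/amd64/boot_archive

# Step 5: rebuild the boot archive, now without zpool.cache.
run bootadm update-archive

# Step 6 is manual: reboot to verify the configuration.
echo "Now reboot to verify the configuration."
```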
> 
> In Nevada, this is not an issue (George told me) because the boot archive
> omits the zpool.cache file, so there's never any state to get out of sync.
> I was left wondering why we populate /etc/boot/solaris/filelist.ramdisk
> with "etc/zfs/zpool.cache".  At a minimum, if we haven't already, we
> should stop doing that as soon as possible.
> 
> I will be filing bugs to cover these issues tomorrow.
> 
>         -dp
> 

_______________________________________________
indiana-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/indiana-discuss
