Jan and George: I have also seen this on Ultra 24 and Ultra 40 systems when we tested the 1.5 TB Seagate SATA disk. So far I have only seen this on systems with SATA drives.
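For anyone who wants to check their own label for the missing 'devid' described in the quoted CR, here is a minimal sketch, not a supported tool. It assumes the 'zdb -l' output has been saved to a text file and that the label is printed as key=value lines, as in the diff quoted below (txg=21, hostid=..., devid='...'). The helper names label_keys and has_devid are made up for illustration.

```python
import re


def label_keys(zdb_text):
    """Collect key=value pairs from saved `zdb -l` style output.

    Assumption: each label attribute is printed on its own line as
    key=value, possibly indented. Only the first occurrence of a key
    is kept, which is enough for a presence check.
    """
    keys = {}
    for line in zdb_text.splitlines():
        match = re.match(r"\s*([A-Za-z_]+)=(.*)$", line)
        if match:
            keys.setdefault(match.group(1), match.group(2).strip())
    return keys


def has_devid(zdb_text):
    """Return True if the label text carries a devid attribute."""
    return "devid" in label_keys(zdb_text)
```

Running has_devid() on the contents of /tmp/zdb_before_import.txt and /tmp/zdb_after_import.txt from the description below should show whether 'zpool import' added the attribute on a given system.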
jan damborsky wrote: > George, > > > George Wilson wrote: >> Jan, >> >> It seems like the problem is not with ZFS but with the device driver. >> If the driver is failing to provide the devid then ZFS is just going >> to be a victim. > > I agree with you that this is what we might be encountering > with respect to 'devid' problem here. > > >> I would recommend that we change the synopsis to devid_get() fails >> with "Invalid argument" and pass this to the driver folks. > > I will let Sanjay comment on this, since he has done > some more investigation recently. > >> Do you know if it's always the same driver? > > I can only reproduce it on one system - this one has SATA drive > connected to the controller handled by nv_sata(7D) driver. I think > that Sanjay encountered that problem also on system with SATA disk. > > Thank you, > Jan > >> Thanks, >> George >> >> jan damborsky wrote: >>> Hi George, >>> >>> >>> George Wilson wrote: >>>> Jan, >>>> >>>> So who is working the UFS issue and how is that being tracked. >>> In general, bugs in OpenSolaris Caiman installer are tracked in >>> Bugzilla at >>> defect.opensolaris.org. This is the preferred over filing bugs in >>> Bugster. >>> Speaking about this particular problem, it is tracked by following bug: >>> >>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up in >>> GRUB prompt after installing OpenSolaris >>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4675 >>> >>> Sanjay Nadkarni is assigned to this bug (CCing him). >>> >>>> I would recommend that we keep this bug as the UFS/install issue and >>>> create a new bug and send that to me. >>> As pointed above, Bugzilla is preferred database to track issues in >>> Caiman installer. >>> >>> Please note that 6769487 was originally filed for tracking the >>> problem when >>> GRUB can't access ZFS filesystem because 'devid' is not present in >>> ZFS label. >>> >>> It was overloaded later by 'UFS' problem. 
>>> >>>> Can you move the descriptions below from this bug and add them to >>>> the new one? >>> To be honest, since installer part of problem related to UFS is >>> tracked by 4675, >>> I don't see why we shouldn't continue to use 6769487 to track the >>> issue this bug >>> was initially filed for and I think that we might lose some context when >>> ZFS related information is moved from 6769487 to the new bug. >>> That said, if you think it might be helpful, please let me know and >>> I will try to capture all information from 6769487 I think is >>> relevant to >>> the ZFS part in new bug. >>> >>>> Also since you can reproduce this can you tell me exactly how or >>>> point me at a system which I can login into to debug? >>> Sure, the machine can be accessed via 'ssh', but since it is not >>> directly accessible from SWAN (it is behind the NAT), >>> I will provide you with instructions, how to access it. >>> Unfortunately it doesn't have console access. >>> >>> Please let me know, in which state you would need to have that >>> machine - right after the installation finished, but before reboot ? >>> >>> Unfortunately, following the procedure itself doesn't seem to be >>> sufficient for reproducing the problem :-( I tried exactly the >>> same steps on other bare metal as well as in virtual environment, >>> but without success. >>> >>> >>>> I want to make sure we don't lose sight of the UFS issue and this >>>> bug has already gone down to root cause so let's not overload this >>>> bug any further. >>> UFS part of problem is being solved right now (please feel free to >>> monitor >>> bug 4675 for progress and add anything you might consider relevant >>> to that issue). >>> >>> Thank you, >>> Jan >>> >>>> Thanks, >>>> George >>>> >>>> jan damborsky wrote: >>>>> Hi George, >>>>> >>>>> there are at least two parts of this problem: >>>>> >>>>> [1] UFS one >>>>> This is what you are referring to and it is being tracked by >>>>> Bugzilla bug 4675. 
>>>>> In that case workaround #2 helps to "solve" the problem. >>>>> >>>>> [2] ZFS one >>>>> Please see original description #1. I am able to reproduce that on >>>>> system >>>>> at will which didn't contain any UFS filesystem and thus [1] is not >>>>> applicable here. 'zpool import' helps in this case. >>>>> >>>>> Also please see: >>>>> * description #4 >>>>> * description #5 >>>>> * public comments #8 >>>>> * comments #6 >>>>> >>>>> People are apparently encountering this problem in >>>>> other configurations (e.g. when using virgin disk >>>>> or installing on system containing only Windows). >>>>> >>>>> I am not stating that this is in fact problem in ZFS as it might >>>>> be related for example to device driver code, but at this point it >>>>> seems to me that ZFS team is the most eligible one to move >>>>> things forward, as GRUB can't read menu.lst from ZFS >>>>> filesystem . >>>>> >>>>> Please let me know if you have any questions or need more >>>>> information. >>>>> >>>>> Thank you, >>>>> Jan >>>>> >>>>> >>>>> George Wilson wrote: >>>>>> Jan, >>>>>> >>>>>> I don't understand how this is a ZFS problem. I thought from the >>>>>> evaluation that the issue is that UFS and ZFS are sharing the same >>>>>> block and this was being caused by the fact the the livecd had >>>>>> mounted a UFS filesystem as part of the installation. Could you >>>>>> clarify? 
>>>>>> >>>>>> Thanks, >>>>>> George >>>>>> >>>>>> Jan.Damborsky at Sun.COM wrote: >>>>>>> Sun Confidential: Internal only >>>>>>> >>>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of >>>>>>> OpenSolaris 2008.11 (build 101a) >>>>>>> >>>>>>> CrPrint: http://bt2ws.central.sun.com/CrPrint?id=6769487 >>>>>>> Monaco: http://monaco.sfbay.sun.com/detail.jsf?cr=6769487 >>>>>>> >>>>>>> Due to a change of Responsible manager requested by >>>>>>> jan.damborsky at sun.com, >>>>>>> david.brittle at sun.com is now the responsible manager for: >>>>>>> >>>>>>> Due to a change requested by jan.damborsky at sun.com, >>>>>>> this CR is being redispatched: >>>>>>> >>>>>>> This is a high priority CR and requires your immediate attention. >>>>>>> Please evaluate it as soon as possible. Thank you. >>>>>>> >>>>>>> CR 6769487 changed on Nov 12 2008 by jan.damborsky at sun.com >>>>>>> >>>>>>> === Field ============ === New Value ============= === Old Value >>>>>>> ============= >>>>>>> >>>>>>> Category kernel >>>>>>> opensolaris Comments New >>>>>>> Note >>>>>>> Comments New Note Old >>>>>>> Note Comments New >>>>>>> Note Old Note Public >>>>>>> Comments New >>>>>>> Note Responsible >>>>>>> Manager david.brittle at sun.com eric.ray at sun.com >>>>>>> Status 1-Dispatched 5-Cause >>>>>>> Known SubCategory >>>>>>> zfs livecd >>>>>>> ====================== =========================== >>>>>>> =========================== >>>>>>> >>>>>>> *Change Request ID*: 6769487 >>>>>>> >>>>>>> *Synopsis*: Ended up in 'grub>' prompt after installation of >>>>>>> OpenSolaris 2008.11 (build 101a) >>>>>>> >>>>>>> Product: solaris >>>>>>> Category: kernel >>>>>>> Subcategory: zfs >>>>>>> Type: Defect >>>>>>> Subtype: Functionality >>>>>>> Status: 1-Dispatched >>>>>>> Substatus: Priority: 1-Very High >>>>>>> Introduced In Release: Introduced In Build: Responsible >>>>>>> Manager: david.brittle at sun.com >>>>>>> Responsible Engineer: Initial Evaluator: zfs-team at sun.com >>>>>>> Keywords: >>>>>>> 
=== *Description* >>>>>>> ============================================================ >>>>>>> When testing installation with recent OpenSolaris builds, we have >>>>>>> been encountering that >>>>>>> in some cases, people end up in GRUB prompt after the >>>>>>> installation - it seems that menu.lst >>>>>>> can't be accessed for some reason. For now bunch of Bugzilla bugs >>>>>>> seem to be describing >>>>>>> the same manifestation of the problem which root cause has not >>>>>>> been identified yet: >>>>>>> >>>>>>> 4051 opensolaris b99b/b100a does not install on 1.5 TB disk or >>>>>>> boot fails after install >>>>>>> 4591 Install failure on a Sun Fire X4240 with Opensolaris 200811 >>>>>>> 4161 no grub in 2008.11 Development Builds (comment #20, comment >>>>>>> #31) >>>>>>> 4760 Enter grub after installing 2008.11 RC 1 >>>>>>> ... >>>>>>> >>>>>>> I also hit that problem when testing Automated Installer (it is a >>>>>>> part of Caiman project >>>>>>> and will replace current jumpstart install technology), I was >>>>>>> able to make GRUB find >>>>>>> 'menu.lst' just by using 'zpool import' command - please see >>>>>>> below for detailed procedure. 
>>>>>>> >>>>>>> >>>>>>> configuration: >>>>>>> -------------- >>>>>>> HW: Ultra 20, 1GB RWM, 1 250GB SATA drive >>>>>>> SW: Opensolaris build 100, 64bit mode >>>>>>> >>>>>>> steps used: >>>>>>> ----------- >>>>>>> [1] OpenSolaris 100 installed using Automated Installer >>>>>>> - Solaris 2 partition created during installation >>>>>>> >>>>>>> * partition configuration before installation: >>>>>>> >>>>>>> # fdisk -W - c2t0d0p0 >>>>>>> ...* Id Act Bhead Bsect Bcyl Ehead Esect Ecyl >>>>>>> Rsect Numsect >>>>>>> 192 0 0 1 1 254 63 1023 >>>>>>> 16065 22491000 >>>>>>> * partition configuration after installation: >>>>>>> >>>>>>> # fdisk -W - c2t0d0p0 >>>>>>> ...* Id Act Bhead Bsect Bcyl Ehead Esect Ecyl >>>>>>> Rsect Numsect >>>>>>> 192 0 0 1 1 254 63 1023 >>>>>>> 16065 22491000 191 128 254 63 1023 254 >>>>>>> 63 1023 22507065 30000000 >>>>>>> >>>>>>> [2] When I reboot the system after the installation, I ended up >>>>>>> in GRUB prompt: >>>>>>> grub> root >>>>>>> (hd0,1,a): Filesystem type unknown, partition type 0xbf >>>>>>> >>>>>>> grub> cat /rpool/boot/grub/menu.lst >>>>>>> >>>>>>> Error 17: Cannot mount selected partition >>>>>>> >>>>>>> grub> >>>>>>> >>>>>>> [3] I rebooted into AI and did 'zpool import' >>>>>>> # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_before_import.txt (attached) >>>>>>> # zpool import -f rpool >>>>>>> # zdb -l /dev/rdsk/c2t0d0s0 > /tmp/zdb_after_import.txt (attached) >>>>>>> # diff /tmp/zdb_before_import.txt /tmp/zdb_after_import.txt >>>>>>> 7c7 >>>>>>> < txg=21 >>>>>>> --- >>>>>>> >>>>>>>> txg=2675 >>>>>>>> >>>>>>> 9c9 >>>>>>> < hostid=4741222 >>>>>>> --- >>>>>>> >>>>>>>> hostid=4247690 >>>>>>>> >>>>>>> 17a18 >>>>>>> >>>>>>>> devid='id1,sd at f00c778e247ac7bd0000238460000/a' >>>>>>>> >>>>>>> 31c32 >>>>>>> ... 
>>>>>>> # reboot >>>>>>> >>>>>>> [4] Now GRUB can access menu.lst and Solaris is booted >>>>>>> >>>>>>> hypothesis >>>>>>> ---------- >>>>>>> It seems that for some reason, when ZFS pool was created, 'devid' >>>>>>> information was not added to the ZFS label. >>>>>>> >>>>>>> When 'zpool import' was called, 'devid' got populated. >>>>>>> >>>>>>> Looking at the GRUB ZFS plug-in, it seems that 'devid' >>>>>>> (ZPOOL_CONFIG_DEVID attribute) is >>>>>>> required in order to be able to access ZFS filesystem: >>>>>>> >>>>>>> In grub/grub-0.95/stage2/fsys_zfs.c: >>>>>>> >>>>>>> vdev_get_bootpath() >>>>>>> { >>>>>>> ... >>>>>>> if (strcmp(type, VDEV_TYPE_DISK) == 0) { >>>>>>> if (vdev_validate(nv) != 0 || >>>>>>> (nvlist_lookup_value(nv, ZPOOL_CONFIG_PHYS_PATH, >>>>>>> bootpath, DATA_TYPE_STRING, NULL) != 0) || >>>>>>> (nvlist_lookup_value(nv, ZPOOL_CONFIG_DEVID, >>>>>>> devid, DATA_TYPE_STRING, NULL) != 0)) >>>>>>> return (ERR_NO_BOOTPATH); >>>>>>> ... >>>>>>> } >>>>>>> >>>>>>> additional observations: >>>>>>> ------------------------ >>>>>>> [1] If 'devid' is populated during installation after 'zpool create' >>>>>>> operation, the problem doesn't occur. >>>>>>> >>>>>>> [2] If following described procedure, the problem is reproducible >>>>>>> at will on system where it was initially reproduced (please see >>>>>>> above for the configuration) >>>>>>> >>>>>>> [3] Other people reported this problem also for following >>>>>>> configurations: >>>>>>> * vmware >>>>>>> * Sun Java Workstation W2100z with 2xOpteron2.4G 3G Mem >>>>>>> >>>>>>> [4] When installation into existing Solaris2 partition containing >>>>>>> Solaris instance is done >>>>>>> 'devid' is always populated and the problem doesn't occur (it >>>>>>> doesn't matter if partition >>>>>>> is marked 'active' or not). 
>>>>>>> >>>>>>> *** (#1 of 5): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com >>>>>>> >>>>>>> If the system once be Navada, (101a as mine), install OpenSolaris >>>>>>> will hit this issue, while keep the partition but not choose the >>>>>>> entire disk (I suspect this caused the issue, perhaps) >>>>>>> There's a diagnostic partition on there if Navada installed, and >>>>>>> opensolaris 2008.11 simply enter grub> as this CR mentioned. Then >>>>>>> I use the entire disk, this time the system boot up okay. >>>>>>> But while I re-install it again with a smaller size than the >>>>>>> entire disk specified, >>>>>>> grub has no problem, but GNOME cannot start (hang there endlessly) >>>>>>> >>>>>>> *** (#2 of 5): 2008-11-10 10:45:29 GMT+00:00 robin.guo at sun.com >>>>>>> >>>>>>> The root cause of this problem is the continued existence of UFS >>>>>>> filesystems structures on disk, even after the zfs filesystem is >>>>>>> created and is live. Because ZFS did not destroy the UFS magic, >>>>>>> both GRUB and Solaris think there's a (horribly damaged) UFS >>>>>>> filesystem present on that slice (a WARNING is displayed at boot >>>>>>> time during OpenSolaris boot informing the user that >>>>>>> /mnt/solaris<N> (where <N> is a number) could not be mounted >>>>>>> because of filesystem problems -- in reality, that slice is where >>>>>>> the zfs root is located. >>>>>>> >>>>>>> In GRUB, since code that attempts to mount root does so by trying >>>>>>> each filesystem module in the order in which they are listed in >>>>>>> the fsys_table[] array, and since UFS is listed before ZFS, GRUB >>>>>>> thinks that a UFS filesystem exists in the slice actually >>>>>>> containing the ZFS root filesystem (and fails trying to mount it, >>>>>>> leaving it unable to locate the real root filesystem). A >>>>>>> modified version of GRUB that modifies fsys_table by declaring >>>>>>> the ZFS operations before the UFS operations confirms this >>>>>>> hypothesis. 
>>>>>>> >>>>>>> Therefore, a valid workaround destroys the UFS magic, preventing >>>>>>> both GRUB's and Solaris's UFS modules from recognizing the slice >>>>>>> as a UFS filesystem. When GRUB's UFS code fails to find a valid >>>>>>> UFS filesystem, the ZFS module is subsequently tried and is able >>>>>>> to successfully mount the filesystem. >>>>>>> >>>>>>> *** (#3 of 5): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com >>>>>>> *** Last Edit: 2008-11-11 03:45:05 GMT+00:00 seth.goldberg at sun.com >>>>>>> >>>>>>> I think there are two separate issues here. The UFS label >>>>>>> appears to be one. The signature for this bug is that at the grub >>>>>>> prompt, typing root generates the UFS filesystem info. >>>>>>> However there is a secondary bug where after installation, one >>>>>>> gets a grub prompt. Typing the root command at the grub prompt >>>>>>> generates "unknown file system". In this case no UFS filesystems >>>>>>> were detected or mounted. The workaround for this has been to >>>>>>> run zpool import. This still needs to be investigated. >>>>>>> >>>>>>> *** (#4 of 5): 2008-11-12 00:04:16 GMT+00:00 sanjay.nadkarni at sun.com >>>>>>> >>>>>>> We were able to recreate the grub failure where typing root at >>>>>>> the prompt returns unknown file system. This was on a Fujitsu >>>>>>> LifeBook S7211. It was installed with Vista. We >>>>>>> then booted OpenSolaris and started the install. At the end of >>>>>>> the installation we noted that the zfs label did not have devid >>>>>>> information. >>>>>>> >>>>>>> We then loaded a simple program that would get the devid >>>>>>> (devid_get). This failed with "Invalid argument". We then >>>>>>> rebooted the liveCD again and reran this program and this time it >>>>>>> printed out the device id. The disk is off a SATA controller. >>>>>>> The driver that attached to this is ahci. The device is: >>>>>>> 82801HBM/HEM. 
The disk is Fujitsu MHY2120BH >>>>>>> >>>>>>> *** (#5 of 5): 2008-11-12 02:43:18 GMT+00:00 sanjay.nadkarni at sun.com >>>>>>> >>>>>>> >>>>>>> === *Public Comments* >>>>>>> ======================================================== >>>>>>> Following Bugzilla bugs were closed as duplicate of this issue: >>>>>>> >>>>>>> 4772 Cannot install OpenSolaris 2008.11 on VMware Server 2.0 >>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4772 >>>>>>> >>>>>>> 4756 after reboot when finishing the installation, system can not >>>>>>> boot >>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4756 >>>>>>> >>>>>>> 4749 After installed opensolaris0811RC1 on Dell PowerEdge, can't >>>>>>> boot from disk. >>>>>>> http://defect.opensolaris.org/bz/show_bug.cgi?id=4749 >>>>>>> >>>>>>> *** (#1 of 9): 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com >>>>>>> *** Last Edit: 2008-11-11 11:45:41 GMT+00:00 jan.damborsky at sun.com >>>>>>> >>>>>>> zpool import doesn't help for me, nor would I expect it to (it's >>>>>>> a mystery >>>>>>> why it seems to). Clearing the UFS magic helps. >>>>>>> >>>>>>> Looking further, I find that the data on disk at 8k seems to still >>>>>>> be a UFS superblock, not a zfs vdev_boot_header_t, which doesn't >>>>>>> make >>>>>>> sense to me; in any ZFS initialization scheme, one would expect >>>>>>> all parts >>>>>>> of the label to be completely written. >>>>>>> >>>>>>> The expected vdev_boot_header_t appears at the label copy at >>>>>>> 256K+8K, as >>>>>>> expected. >>>>>>> >>>>>>> *** (#2 of 9): 2008-11-11 04:39:09 GMT+00:00 dan.mick at sun.com >>>>>>> >>>>>>> It appears that ZFS doesn't validate that first 8k (the >>>>>>> vdev_boot_header), so >>>>>>> that explains why the kernel was happy even with a UFS superblock >>>>>>> where the >>>>>>> vdev_boot_header was supposed to be. >>>>>>> >>>>>>> Also, the last few bits of the 8k block in question seem to >>>>>>> contain a >>>>>>> zio_block_tail_t (i.e. 
a zbt_magic and a zbt_cksum), so it seems >>>>>>> this block >>>>>>> was written by ZFS sometime in the past. >>>>>>> Possible theories: 1) the ZFS initialization somehow skipped >>>>>>> this 8k header, >>>>>>> or 2) somehow the 8k superblock was rewritten over the block >>>>>>> after ZFS initialized it. >>>>>>> >>>>>>> *** (#3 of 9): 2008-11-11 04:49:57 GMT+00:00 dan.mick at sun.com >>>>>>> >>>>>>> Another possible theory: could this be the superblock flush from >>>>>>> a still-mounted UFS being shut down? >>>>>>> >>>>>>> (The block was correct until after the OpenSolaris installer said >>>>>>> it was done, >>>>>>> and waited for me to press a button to reboot. I suspect >>>>>>> the original UFS was mounted and not unmounted before the ZFS >>>>>>> creation, >>>>>>> so they both think they own the device.) >>>>>>> >>>>>>> Supporting evidence: the "last mounted" path in the superblock is >>>>>>> "/mnt/solaris0". >>>>>>> >>>>>>> I suspect the cause of this bug is a UFS that's mounted and >>>>>>> should be >>>>>>> unmounted by the installer before ZFS creation. >>>>>>> >>>>>>> What's the right category/subcategory for Caiman? >>>>>>> >>>>>>> *** (#4 of 9): 2008-11-11 07:34:58 GMT+00:00 dan.mick at sun.com >>>>>>> >>>>>>> The live CD has historically automatically mounted up any UFS >>>>>>> file systems that it found, going back to Belenix. Interesting >>>>>>> that this is just now a problem, but it probably is a result of >>>>>>> switching to ZFS for swap, as up until build 96 we always created >>>>>>> a swap slice at the start of the disk, which it appears would >>>>>>> have masked this problem. >>>>>>> >>>>>>> *** (#5 of 9): 2008-11-11 15:02:03 GMT+00:00 dave.miner at sun.com >>>>>>> >>>>>>> Installer takes care of releasing the target device before Target >>>>>>> Instantiation >>>>>>> phase is launched. 
Among other things, it >>>>>>> >>>>>>> * releases all swap devices created on target disk >>>>>>> * unmounts whatever is mounted on target disk >>>>>>> >>>>>>> For the latter, /etc/mnttab is read and if there is a mounted >>>>>>> device which is part of >>>>>>> the target disk, installer tries to unmount it. >>>>>>> >>>>>>> The problem is that after the fix for Bugzilla bug 30 was integrated, UFS >>>>>>> filesystems are >>>>>>> mounted with the '-o m' option, which causes the filesystem to be >>>>>>> mounted without making >>>>>>> an entry in /etc/mnttab. The mountpoints are then hidden, installer >>>>>>> can't see those and >>>>>>> doesn't unmount them. >>>>>>> >>>>>>> That said, this explains the UFS part of the problem (when the 'dd' >>>>>>> workaround works), >>>>>>> but doesn't seem to be related to the ZFS part of the issue, when the >>>>>>> 'zpool import' workaround helped. >>>>>>> >>>>>>> *** (#6 of 9): 2008-11-11 16:25:09 GMT+00:00 jan.damborsky at sun.com >>>>>>> *** Last Edit: 2008-11-11 16:34:29 GMT+00:00 jan.damborsky at sun.com >>>>>>> >>>>>>> We should probably leave this bug to resolve zpool create >>>>>>> not removing evidence of the >>>>>>> previous ufs fs, and file another one to chase down the other >>>>>>> issue(s?). >>>>>>> >>>>>>> Chris, if you run zdb -l on your virgin device, are you missing >>>>>>> any zfs properties? The reader >>>>>>> in GRUB pretty much gives up if things like the devid aren't set. >>>>>>> >>>>>>> *** (#7 of 9): 2008-11-11 19:30:56 GMT+00:00 >>>>>>> jan.setje-eilers at sun.com >>>>>>> >>>>>>> Concur that Chris' problem is different; the UFS superblock does >>>>>>> not exist in >>>>>>> the first 256kb attached to the bug. It appears as though >>>>>>> phys_path and devid >>>>>>> are present, although it's difficult to be sure. We should >>>>>>> probably see if we can >>>>>>> send a debug version of Grub to Chris, with installation >>>>>>> instructions, to see >>>>>>> why it seems unable to find the zfs. 
>>>>>>> >>>>>>> *** (#8 of 9): 2008-11-11 22:16:50 GMT+00:00 dan.mick at sun.com >>>>>>> >>>>>>> The root cause of 'UFS part' of this problem is in 'livecd code' >>>>>>> and is tracked by >>>>>>> following Bugzilla bug: >>>>>>> >>>>>>> 4675 Fix for bug 30 causes ZFS label to be mangled - ending up in >>>>>>> GRUB prompt after installing OpenSolaris >>>>>>> >>>>>>> Please feel free to use this bug (6769487) for tracking other >>>>>>> part(s) of the problem. >>>>>>> Resetting category to solaris/kernel/zfs and Status to 'Dispatched'. >>>>>>> >>>>>>> *** (#9 of 9): 2008-11-12 12:46:21 GMT+00:00 jan.damborsky at sun.com >>>>>>> >>>>>>> >>>>>>> === *Comments* >>>>>>> =============================================================== >>>>>>> Moved to public comments. >>>>>>> >>>>>>> *** (#1 of 6): 2008-11-10 17:04:10 GMT+00:00 jan.damborsky at sun.com >>>>>>> *** Last Edit: 2008-11-10 17:20:54 GMT+00:00 dave.miner at sun.com >>>>>>> >>>>>>> Same situation (without zfs) on: >>>>>>> White Box based on Intel DG33TL motherboard with ICH9R chipset, >>>>>>> 2Gb memory, 3 SATA drives, 1 SATA CD/DVD, Intel graphics. >>>>>>> >>>>>>> *** (#2 of 6): 2008-11-10 22:52:23 GMT+00:00 pawel.wojcik at sun.com >>>>>>> >>>>>>> Workaround #1 does not cause the system to boot properly on the >>>>>>> system I tried installing (that seems to be consistent with what >>>>>>> others are reporting in the opensolaris defect report), but >>>>>>> workaround #2 DOES. >>>>>>> >>>>>>> *** (#3 of 6): 2008-11-11 01:56:43 GMT+00:00 seth.goldberg at sun.com >>>>>>> *** Last Edit: 2008-11-11 03:41:48 GMT+00:00 seth.goldberg at sun.com >>>>>>> >>>>>>> I've reproduced this on a "virgin" disk, see SR record against >>>>>>> this bug, (had to purchase a new spindle as previous disk failed >>>>>>> and new disk removed supplier packaging was inserted into laptop >>>>>>> and then 2008.11 CD booted). 
>>>>>>> >>>>>>> After a discussion with Dan Mick on email data requested by dan >>>>>>> was capture root command from grub prompt: >>>>>>> >>>>>>> (hd0,0,a): Filesystem type is zfs, partition type 0xbf >>>>>>> >>>>>>> Also, can you boot from the CD and collect the first 256kb of the >>>>>>> disk, with >>>>>>> >>>>>>> dd if=<your s0 slice here> of=first.256kb bs=256k count=1 >>>>>>> >>>>>>> This is attached. >>>>>>> >>>>>>> *** (#4 of 6): 2008-11-11 10:46:29 GMT+00:00 >>>>>>> christopher.armes at sun.com >>>>>>> >>>>>>> Saw this bug on several machines today which I was helping to >>>>>>> install. One person did a reinstall and it worked fine the second >>>>>>> time as some reported. >>>>>>> >>>>>>> 2 other machines could use the workaround which Lin Ling pointed >>>>>>> us to with this bug. That did save a couple folks from having to >>>>>>> reinstall, so was very helpful. Thanks Lin! Of the installs of >>>>>>> people that installed to a hard drive (i.e., not within >>>>>>> VirtualBox), about 12 systems, we saw this on 3 machines, so >>>>>>> about 25% of the systems in this small sampling. >>>>>>> >>>>>>> *** (#5 of 6): 2008-11-12 09:58:01 GMT+00:00 alan.duboff at sun.com >>>>>>> >>>>>>> Moved to public comments. >>>>>>> >>>>>>> *** (#6 of 6): 2008-11-12 12:43:18 GMT+00:00 jan.damborsky at sun.com >>>>>>> *** Last Edit: 2008-11-12 12:46:43 GMT+00:00 jan.damborsky at sun.com >>>>>>> >>>>>>> >>>>>>> === *Evaluation* >>>>>>> ============================================================= >>>>>>> See Description. >>>>>>> >>>>>>> *** (#1 of 4): 2008-11-11 03:23:04 GMT+00:00 seth.goldberg at sun.com >>>>>>> >>>>>>> remove mislead evaluation. >>>>>>> >>>>>>> *** (#2 of 4): 2008-11-11 21:45:12 GMT+00:00 lin.ling at sun.com >>>>>>> *** Last Edit: 2008-11-11 23:16:07 GMT+00:00 lin.ling at sun.com >>>>>>> >>>>>>> What? No, read the public comments. The problem is that the UFS >>>>>>> filesystem is still mounted as the installer lays down the ZFS. 
>>>>>>> Then, on reboot, the UFS, as >>>>>>> it's syncing, writes its superblock back to the filesystem it >>>>>>> thinks it owns, >>>>>>> over the top of the now-ZFS-owned space. >>>>>>> >>>>>>> The installer must ensure that other filesystems are not mounted >>>>>>> on the slice >>>>>>> where it's creating the ZFS rpool. >>>>>>> >>>>>>> *** (#3 of 4): 2008-11-11 22:11:35 GMT+00:00 dan.mick at sun.com >>>>>>> >>>>>>> You are right. I misunderstood. >>>>>>> George Wilson just corrected me that 'zpool create' indeed clears >>>>>>> the space correctly: >>>>>>> >>>>>>> vdev_label_init() { >>>>>>> : >>>>>>> vp = zio_buf_alloc(sizeof (vdev_phys_t)); >>>>>>> bzero(vp, sizeof (vdev_phys_t)); >>>>>>> : >>>>>>> bzero(vb, sizeof (vdev_boot_header_t)); >>>>>>> : >>>>>>> } >>>>>>> >>>>>>> Thanks for the clarification. >>>>>>> >>>>>>> *** (#4 of 4): 2008-11-11 22:49:04 GMT+00:00 lin.ling at sun.com >>>>>>> >>>>>>> >>>>>>> === *Suggested Fix* >>>>>>> ========================================================== >>>>>>> >>>>>>> === *Workaround* >>>>>>> ============================================================= >>>>>>> [1] Boot LiveCD >>>>>>> $ pfexec su - >>>>>>> # zpool import -f rpool >>>>>>> >>>>>>> *** (#1 of 3): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com >>>>>>> >>>>>>> Zero out the leftover UFS magic: >>>>>>> >>>>>>> For GNU dd: >>>>>>> dd if=/dev/zero of=/dev/dsk/<SLICE> bs=1 count=4 seek=9564 >>>>>>> >>>>>>> (e.g.: >>>>>>> dd if=/dev/zero of=/dev/dsk/c4t0d0s0 bs=1 count=4 seek=9564 >>>>>>> ) >>>>>>> >>>>>>> *** (#2 of 3): 2008-11-11 03:36:55 GMT+00:00 seth.goldberg at sun.com >>>>>>> >>>>>>> I did the following in dd to work around the issue: >>>>>>> >>>>>>> root at opensolaris:~# dd if=/dev/zero of=/dev/dsk/c1t0d0s0 bs=1 >>>>>>> count=4 seek=9564 >>>>>>> 4+0 records in >>>>>>> 4+0 records out >>>>>>> 4 bytes (4 B) copied, 0.0394095 s, 0.1 kB/s >>>>>>> root at opensolaris:~# >>>>>>> >>>>>>> *** (#3 of 3): 2008-11-11 19:07:04 GMT+00:00 mary.ding at sun.com 
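Before zeroing anything with the dd workaround above, it may be worth confirming that the leftover UFS magic is actually present. The following is a rough sketch under stated assumptions, not part of the CR: it reads 4 bytes at offset 9564 (the offset the dd commands write to) from a saved image such as the first.256kb file requested in the comments, and compares them with the UFS magic number. The value 0x00011954 and the little-endian (x86) byte order are assumptions of this sketch, and has_ufs_magic is a made-up helper name.

```python
import struct

UFS_MAGIC = 0x00011954   # assumed UFS superblock magic (FS_MAGIC)
MAGIC_OFFSET = 9564      # byte offset zeroed by the dd workaround


def has_ufs_magic(path, offset=MAGIC_OFFSET):
    """Return True if the 4 bytes at `offset` hold the UFS magic.

    `path` is expected to be an ordinary file holding the start of the
    slice (for example an image collected with dd), not the live device.
    Assumption: the value is stored little-endian, as on x86.
    """
    with open(path, "rb") as image:
        image.seek(offset)
        raw = image.read(4)
    if len(raw) < 4:
        # Image is too short to contain the magic field at all.
        return False
    return struct.unpack("<I", raw)[0] == UFS_MAGIC
```

If has_ufs_magic("first.256kb") returns False, the slice most likely falls under the other (devid) part of the problem, where the 'zpool import' workaround is the one to try.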
>>>>>>> >>>>>>> >>>>>>> === *Justification* >>>>>>> ========================================================== >>>>>>> Priority changed from [] to [1-Very High] >>>>>>> Installed OpenSolaris 2008.11 doesn't boot >>>>>>> jan.damborsky at sun.com 2008-11-10 10:27:21 GMT >>>>>>> >>>>>>> *** (#1 of 1): 2008-11-10 10:27:21 GMT+00:00 jan.damborsky at sun.com >>>>>>> >>>>>>> >>>>>>> === *Additional Details* >>>>>>> ===================================================== >>>>>>> Targeted Release: Commit To Fix In Build: >>>>>>> Fixed In Build: Integrated In Build: Verified In >>>>>>> Build: See Also: 6769534 >>>>>>> Duplicate of: Hooks: >>>>>>> Hook1: Hook2: Hook3: >>>>>>> Hook4: Hook5: Hook6: Interest List: >>>>>>> dan.mick at sun.com, dave.miner at sun.com, david.comay at sun.com, >>>>>>> frank.batschulat at sun.com, kerberos-iteam at Sun.COM, >>>>>>> lin.ling at sun.com, nick.todd at sun.com, peter.dennis at sun.com, >>>>>>> plus1tb at sun.com, sdg at sun.com, si-bugs at sun.com, sst-prg at >>>>>>> sun.com, >>>>>>> tomas.hurka at sun.com >>>>>>> Program Management: New Defect >>>>>>> Root Cause: Is a Security Vulnerability?: No >>>>>>> Fix Affects Documentation: No >>>>>>> Fix Affects Localization: No >>>>>>> Reported by: >>>>>>> === *History* >>>>>>> ================================================================ >>>>>>> Date Submitted: 2008-11-10 10:27:21 GMT+00:00 >>>>>>> Submitted By: jan.damborsky at sun.com >>>>>>> >>>>>>> Status Changed Date Updated Updated By >>>>>>> 3-Accepted 2008-11-10 23:59:05 GMT+00:00 >>>>>>> lin.ling at sun.com >>>>>>> 5-Cause Known 2008-11-11 03:23:04 GMT+00:00 >>>>>>> seth.goldberg at sun.com >>>>>>> 1-Dispatched 2008-11-12 12:43:18 GMT+00:00 >>>>>>> jan.damborsky at sun.com >>>>>>> >>>>>>> >>>>>>> === *Solution* >>>>>>> =============================================================== >>>>>>> >>>>>>> >>>>>>> === *Service Request* >>>>>>> ======================================================== >>>>>>> ID: 1-493023606 >>>>>>> Customer: 
>>>>>>> Account Name: Sun Microsystems >>>>>>> Customer Contact: Customer Contact Role: >>>>>>> D-Development >>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>> Impact: Critical >>>>>>> Functionality: Primary >>>>>>> Severity: 1 >>>>>>> Synopsis: Product Name: solaris >>>>>>> Product Release: osol_2008.11 >>>>>>> Product Build: Operating System: osol_2008.11 >>>>>>> Hardware: generic >>>>>>> Reference Number: Sun Contact: jan.damborsky at sun.com >>>>>>> Status: Open >>>>>>> Source: BugTraq2 >>>>>>> Reproducible: Submitted By: jan.damborsky at sun.com >>>>>>> Submitted Date: 2008-11-10 10:27:21 GMT+00:00 >>>>>>> Description: >>>>>>> >>>>>>> === *Service Request* >>>>>>> ======================================================== >>>>>>> ID: 1-493053806 >>>>>>> Customer: >>>>>>> Account Name: SUN MicroSystems >>>>>>> Customer Contact: Customer Contact Role: >>>>>>> D-Development >>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>> Impact: Critical >>>>>>> Functionality: Primary >>>>>>> Severity: 1 >>>>>>> Synopsis: After installing 2008.11RC1b boot from hard >>>>>>> disk fails >>>>>>> Product Name: solaris >>>>>>> Product Release: osol_2008.11 >>>>>>> Product Build: Operating System: osol_2008.11 >>>>>>> Hardware: x86 >>>>>>> Reference Number: Sun Contact: >>>>>>> christopher.armes at sun.com >>>>>>> Status: Open >>>>>>> Source: BugTraq2 >>>>>>> Reproducible: Always >>>>>>> Submitted By: christopher.armes at sun.com >>>>>>> Submitted Date: 2008-11-10 12:54:24 GMT+00:00 >>>>>>> Description: Booting from the livecd and then selecting >>>>>>> install works fine upon reboot with either cd in and selecting >>>>>>> boot from hard disk or without cd allowing grub menu to boot, >>>>>>> causes boot to fail drops system to "grub>" prompt >>>>>>> >>>>>>> >>>>>>> === *Service Request* >>>>>>> ======================================================== >>>>>>> ID: 1-493177108 >>>>>>> Customer: >>>>>>> Account Name: SUN >>>>>>> Customer Contact: Customer 
Contact Role: >>>>>>> D-Development >>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>> Impact: Critical >>>>>>> Functionality: Primary >>>>>>> Severity: 1 >>>>>>> Synopsis: Product Name: solaris >>>>>>> Product Release: osol_2008.11 >>>>>>> Product Build: osol_2008.11 >>>>>>> Operating System: osol_2008.11 >>>>>>> Hardware: amd >>>>>>> Reference Number: Sun Contact: >>>>>>> garrett.damore at sun.com >>>>>>> Status: Source: BugTraq2 >>>>>>> Reproducible: Submitted By: garrett.damore at sun.com >>>>>>> Submitted Date: 2008-11-10 20:16:41 GMT+00:00 >>>>>>> Description: I hit this when updating my Ultra 20 >>>>>>> (original model, not M2) from b77ish to OSOL 2008.11rc1b >>>>>>> >>>>>>> System has 1.5GB ram, SATA hard disk. >>>>>>> >>>>>>> >>>>>>> === *Service Request* >>>>>>> ======================================================== >>>>>>> ID: 1-493257401 >>>>>>> Customer: >>>>>>> Account Name: Sun Microsystems, Inc. >>>>>>> Customer Contact: Customer Contact Role: >>>>>>> D-Development >>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>> Impact: Critical >>>>>>> Functionality: Primary >>>>>>> Severity: 1 >>>>>>> Synopsis: Product Name: solaris >>>>>>> Product Release: osol_2008.11 >>>>>>> Product Build: osol_2008.11 >>>>>>> Operating System: osol_2008.11 >>>>>>> Hardware: generic_ibm_compatible >>>>>>> Reference Number: Sun Contact: dana.myers at sun.com >>>>>>> Status: Open >>>>>>> Source: BugTraq2 >>>>>>> Reproducible: Submitted By: dana.myers at sun.com >>>>>>> Submitted Date: 2008-11-10 22:34:45 GMT+00:00 >>>>>>> Description: >>>>>>> >>>>>>> === *Service Request* >>>>>>> ======================================================== >>>>>>> ID: 1-493265801 >>>>>>> Customer: >>>>>>> Account Name: Sun Microsystems >>>>>>> Customer Contact: pawel.wojcik at sun.com >>>>>>> Customer Contact Role: D-Development >>>>>>> Customer Contact Type: I-Internal (SMI) Customer >>>>>>> Impact: Critical >>>>>>> Functionality: Primary >>>>>>> Severity: 1 >>>>>>> 
Synopsis: Product Name: solaris >>>>>>> Product Release: osol_2008.11 >>>>>>> Product Build: osol_2008.11 >>>>>>> Operating System: solaris >>>>>>> Hardware: intel >>>>>>> Reference Number: Sun Contact: pawel.wojcik at sun.com >>>>>>> Status: Source: BugTraq2 >>>>>>> Reproducible: Submitted By: pawel.wojcik at sun.com >>>>>>> Submitted Date: 2008-11-10 22:50:53 GMT+00:00 >>>>>>> Description: >>>>>>> >>>>>>> === *Activity* >>>>>>> =============================================================== >>>>>>> >>>>>>> >>>>>>> === *Multiple Release (MR) Cluster* - 0 >>>>>>> ====================================== >>>>>>> >>>>>>> >>>>>>> === *Escalations* >>>>>>> ============================================================ >>>>>>> >>>>>>> > > _______________________________________________ > caiman-discuss mailing list > caiman-discuss at opensolaris.org > http://mail.opensolaris.org/mailman/listinfo/caiman-discuss
