Hi Tony,

I thought that too, except that the d30 mirror looks OK: it doesn't say
"resyncing" when he runs the script after the jumpstart install completes.
The rpc services not starting is a known bug, right? And it is a warning
that appears while we are running off the miniroot, so...

But the reboot failing again with the same fsck error seems like it could
point to a UFS issue. I would think the SVM services would be online by
that time. But, as you said, 'boot -m verbose' should give us a bit more
detail.

sarah
*****

Tony Nguyen wrote:

> Sounds like the SVM services did not get started on time. However, they've
> been started by the time you manually run fsck. 'boot -m verbose'
> will give us more details.
>
> -tony
>
> Sarah J. Jelinek wrote:
>
>> Hi Richard,
>>
>> Apologies for the delay in getting back to you. I am going to cross
>> post this to the ufs-discuss email list as well.
>>
>> Based on the symptoms you are seeing, it seems that for some reason
>> the data UFS gets during the boot-time fsck is bad, in your examples
>> for / and /usr. Your later fsck of the same filesystem, per your
>> subsequent correspondence with Sanjay Nadkarni, finds no failures,
>> as shown in this email to Sanjay:
>>
>> # fsck -F ufs /dev/md/rdsk/d30
>> ** /dev/md/rdsk/d30
>> ** Last Mounted on /
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> 216878 files, 4949179 used, 3627460 free (386204 frags, 405157 blocks, 4.5% fragmentation)
>>
>> So it looks to me like this is a potential mirror resync issue.
>> However, your original email shows that the device in question on
>> your first test system, d30, looks OK based on your metastat output.
>> I suppose this could also be a UFS logging issue.
>>
>> Basically, it looks like something is making the system think the
>> filesystem in question needs a check, which forces you into system
>> maintenance mode. Then, when you run fsck, it all looks OK, so
>> somehow it clears itself up.
>>
>> fs-usr is the script failing in both scenarios you sent data about.
>> It is the SMF method that remounts the filesystem read/write from
>> read-only during boot, which is why it is running fsck on these
>> filesystems.
>>
>> So, I need a few things from you, if possible, to help me see
>> where this is failing:
>>
>> 1. Can you boot as follows: 'boot -m verbose', which should give me
>> more data about the SVM services that are running at the time of the
>> failure. You will have to halt your system to do this, since it
>> doesn't look like reboot supports these arguments. However, you did
>> say in your subsequent emails to Sanjay that this does happen on
>> second reboot as well. My only concern is that halting your system
>> and then booting may quiesce the filesystem enough to mask this
>> issue. But it is worth a try.
>>
>> 2. Can you modify your jumpstart install to create a single-node
>> mirror during install, and see whether this problem continues to
>> happen? Then, after the install, attach the other submirror. This
>> would give me some data regarding where this might be happening. I am
>> trying to isolate mirror resync issues from UFS issues.
>>
>> 3. Had you seen this issue before b16? Just trying to narrow down the
>> putbacks to Solaris to look at.
>>
>> thanks,
>> sarah
>> ******
>>
>>
>>> Hi,
>>> I am seeing a problem with snv_16 and snv_18 that on
>>> reboot the mirrored file systems fail fsck. This
>>> problem is most noticeable on the first reboot after
>>> my jumpstart builds.
>>>
>>> My two fcal disks are formatted:-
>>> install_type initial_install
>>> system_type standalone
>>> partitioning explicit
>>> filesys mirror:d10 c0t0d0s0 c1t4d0s0 256 / logging
>>> filesys mirror:d20 c0t0d0s3 c1t4d0s3 4096 /var logging
>>> filesys mirror:d30 c0t0d0s4 c1t4d0s4 4096 /usr logging
>>> filesys mirror:d40 c0t0d0s5 c1t4d0s5 1536 /opt logging
>>> filesys mirror:d50 c0t0d0s7 c1t4d0s7 15360 /tmp2 logging
>>> filesys mirror:d60 c0t0d0s1 c1t4d0s1 free swap
>>> metadb c0t0d0s6 size 8192 count 4
>>> metadb c1t4d0s6 size 8192 count 4
>>> cluster SUNWCXall
>>> locale en_GB
>>>
>>> The install works fine and I have put a metastat at
>>> the end of my finish script and all looks ok:-
>>> /sbin/metastat
>>> metastat: brscs02: system/metainit:default
>>> system/mdmonitor:default
>>> network/rpc/meta:default: service(s) not online in SMF
>>>
>>> d60: Mirror
>>>     Submirror 0: d61
>>>       State: Okay
>>>     Submirror 1: d62
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 19182960 blocks (9.1 GB)
>>>
>>> d61: Submirror of d60
>>>     State: Okay
>>>     Size: 19182960 blocks (9.1 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c0t0d0s1   0            No     Okay   Yes
>>>
>>> d62: Submirror of d60
>>>     State: Okay
>>>     Size: 19182960 blocks (9.1 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c1t4d0s1   0            No     Okay   Yes
>>>
>>> d50: Mirror
>>>     Submirror 0: d51
>>>       State: Okay
>>>     Submirror 1: d52
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 31458321 blocks (15 GB)
>>>
>>> d51: Submirror of d50
>>>     State: Okay
>>>     Size: 31458321 blocks (15 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c0t0d0s7   0            No     Okay   Yes
>>>
>>> d52: Submirror of d50
>>>     State: Okay
>>>     Size: 31458321 blocks (15 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c1t4d0s7   0            No     Okay   Yes
>>>
>>> d40: Mirror
>>>     Submirror 0: d41
>>>       State: Okay
>>>     Submirror 1: d42
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 3146121 blocks (1.5 GB)
>>>
>>> d41: Submirror of d40
>>>     State: Okay
>>>     Size: 3146121 blocks (1.5 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c0t0d0s5   0            No     Okay   Yes
>>>
>>> d42: Submirror of d40
>>>     State: Okay
>>>     Size: 3146121 blocks (1.5 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c1t4d0s5   0            No     Okay   Yes
>>>
>>> d30: Mirror
>>>     Submirror 0: d31
>>>       State: Okay
>>>     Submirror 1: d32
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 8389656 blocks (4.0 GB)
>>>
>>> d31: Submirror of d30
>>>     State: Okay
>>>     Size: 8389656 blocks (4.0 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c0t0d0s4   0            No     Okay   Yes
>>>
>>> d32: Submirror of d30
>>>     State: Okay
>>>     Size: 8389656 blocks (4.0 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c1t4d0s4   0            No     Okay   Yes
>>>
>>> d20: Mirror
>>>     Submirror 0: d21
>>>       State: Okay
>>>     Submirror 1: d22
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 8389656 blocks (4.0 GB)
>>>
>>> d21: Submirror of d20
>>>     State: Okay
>>>     Size: 8389656 blocks (4.0 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c0t0d0s3   0            No     Okay   Yes
>>>
>>> d22: Submirror of d20
>>>     State: Okay
>>>     Size: 8389656 blocks (4.0 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c1t4d0s3   0            No     Okay   Yes
>>>
>>> d10: Mirror
>>>     Submirror 0: d11
>>>       State: Okay
>>>     Submirror 1: d12
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 525798 blocks (256 MB)
>>>
>>> d11: Submirror of d10
>>>     State: Okay
>>>     Size: 525798 blocks (256 MB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c0t0d0s0   0            No     Okay   Yes
>>>
>>> d12: Submirror of d10
>>>     State: Okay
>>>     Size: 525798 blocks (256 MB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase  State  Reloc  Hot Spare
>>>         c1t4d0s0   0            No     Okay   Yes
>>>
>>> Device Relocation Information:
>>> Device   Reloc  Device ID
>>> c1t4d0   Yes    id1,ssd@n20000020375c02d7
>>> c0t0d0   Yes    id1,ssd@n2000002037a13604
>>>
>>>
>>> But when it reboots it sometimes fails:-
>>> Finish script E3500+login.sh execution completed.
>>>
>>> The begin script log 'begin.log'
>>> is located in /var/sadm/system/logs after reboot.
>>>
>>> The finish script log 'finish.log'
>>> is located in /var/sadm/system/logs after reboot.
>>>
>>> syncing file systems... done
>>> rebooting...
>>> Resetting...
>>> ttya initialized
>>> Using POST's System Configuration
>>> Setting up memory
>>> fhc ac simm-status environment sram flashprom
>>> SUNW,UltraSPARC-II
>>> Probing UPA Slot at 2,0 sbus fhc ac environment flashprom eeprom sbus-speed counter-timer
>>> Probing UPA Slot at 3,0 sbus counter-timer
>>> Probing /sbus@2,0 at d,0 SUNW,socal sf ssd sf ssd
>>> Probing /sbus@2,0 at 1,0 QLGC,isp sd st
>>> Probing /sbus@2,0 at 2,0 Nothing there
>>> Probing /sbus@3,0 at 3,0 SUNW,hme SUNW,fas sd st
>>> Probing /sbus@3,0 at 0,0 network
>>> 5-slot Sun Enterprise E3500, No Keyboard
>>> OpenBoot 3.2.30, 2048 MB memory installed, Serial #11240214.
>>> Copyright 2002 Sun Microsystems, Inc. All rights reserved
>>> Ethernet address 8:0:20:ab:83:16, Host ID: 80ab8316.
>>>
>>>
>>>
>>> Rebooting with command: boot
>>>
>>> Port#1 received soc-status=14
>>> Port#0 received soc-status=14
>>> loop 0 is ONLINE
>>> Boot device: disk File and args:
>>> Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
>>> FCode UFS Reader 1.12 00/07/17 15:48:16.
>>> Loading: /platform/SUNW,Ultra-Enterprise/ufsboot
>>> Loading: /platform/sun4u/ufsboot
>>> SunOS Release 5.11 Version snv_16 64-bit
>>> Copyright 1983-2005 Sun Microsystems, Inc. All rights reserved.
>>> Use is subject to license terms.
>>> SUNW,sbus-gem0: Using Gigabit SERDES Interface
>>> SUNW,sbus-gem0: Auto-Negotiated 1000 Mbps Full-Duplex Link Up
>>> Hostname: brscs02
>>> The / file system (/dev/md/rdsk/d10) is being checked.
>>> The /usr file system (/dev/md/rdsk/d30) is being checked.
>>>
>>> WARNING - Unable to repair the /usr filesystem. Run fsck
>>> manually (fsck -F ufs /dev/md/rdsk/d30).
>>>
>>> Jul 26 17:50:14 svc.startd[7]: svc:/system/filesystem/usr:default: Method "/lib/svc/method/fs-usr" failed with exit status 95.
>>> [ system/filesystem/usr:default failed fatally (see 'svcs -x' for details) ]
>>> Requesting System Maintenance Mode
>>> (See /lib/svc/share/README for more information.)
>>> Console login service(s) cannot run
>>>
>>> Root password for system maintenance (control-d to bypass):
>>>
>>> Sometimes the reboot works but it checks the
>>> filesystems:-
>>>
>>> SUNW,sbus-gem0: Auto-Negotiated 1000 Mbps Full-Duplex Link Up
>>> Hostname: brscs02
>>> The / file system (/dev/md/rdsk/d10) is being checked.
>>> Configuring devices.
>>> Loading smf(5) service descriptions:
>>>
>>> Other times another reboot will also fail.
>>>
>>>
>>> I have seen the problem on an Ultra 60 also, with two
>>> onboard 9GB SCSI drives.
>>>
>>>
>>> Solaris 10 GA does not seem to have this problem.
>>>
>>> Cheers
>>> Richard.
>>
>>
>> This message posted from opensolaris.org
>> _______________________________________________
>> lvm-discuss mailing list
>> lvm-discuss at opensolaris.org
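
Since the thread keeps coming back to whether any mirror is resyncing at the time of the failure, here is a minimal Python sketch that scans saved metastat output for any "State:" field that is not "Okay". It is only an illustration, not something from the original posters: the function name non_okay_states and the sample text (with a "Resyncing" submirror) are hypothetical, and it assumes the usual metastat layout with one field per line. A submirror state listed under a mirror is reported against the enclosing mirror.

```python
import re


def non_okay_states(metastat_text):
    """Return (metadevice, state) pairs for every 'State:' field that is not 'Okay'.

    Hypothetical helper: assumes each metadevice section starts with a
    header such as 'd30: Mirror' and that each 'State:' field sits on
    its own line, as in normal metastat output.
    """
    problems = []
    device = None
    for line in metastat_text.splitlines():
        # Section header, e.g. "d30: Mirror" or "d31: Submirror of d30".
        header = re.match(r"\s*(d\d+):", line)
        if header:
            device = header.group(1)
            continue
        # State field, e.g. "State: Okay" or "State: Needs maintenance".
        state = re.search(r"State:\s*(\S.*?)\s*$", line)
        if state and state.group(1) != "Okay":
            problems.append((device, state.group(1)))
    return problems


if __name__ == "__main__":
    # Made-up sample for illustration; not taken from the logs in this thread.
    sample = """d30: Mirror
    Submirror 0: d31
      State: Okay
    Submirror 1: d32
      State: Resyncing
    Pass: 1
"""
    for dev, st in non_okay_states(sample):
        print(dev, st)
```

Running it against the metastat output captured at the end of the finish script, and again from maintenance mode after a failed boot, would give a quick side-by-side of the mirror states at the two points in time.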