Hi Tony,

I thought that too, except that the d30 mirror looks OK: metastat doesn't
show it resyncing when he runs it from his finish script after the
JumpStart install completes. The rpc services not starting is a known
bug, right? And that warning happens while we are running off the
miniroot, so...
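
To make the "not resyncing" check explicit in the finish script, something
like the following could flag any metastat "State:" line that isn't Okay
(Resyncing, Needs maintenance, and so on). It's only a sketch: check_mirrors
is a hypothetical helper name, and it assumes metastat's usual layout of one
"State: <value>" per mirror/submirror line. The per-component table rows
are not checked.

```shell
# Hypothetical helper: read metastat output on stdin and report any
# mirror/submirror "State:" value that is not "Okay".
# Exit status: 0 if every State: line is Okay, 1 otherwise.
check_mirrors() {
    awk '
        BEGIN { bad = 0 }
        /State:/ {
            s = $0
            sub(/^.*State:[ \t]*/, "", s)   # keep only the text after "State:"
            sub(/[ \t]+$/, "", s)           # drop trailing padding
            if (s !~ /^Okay/) {
                print "not okay: " s
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

For example, near the end of the finish script:
/sbin/metastat | check_mirrors || echo "some mirrors are not Okay"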

But a later reboot hitting the same fsck failure seems like it could
point to a UFS issue. I would think the SVM services would be online by
that time.

But, as you said, 'boot -m verbose' should give us more detail.

sarah
*****

Tony Nguyen wrote:

> Sounds like the SVM services did not start in time. However, they have
> started by the time you run fsck manually. 'boot -m verbose' will give
> us more details.
>
> -tony
>
> Sarah J. Jelinek wrote:
>
>> Hi Richard,
>>
>> Apologies for the delay in getting back to you. I am going to
>> cross-post this to the ufs-discuss email list as well.
>>
>> Based on the symptoms you are seeing, it seems that for some reason
>> the data UFS reads during the boot-time fsck is bad, in your examples
>> of / and /usr. Yet, according to your later correspondence with
>> Sanjay Nadkarni, your subsequent manual fsck of that filesystem finds
>> no errors, as shown in this email to Sanjay:
>>
>> # fsck -F ufs /dev/md/rdsk/d30
>> ** /dev/md/rdsk/d30
>> ** Last Mounted on /
>> ** Phase 1 - Check Blocks and Sizes
>> ** Phase 2 - Check Pathnames
>> ** Phase 3 - Check Connectivity
>> ** Phase 4 - Check Reference Counts
>> ** Phase 5 - Check Cyl groups
>> 216878 files, 4949179 used, 3627460 free (386204 frags, 405157 blocks, 4.5% fragmentation)
>>
>> So, it looks to me like a potential mirror resync issue, although the
>> metastat output in your original email shows that the device in
>> question on your first test system, d30, looks OK. I suppose this
>> could also be a UFS logging issue.
>> Basically, something is making the system think the filesystem in
>> question needs a check. That forces you into system maintenance mode,
>> and then when you run fsck manually it all looks OK, so somehow the
>> condition clears itself up.
>> fs-usr is the SMF method script that fails in both scenarios you sent
>> data about. During boot it remounts the filesystem read/write (it
>> starts out read-only), which is why fsck is run on these filesystems.
>>
>> So, if possible, I need a few things from you to help me see where
>> this is failing:
>>
>> 1. Can you boot with 'boot -m verbose'? That should give me more data
>> about the SVM services that are running at the time of the failure.
>> You will have to halt your system to do this, since it doesn't look
>> like reboot supports these arguments. You did say in your later
>> emails to Sanjay that this also happens on a second reboot. My only
>> concern is that halting the system and then booting may quiesce the
>> filesystem enough to mask the issue, but it is worth a try.
>>
>> 2. Can you modify your JumpStart profile to build single-submirror
>> (one-way) mirrors during the install, and see whether this problem
>> still happens? Then, after the install, attach the other submirror.
>> That would tell me more about where this is happening, by isolating
>> mirror resync issues from UFS issues.
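
A minimal sketch of the attach half of item 2, assuming the profile's
mirror lines are cut down to the c0t0d0 slice only (for example
"filesys mirror:d30 c0t0d0s4 4096 /usr logging"; check that your build's
JumpStart accepts a one-slice mirror). The helper names (run,
attach_second_half), the DRY_RUN switch and the dN2 submirror names are
hypothetical. metainit and metattach are the standard SVM commands, and
each attach starts a resync from the surviving submirror.

```shell
# Sketch: attach the c1t4d0 half of each mirror after a JumpStart install
# that built one-way mirrors on c0t0d0 only.
# Mirror/slice pairs come from Richard's profile; the dN2 submirror names
# are an assumption (they match the names metastat showed originally).

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

attach_second_half() {
    for pair in d10:c1t4d0s0 d20:c1t4d0s3 d30:c1t4d0s4 \
                d40:c1t4d0s5 d50:c1t4d0s7 d60:c1t4d0s1; do
        mirror=${pair%%:*}      # e.g. d30
        slice=${pair#*:}        # e.g. c1t4d0s4
        sub="${mirror%0}2"      # d30 -> d32
        # One-stripe, one-slice concat to serve as the second submirror
        run metainit "$sub" 1 1 "$slice" || return 1
        # Attaching starts a resync from the existing submirror
        run metattach "$mirror" "$sub" || return 1
    done
}
```

Source it, preview with "DRY_RUN=1 attach_second_half", then run
"attach_second_half" as root; metastat should then show the new
submirrors resyncing.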
>>
>> 3. Had you seen this issue before b16? I'm just trying to narrow down
>> which Solaris putbacks to look at.
>>
>> thanks,
>> sarah
>> ******
>>
>>
>>> Hi,
>>> I am seeing a problem with snv_16 and snv_18: on reboot, the mirrored
>>> file systems fail fsck. The problem is most noticeable on the first
>>> reboot after my JumpStart builds.
>>> My two FC-AL disks are formatted with this profile:-
>>> install_type    initial_install
>>> system_type     standalone
>>> partitioning    explicit
>>> filesys         mirror:d10 c0t0d0s0 c1t4d0s0 256    /       logging
>>> filesys         mirror:d20 c0t0d0s3 c1t4d0s3 4096   /var    logging
>>> filesys         mirror:d30 c0t0d0s4 c1t4d0s4 4096   /usr    logging
>>> filesys         mirror:d40 c0t0d0s5 c1t4d0s5 1536   /opt    logging
>>> filesys         mirror:d50 c0t0d0s7 c1t4d0s7 15360  /tmp2   logging
>>> filesys         mirror:d60 c0t0d0s1 c1t4d0s1 free   swap
>>> metadb          c0t0d0s6 size 8192 count 4
>>> metadb          c1t4d0s6 size 8192 count 4
>>> cluster         SUNWCXall
>>> locale          en_GB
>>>
>>> The install works fine, and I have put a metastat at the end of my
>>> finish script; all looks OK:-
>>> /sbin/metastat
>>> metastat: brscs02:        system/metainit:default
>>>        system/mdmonitor:default
>>>        network/rpc/meta:default: service(s) not online in SMF
>>>
>>> d60: Mirror
>>>     Submirror 0: d61
>>>       State: Okay
>>>     Submirror 1: d62
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 19182960 blocks (9.1 GB)
>>>
>>> d61: Submirror of d60
>>>     State: Okay
>>>     Size: 19182960 blocks (9.1 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c0t0d0s1          0     No            Okay   Yes
>>>
>>> d62: Submirror of d60
>>>     State: Okay
>>>     Size: 19182960 blocks (9.1 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c1t4d0s1          0     No            Okay   Yes
>>>
>>> d50: Mirror
>>>     Submirror 0: d51
>>>       State: Okay
>>>     Submirror 1: d52
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 31458321 blocks (15 GB)
>>>
>>> d51: Submirror of d50
>>>     State: Okay
>>>     Size: 31458321 blocks (15 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c0t0d0s7          0     No            Okay   Yes
>>>
>>> d52: Submirror of d50
>>>     State: Okay
>>>     Size: 31458321 blocks (15 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c1t4d0s7          0     No            Okay   Yes
>>>
>>> d40: Mirror
>>>     Submirror 0: d41
>>>       State: Okay
>>>     Submirror 1: d42
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 3146121 blocks (1.5 GB)
>>>
>>> d41: Submirror of d40
>>>     State: Okay
>>>     Size: 3146121 blocks (1.5 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c0t0d0s5          0     No            Okay   Yes
>>>
>>> d42: Submirror of d40
>>>     State: Okay
>>>     Size: 3146121 blocks (1.5 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c1t4d0s5          0     No            Okay   Yes
>>>
>>> d30: Mirror
>>>     Submirror 0: d31
>>>       State: Okay
>>>     Submirror 1: d32
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 8389656 blocks (4.0 GB)
>>>
>>> d31: Submirror of d30
>>>     State: Okay
>>>     Size: 8389656 blocks (4.0 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c0t0d0s4          0     No            Okay   Yes
>>>
>>> d32: Submirror of d30
>>>     State: Okay
>>>     Size: 8389656 blocks (4.0 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c1t4d0s4          0     No            Okay   Yes
>>>
>>> d20: Mirror
>>>     Submirror 0: d21
>>>       State: Okay
>>>     Submirror 1: d22
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 8389656 blocks (4.0 GB)
>>>
>>> d21: Submirror of d20
>>>     State: Okay
>>>     Size: 8389656 blocks (4.0 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c0t0d0s3          0     No            Okay   Yes
>>>
>>> d22: Submirror of d20
>>>     State: Okay
>>>     Size: 8389656 blocks (4.0 GB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c1t4d0s3          0     No            Okay   Yes
>>>
>>> d10: Mirror
>>>     Submirror 0: d11
>>>       State: Okay
>>>     Submirror 1: d12
>>>       State: Okay
>>>     Pass: 1
>>>     Read option: roundrobin (default)
>>>     Write option: parallel (default)
>>>     Size: 525798 blocks (256 MB)
>>>
>>> d11: Submirror of d10
>>>     State: Okay
>>>     Size: 525798 blocks (256 MB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c0t0d0s0          0     No            Okay   Yes
>>>
>>> d12: Submirror of d10
>>>     State: Okay
>>>     Size: 525798 blocks (256 MB)
>>>     Stripe 0:
>>>         Device     Start Block  Dbase        State Reloc Hot Spare
>>>         c1t4d0s0          0     No            Okay   Yes
>>>
>>> Device Relocation Information:
>>> Device   Reloc  Device ID
>>> c1t4d0   Yes    id1,ssd@n20000020375c02d7
>>> c0t0d0   Yes    id1,ssd@n2000002037a13604
>>>
>>>
>>> But when it reboots it sometimes fails:-
>>> Finish script E3500+login.sh execution completed.
>>>
>>> The begin script log 'begin.log'
>>> is located in /var/sadm/system/logs after reboot.
>>>
>>> The finish script log 'finish.log'
>>> is located in /var/sadm/system/logs after reboot.
>>>
>>> syncing file systems... done
>>> rebooting...
>>> Resetting... ttya initialized
>>> Using POST's System Configuration
>>> Setting up memory
>>> fhc ac simm-status environment sram flashprom
>>> SUNW,UltraSPARC-II
>>> Probing UPA Slot at 2,0   sbus fhc ac environment flashprom eeprom sbus-speed counter-timer
>>> Probing UPA Slot at 3,0   sbus counter-timer
>>> Probing /sbus@2,0 at d,0  SUNW,socal sf ssd sf ssd
>>> Probing /sbus@2,0 at 1,0  QLGC,isp sd st
>>> Probing /sbus@2,0 at 2,0  Nothing there
>>> Probing /sbus@3,0 at 3,0  SUNW,hme SUNW,fas sd st
>>> Probing /sbus@3,0 at 0,0  network
>>> 5-slot Sun Enterprise E3500, No Keyboard
>>> OpenBoot 3.2.30, 2048 MB memory installed, Serial #11240214.
>>> Copyright 2002 Sun Microsystems, Inc.  All rights reserved
>>> Ethernet address 8:0:20:ab:83:16, Host ID: 80ab8316.
>>>
>>>
>>>
>>> Rebooting with command: boot
>>>
>>> Port#1 received soc-status=14 Port#0 received soc-status=14 loop 0 is ONLINE
>>> Boot device: disk  File and args:
>>> Loading ufs-file-system package 1.4 04 Aug 1995 13:02:54.
>>> FCode UFS Reader 1.12 00/07/17 15:48:16.
>>> Loading: /platform/SUNW,Ultra-Enterprise/ufsboot
>>> Loading: /platform/sun4u/ufsboot
>>> SunOS Release 5.11 Version snv_16 64-bit
>>> Copyright 1983-2005 Sun Microsystems, Inc.  All rights reserved.
>>> Use is subject to license terms.
>>> SUNW,sbus-gem0: Using Gigabit SERDES Interface
>>> SUNW,sbus-gem0: Auto-Negotiated 1000 Mbps Full-Duplex Link Up
>>> Hostname: brscs02
>>> The / file system (/dev/md/rdsk/d10) is being checked.
>>> The /usr file system (/dev/md/rdsk/d30) is being checked.
>>>
>>> WARNING - Unable to repair the /usr filesystem. Run fsck
>>> manually (fsck -F ufs /dev/md/rdsk/d30).
>>>
>>> Jul 26 17:50:14 svc.startd[7]: svc:/system/filesystem/usr:default: Method "/lib/svc/method/fs-usr" failed with exit status 95.
>>> [ system/filesystem/usr:default failed fatally (see 'svcs -x' for details) ]
>>> Requesting System Maintenance Mode
>>> (See /lib/svc/share/README for more information.)
>>> Console login service(s) cannot run
>>>
>>> Root password for system maintenance (control-d to bypass):
>>>
>>> Sometimes the reboot works, but it still checks the filesystems:-
>>>
>>> SUNW,sbus-gem0: Auto-Negotiated 1000 Mbps Full-Duplex Link Up
>>>
>>> Hostname: brscs02
>>>
>>> The / file system (/dev/md/rdsk/d10) is being checked.
>>>
>>> Configuring devices.
>>>
>>> Loading smf(5) service descriptions:  
>>>
>>> Other times another reboot will also fail.
>>>
>>>
>>> I have also seen the problem on an Ultra 60 with two onboard 9 GB
>>> SCSI drives.
>>>
>>>
>>> Solaris 10 GA does not seem to have this problem.
>>>
>>> Cheers
>>> Richard.
>>
>>
>> This message posted from opensolaris.org
>> _______________________________________________
>> lvm-discuss mailing list
>> lvm-discuss at opensolaris.org
>
>
