> I do not understand this either. The only possibility is that the
> stop_timeout is exceeded, but then the status of the sczbt resource
> must be stop_failed.

It is not the sczbt resource (in this case, smb1_zone) which is in stop_failed. It is the underlying HAStoragePlus resource (in this case, smb1_zpool) which has failed.

> So we have to solve two questions:
> What is blocking the zfs umount?
> Is the stop_timeout exceeded? This must be reflected in
> /var/adm/messages of the node, something like:
> "Function: stop_sczbt - Manual intervention needed for non-global zone".
> If the stop_timeout is exceeded, I would try to raise it, just as a
> test, but question one needs to be resolved first.

I have instrumented this in DTrace (perhaps incorrectly or incompletely, so I am open to suggestions on changes). My DTrace script and complete dumps are available earlier in this thread; however, the relevant (as far as I can see) portions are as follows:

time:225428757327280 umount2-execname:zoneadmd mountpoint:/smb1_pool0/smb1_zone/root/dev flag:0 PID:4873 ParentPID:1
time:225428762957138 umount2-execname:zoneadmd return arg0:0 PID:4873 ParentPID:1
time:225428766242637 exec-execname:zoneadmd target:/bin/sh PID:7316 ParentPID:4873
time:225428833735669 exec-execname:ksh93 target:/usr/sbin/umount PID:7329 ParentPID:7316
time:225428837002545 exec-execname:umount target:/usr/lib/fs/zfs/umount PID:7329 ParentPID:7316
time:225428847206624 umount2-execname:zfs mountpoint:/smb1_pool0/smb1_zone/root flag:0 PID:7329 ParentPID:7316
time:225432170675815 umount2-execname:hastorageplus_po mountpoint:/smb1_pool0/smb1_zone/root flag:1024 PID:7450 ParentPID:1179
time:225435468361047 umount2-execname:zfs return arg0:0 PID:7329 ParentPID:7316
time:225435468446546 umount2-execname:hastorageplus_po return arg0:-1 PID:7450 ParentPID:1179
time:225435475257693 umount2-execname:zoneadmd mountpoint:/var/run/zones/smb1.zoneadmd_door flag:0 PID:4873 ParentPID:1
time:225435483475900 umount2-execname:zoneadmd return arg0:0 PID:4873 ParentPID:1

If I read this correctly, zoneadmd has finished stopping the named zone and is unmounting various mountpoints within the zone's tree (/smb1_pool0/smb1_zone/root/dev at the beginning).
It then calls (indirectly) /usr/lib/fs/zfs/umount, which starts to umount2 /smb1_pool0/smb1_zone/root.
Before that umount2 call made by /usr/lib/fs/zfs/umount returns, however, hastorageplus_po tries to umount2 the same mountpoint (well, hastorageplus_po is trying to export the pool, but part of that is to umount2 all mounted zfs mountpoints recursively first).
Then the zfs umount2 completes with success.
Then the hastorageplus_po umount2 fails (this makes sense, in a very limited scope, as the mountpoint is gone after the call is made and before it completes), which puts the resource named smb1_zpool into the failed state.

What I don't understand is why smb1_zpool (the resource that should have called hastorageplus_po) is beginning the 'stop' sequence when the zfs umount2 hasn't completed yet.

> Detlef
>
> Tundra Slosek wrote:
> >> Hi Tundra,
> >>
> >> The reasoning behind it is that the root directory is a property
> >> of Solaris, and placing something in it might have some impact.
> >> It could have been that the zoneadm halt tried to unmount the
> >> root fs without success, because the gds is sitting on it.
> >
> > As a recap - sometimes stop (no matter the source) works correctly,
> > sometimes it doesn't. When it doesn't, it is because zoneadm issues
> > a zfs umount against the root directory and that is still lingering
> > when the underlying zpool's hastorageplus tries to export the zpool.
> > What I have noticed is that when the timing is right (i.e. zfs
> > umount completes first), the zpool export happens without the
> > 'FORCE' flag set, but when the timing is wrong (and zfs umount has
> > not yet completed), the 'FORCE' flag is set on the zpool export
> > (and it fails because the device is in use, and then immediately
> > after, the zfs umount completes).
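
The overlap described in the recap can also be checked mechanically against the trace excerpt above. The following is only a minimal Python sketch under some assumptions: the timestamps are taken to be nanoseconds (as DTrace's `timestamp` variable reports, though the script used here may differ), the line format is the one shown in the excerpt, and the names `parse_calls` and `overlapping` are made up for illustration.

```python
import re
from itertools import combinations

# Trace excerpt copied from the DTrace output quoted in this post.
TRACE = """\
time:225428757327280 umount2-execname:zoneadmd mountpoint:/smb1_pool0/smb1_zone/root/dev flag:0 PID:4873 ParentPID:1
time:225428762957138 umount2-execname:zoneadmd return arg0:0 PID:4873 ParentPID:1
time:225428766242637 exec-execname:zoneadmd target:/bin/sh PID:7316 ParentPID:4873
time:225428833735669 exec-execname:ksh93 target:/usr/sbin/umount PID:7329 ParentPID:7316
time:225428837002545 exec-execname:umount target:/usr/lib/fs/zfs/umount PID:7329 ParentPID:7316
time:225428847206624 umount2-execname:zfs mountpoint:/smb1_pool0/smb1_zone/root flag:0 PID:7329 ParentPID:7316
time:225432170675815 umount2-execname:hastorageplus_po mountpoint:/smb1_pool0/smb1_zone/root flag:1024 PID:7450 ParentPID:1179
time:225435468361047 umount2-execname:zfs return arg0:0 PID:7329 ParentPID:7316
time:225435468446546 umount2-execname:hastorageplus_po return arg0:-1 PID:7450 ParentPID:1179
time:225435475257693 umount2-execname:zoneadmd mountpoint:/var/run/zones/smb1.zoneadmd_door flag:0 PID:4873 ParentPID:1
time:225435483475900 umount2-execname:zoneadmd return arg0:0 PID:4873 ParentPID:1
"""

ENTRY = re.compile(r"time:(\d+) umount2-execname:(\S+) mountpoint:(\S+) flag:(\d+) PID:(\d+)")
RETURN = re.compile(r"time:(\d+) umount2-execname:(\S+) return arg0:(-?\d+) PID:(\d+)")


def parse_calls(trace):
    """Pair each umount2 entry with its return line, keyed by PID."""
    pending, calls = {}, []
    for line in trace.splitlines():
        m = ENTRY.match(line)
        if m:
            ts, name, mountpoint, flag, pid = m.groups()
            pending[pid] = {"exec": name, "mountpoint": mountpoint,
                            "flag": int(flag), "start": int(ts)}
            continue
        m = RETURN.match(line)
        if m and m.group(4) in pending:
            call = pending.pop(m.group(4))
            call["end"] = int(m.group(1))
            call["rc"] = int(m.group(3))
            calls.append(call)
    return calls  # exec lines match neither pattern and are skipped


def overlapping(calls):
    """Return pairs of calls on the same mountpoint whose time ranges intersect."""
    return [(a, b) for a, b in combinations(calls, 2)
            if a["mountpoint"] == b["mountpoint"]
            and a["start"] < b["end"] and b["start"] < a["end"]]


for a, b in overlapping(parse_calls(TRACE)):
    print(f"{a['exec']} (rc={a['rc']}) overlaps {b['exec']} (rc={b['rc']}) "
          f"on {a['mountpoint']}")
    # Assumes nanosecond timestamps; see the note above.
    print(f"second call began {(b['start'] - a['start']) / 1e9:.2f} s after the first")
```

Run against a trace from a stop that worked, the same check should report no overlapping pair, which would make it a quick way to compare good and bad runs.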
> >
> > I do not understand why the hastorageplus begins its 'stop' before
> > the zone is completely stopped - what seems to happen is that the
> > zone stops, and then issues a zoneadm request to unmount the
> > zonepath; however, the gds returns to the rgm with success before
> > zoneadm is actually finished.
> >
> >> Anyway, a silly question: you do have a dependency between the
> >> sczbt resource and the HAStoragePlus resource?
> >
> > No question is silly. If I understand the output of clrs here, then
> > the dependency is set.
> >
> > root at mltstore1:~# clrs show -v smb1_zone | grep smb1
> > Resource:                 smb1_zone
> > smb1_rg
> > pendencies:               smb1_lhname smb1_zpool
> > Start_command:
> > opt/SUNWsczone/sczbt/bin/start_sczbt -R smb1_zone -G smb1_rg -P /smb1_pool0/parameters
> > Stop_command:
> > opt/SUNWsczone/sczbt/bin/stop_sczbt -R smb1_zone -G smb1_rg -P /smb1_pool0/parameters
> > Probe_command:
> > opt/SUNWsczone/sczbt/bin/probe_sczbt -R smb1_zone -G smb1_rg -P /smb1_pool0/parameters
> > Network_resources_used:   smb1_lhname
> >
> >> Tundra Slosek wrote:
> >>>> Hi Tundra,
> >>>>
> >>>> One thing which you should never do is move the parameter
> >>>> directory into the root file system for the zone. This is what
> >>>> might cause the headache, because the sczbt resource accesses
> >>>> the parameter directory and calls zoneadm halt, which tries to
> >>>> remove the mount, and this might not work.
> >>>>
> >>>> I would suggest moving the parameters directory to:
> >>>> /smb1_pool0/parameters
> >>>
> >>> I'm not sure I understand why a file open in
> >>> /smb1_pool0/smb1_zone/parameters/ would prevent zfs unmounting of
> >>> /smb1_pool0/smb1_zone/root; however, it's easy enough to test, I
> >>> don't see any harm in the suggested change, and I remain open to
> >>> the possibility that there is something fundamental I'm
> >>> misunderstanding.
> >>>
> >>> Done (created the directory above, copied the existing contents
> >>> of parameters, and changed the clrs Start_command, Stop_command
> >>> and Probe_command to point at /smb1_pool0/parameters instead of
> >>> /smb1_pool0/smb1_zone/parameters); however, the exact same
> >>> behavior exists - i.e. overlap between zfs unmount of
> >>> /smb1_pool0/smb1_zone/root and hastorageplus attempting to export
> >>> the smb1_pool0 zpool. DTrace log available as per prior efforts,
> >>> if anyone thinks it will be helpful; however, it doesn't seem
> >>> different to me.
> >>
> >> --
> >> ******************************************************************
> >> Detlef Ulherr
> >> Staff Engineer                 Tel: (++49 6103) 752-248
> >> Availability Engineering       Fax: (++49 6103) 752-167
> >> Sun Microsystems GmbH
> >> Amperestr. 6
> >> mailto:detlef.ulherr at sun.com
> >> http://www.sun.de/
> >> ******************************************************************
> >> Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551
> >> Kirchheim-Heimstetten
> >> Commercial register: Amtsgericht München, HRB 161028
> >> Managing directors: Thomas Schröder, Wolfgang Engels, Wolf Frenkel
> >> Chairman of the supervisory board: Martin Häring
> >> ******************************************************************
> >>
> >> _______________________________________________
> >> ha-clusters-discuss mailing list
> >> ha-clusters-discuss at opensolaris.org
> >> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss
--
This message posted from opensolaris.org