> One causes the other, I assume the processes of the sczbt get killed because they are getting too close to their stop_timeout. So question number one is the real root cause. I assume that in your version of the sczbt there is a pmfadm -s <name> KILL, and the clear_zone script is started with hatimerun and not with pmfadm.
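For reference, a minimal sketch of how that stop_timeout can be inspected and raised on the zone boot resource with clrs; the resource name matches the one used below, and the value of 600 seconds is only an illustration, not a recommendation from this thread:

    # Show the current Stop_timeout of the sczbt (zone boot) resource
    clrs show -v -p Stop_timeout smb1_zone
    # Raise it (example value); takes effect for subsequent stop attempts
    clrs set -p Stop_timeout=600 smb1_zone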
Do these two grep searches confirm your question? If not, let me know where I can look.

root@mltproc1:~# grep -i pmfadm /opt/SUNWsczone/sczbt/bin/*
/opt/SUNWsczone/sczbt/bin/functions:    ${PMFADM} -s ${RESOURCEGROUP},${RESOURCE},0.svc
/opt/SUNWsczone/sczbt/bin/functions:    ${PMFADM} -s ${RESOURCEGROUP},${RESOURCE},0.svc KILL 2> /dev/null

root@mltproc1:~# grep -i hatimerun /opt/SUNWsczone/sczbt/bin/functions
        /usr/cluster/bin/hatimerun -t ${CLEAR_STOP_TIMEOUT} /opt/SUNWsczone/sczbt/bin/clear_zone ${Zonepath} ${RESOURCEGROUP} ${RESOURCE} >>${LOGFILE}

> So my suggestion would be to increase the stop_timeout of the zone's boot resource and see if it was just too small.
>
> The root cause is that the stop of the zone takes longer than expected.

The following entry is recorded in syslog at the same time as the sample I detailed:

Dec 14 09:55:57 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <gds_svc_stop> completed successfully for resource <smb1_zone>, resource group <smb1_rg>, node <mltproc1>, time used: 4% of timeout <300 seconds>

I read this to mean that the resource smb1_zone reached stop successfully within 4% of 300 seconds. Given this, what STOP_TIMEOUT value do you think would make a difference? I have some entries in the log of stop successful with time used of 63% and 80% - is there some other variable of 'how much of STOP_TIMEOUT to use before failing' that I'm not seeing?

> Detlef
>
> Tundra Slosek wrote:
>>> I do not understand this as well; the only possibility is that the stop_timeout is exceeded, but then the status of the sczbt resource must be stop_failed.
>>
>> It is not the sczbt resource (in this case, smb1_zone) which is in stop_failed. It is the underlying HAStoragePlus resource (in this case, smb1_zpool) which is failed.
>>
>>> So we have to solve two questions: what is blocking the zfs umount? Is the stop_timeout exceeded? This must be reflected in /var/adm/messages of the node, with something like: "Function: stop_sczbt - Manual intervention needed for non-global zone". If the stop_timeout is exceeded, I would try to raise it, just a try, but question one needs to be resolved first.
>>
>> I have instrumented this in DTrace (perhaps incorrectly or incompletely, so I am open to suggestions on changes).
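For anyone wanting to reproduce this kind of tracing, the following is a minimal sketch of a D script that would emit umount2 and exec records of the shape quoted below; it is an assumption about the approach, not the actual script posted earlier in the thread:

    # Trace every umount2(2) entry/return and every exec, with caller identity,
    # to see which process unmounts which mountpoint and in what order.
    dtrace -qn '
    syscall::umount2:entry
    {
        printf("time:%d umount2-execname:%s mountpoint:%s flag:%d PID:%d ParentPID:%d\n",
            timestamp, execname, copyinstr(arg0), arg1, pid, ppid);
    }
    syscall::umount2:return
    {
        printf("time:%d umount2-execname:%s return arg0:%d PID:%d ParentPID:%d\n",
            timestamp, execname, arg0, pid, ppid);
    }
    syscall::exec*:entry
    {
        printf("time:%d exec-execname:%s target:%s PID:%d ParentPID:%d\n",
            timestamp, execname, copyinstr(arg0), pid, ppid);
    }'

With -q set, only the printf records appear, which matches the style of the excerpt that follows.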
>> My DTrace script and complete dumps are available earlier in this thread, however the relevant (as far as I can see) portions are as follows:
>>
>> time:225428757327280 umount2-execname:zoneadmd mountpoint:/smb1_pool0/smb1_zone/root/dev flag:0 PID:4873 ParentPID:1
>> time:225428762957138 umount2-execname:zoneadmd return arg0:0 PID:4873 ParentPID:1
>> time:225428766242637 exec-execname:zoneadmd target:/bin/sh PID:7316 ParentPID:4873
>> time:225428833735669 exec-execname:ksh93 target:/usr/sbin/umount PID:7329 ParentPID:7316
>> time:225428837002545 exec-execname:umount target:/usr/lib/fs/zfs/umount PID:7329 ParentPID:7316
>> time:225428847206624 umount2-execname:zfs mountpoint:/smb1_pool0/smb1_zone/root flag:0 PID:7329 ParentPID:7316
>> time:225432170675815 umount2-execname:hastorageplus_po mountpoint:/smb1_pool0/smb1_zone/root flag:1024 PID:7450 ParentPID:1179
>> time:225435468361047 umount2-execname:zfs return arg0:0 PID:7329 ParentPID:7316
>> time:225435468446546 umount2-execname:hastorageplus_po return arg0:-1 PID:7450 ParentPID:1179
>> time:225435475257693 umount2-execname:zoneadmd mountpoint:/var/run/zones/smb1.zoneadmd_door flag:0 PID:4873 ParentPID:1
>> time:225435483475900 umount2-execname:zoneadmd return arg0:0 PID:4873 ParentPID:1
>>
>> If I read this correctly, zoneadmd has finished stopping the named zone and is unmounting various mountpoints within the zone's tree (/smb1_pool0/smb1_zone/root/dev at the beginning).
>>
>> It then calls (indirectly) /usr/lib/fs/zfs/umount, which starts to umount2 /smb1_pool0/smb1_zone/root.
>>
>> Before that call to umount2 made by /usr/lib/fs/zfs/umount returns, however, hastorageplus_po tries to umount2 the same mountpoint (well, hastorageplus_po is trying to export the pool, but part of that is to umount2 all mounted zfs mountpoints recursively first).
>>
>> Then the zfs umount2 completes with success.
>>
>> Then the hastorageplus_po umount2 fails (this makes sense, in a very limited scope, as the mountpoint is gone after the call is made and before it completes)... which puts the resource named smb1_zpool into failed state.
>>
>> What I don't understand is why smb1_zpool (the resource that should have called hastorageplus_po) is beginning the 'stop' sequence when the zfs umount2 hasn't completed yet.
>>
>>> Detlef
>>>
>>> Tundra Slosek wrote:
>>>>> Hi Tundra,
>>>>>
>>>>> The reasoning behind it is that the root directory is a property of Solaris, and placing something in here might have some impact. It could have been that the zoneadm halt tried to unmount the root fs without success, because the gds is sitting on it.
>>>>
>>>> As a recap - sometimes stop (no matter the source) works correctly, sometimes it doesn't. When it doesn't, it is because zoneadm issues a zfs umount against the root directory and that is still lingering when the underlying zpool's hastorageplus tries to export the zpool. What I have noticed is that when the timing is right (i.e. zfs umount completes first), then the zpool export happens without the 'FORCE' flag set, but when the timing is wrong (and zfs umount has not yet completed), then the 'FORCE' flag is set on the zpool export (and it fails because the device is in use, and then immediately after, the zfs umount completes).
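As an aside on reading the flag column in the trace above: flag:0 is a plain unmount, while flag:1024 (0x400) should correspond to MS_FORCE, the forced-unmount flag used on the zpool export path. That can be checked against the system header; a sketch, assuming the usual header location:

    # Confirm what umount2 flag 1024 means on this system (0x400 == 1024)
    grep MS_FORCE /usr/include/sys/mount.h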
>>>> I do not understand why the hastorageplus begins its 'stop' before the zone is completely stopped - what seems to happen is that the zone stops, and then issues a zoneadm request to unmount the zonepath; however the gds returns to the rgm with success before zoneadm is actually finished.
>>>>
>>>>> Anyway, silly question: you do have a dependency between the sczbt resource and the HAStoragePlus resource?
>>>>
>>>> No question is silly. If I understand the output of clrs here, then the dependency is set.
>>>>
>>>> root@mltstore1:~# clrs show -v smb1_zone | grep smb1
>>>>   Resource:                 smb1_zone
>>>>                             smb1_rg
>>>>   Resource_dependencies:    smb1_lhname smb1_zpool
>>>>   Start_command:            /opt/SUNWsczone/sczbt/bin/start_sczbt -R smb1_zone -G smb1_rg -P /smb1_pool0/parameters
>>>>   Stop_command:             /opt/SUNWsczone/sczbt/bin/stop_sczbt -R smb1_zone -G smb1_rg -P /smb1_pool0/parameters
>>>>   Probe_command:            /opt/SUNWsczone/sczbt/bin/probe_sczbt -R smb1_zone -G smb1_rg -P /smb1_pool0/parameters
>>>>   Network_resources_used:   smb1_lhname
>>>>
>>>>> Tundra Slosek wrote:
>>>>>>> Hi Tundra,
>>>>>>>
>>>>>>> One thing which you should never do is move the parameter directory into the root file system for the zone. This is what might cause the headache, because the sczbt resource accesses the parameter directory and calls zoneadm halt, which tries to remove the mount, and this might not work.
>>>>>>>
>>>>>>> I would suggest to move the parameters directory to: /smb1_pool0/parameters
>>>>>>
>>>>>> I'm not sure I understand why a file open in /smb1_pool0/smb1_zone/parameters/ would prevent zfs unmounting of /smb1_pool0/smb1_zone/root, however it's easy enough to test; I don't see any harm in the suggested change and I remain open to the possibility that there is something fundamental I'm misunderstanding.
>>>>>>
>>>>>> Done (created the directory above, copied the existing contents of parameters and changed the clrs Start_command, Stop_command and Probe_command to point at /smb1_pool0/parameters instead of /smb1_pool0/smb1_zone/parameters), however the exact same behavior exists - i.e. overlap between zfs unmount of /smb1_pool0/smb1_zone/root and hastorageplus attempting to export the smb1_pool0 zpool. DTrace log available as per prior efforts, if anyone thinks it will be helpful, however it doesn't seem different to me.
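One way to watch that unmount window by hand, outside the cluster framework, is sketched below; this is an assumption about how to observe it manually (zone, mountpoint and pool names as used above), not a procedure from this thread, and it presumes the resources are disabled so the cluster is not also acting on the zone:

    # Halt the zone by hand, then poll /etc/mnttab to see how long the zone
    # root (and anything under it) stays mounted before zoneadmd's zfs umount
    # actually completes.
    zoneadm -z smb1 halt
    while grep -q '/smb1_pool0/smb1_zone/root' /etc/mnttab; do
        sleep 1
    done
    echo "zone root unmounted; an export of smb1_pool0 should no longer need -f"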
--
This message posted from opensolaris.org