[ha-clusters-discuss] HAStoragePlus resource with a zone on top, unable to migrate

Tundra Slosek Mon, 14 Dec 2009 09:46:05 PST

> I do not understand this either. The only possibility
> is that the stop_timeout is exceeded, but then the status
> of the sczbt resource must be stop_failed.


It is not the sczbt resource (in this case, smb1_zone) which is in 
stop_failed. It is the underlying HAStoragePlus resource (in this case, 
smb1_zpool) which has failed.

> 
> So we have to solve two questions:
> what is blocking the zfs umount?
> is the stop_timeout exceeded? This must be reflected in
> /var/adm/messages of the node, something like:
> "Function: stop_sczbt - Manual intervention needed for non-global zone".
> If the stop_timeout is exceeded, I would try to raise it, just a try,
> but question one needs to be resolved first.
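
One way to check for that message without reading the whole log by eye is a
small filter like the sketch below. It is only an illustration: the exact
wording in /var/adm/messages may differ from the phrase quoted above (it was
given as "something like"), the function name find_stop_sczbt_hints is made
up, and the sample lines are invented placeholders rather than real log output.

```python
# Sketch: look for the stop_sczbt "manual intervention" hint in syslog lines.
# Assumption: the real message contains both fragments below; the exact
# wording in /var/adm/messages may differ from what was quoted in the thread.
NEEDLES = ("Function: stop_sczbt", "Manual intervention needed")


def find_stop_sczbt_hints(lines, needles=NEEDLES):
    """Return the lines that contain every fragment in `needles`."""
    return [line.rstrip("\n") for line in lines
            if all(needle in line for needle in needles)]


# Invented placeholder lines, NOT real log output.
sample = [
    "placeholder: Function: stop_sczbt - Manual intervention needed for non-global zone\n",
    "placeholder: some unrelated message\n",
]
hits = find_stop_sczbt_hints(sample)
print(len(hits))
```

On a cluster node the same function could be given the open log file instead
of the sample list, since a file object iterates over its lines.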

I have instrumented this in DTrace (perhaps incorrectly or incompletely, so I 
am open to suggestions on changes). My DTrace script and complete dumps are 
available earlier in this thread, however the relevant (as far as I can see) 
portions are as follows:

time:225428757327280  umount2-execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/dev  flag:0  PID:4873  ParentPID:1
time:225428762957138  umount2-execname:zoneadmd  return arg0:0  PID:4873  ParentPID:1
time:225428766242637  exec-execname:zoneadmd  target:/bin/sh  PID:7316  ParentPID:4873
time:225428833735669  exec-execname:ksh93  target:/usr/sbin/umount  PID:7329  ParentPID:7316
time:225428837002545  exec-execname:umount  target:/usr/lib/fs/zfs/umount  PID:7329  ParentPID:7316
time:225428847206624  umount2-execname:zfs  mountpoint:/smb1_pool0/smb1_zone/root  flag:0  PID:7329  ParentPID:7316
time:225432170675815  umount2-execname:hastorageplus_po  mountpoint:/smb1_pool0/smb1_zone/root  flag:1024  PID:7450  ParentPID:1179
time:225435468361047  umount2-execname:zfs  return arg0:0  PID:7329  ParentPID:7316
time:225435468446546  umount2-execname:hastorageplus_po  return arg0:-1  PID:7450  ParentPID:1179
time:225435475257693  umount2-execname:zoneadmd  mountpoint:/var/run/zones/smb1.zoneadmd_door  flag:0  PID:4873  ParentPID:1
time:225435483475900  umount2-execname:zoneadmd  return arg0:0  PID:4873  ParentPID:1

If I read this correctly, zoneadmd has finished stopping the named zone and is 
unmounting the mountpoints within the zone's tree (starting with 
/smb1_pool0/smb1_zone/root/dev).

It then calls (indirectly, via /bin/sh and /usr/sbin/umount) 
/usr/lib/fs/zfs/umount, which starts an umount2 of /smb1_pool0/smb1_zone/root.

Before that umount2 call returns, however, hastorageplus_po tries to umount2 
the same mountpoint, with flag 1024 rather than 0 (hastorageplus_po is trying 
to export the pool, and part of that is to umount2 all mounted zfs mountpoints 
recursively first).

The zfs umount2 then completes successfully (arg0:0).

Right after that, the hastorageplus_po umount2 fails (arg0:-1). This makes 
sense, in a very limited scope, as the mountpoint is gone after the call is 
made and before it completes. The failure puts the resource named smb1_zpool 
into the failed state.

What I don't understand is why smb1_zpool (the resource that should have called 
hastorageplus_po) begins its 'stop' sequence when the zfs umount2 hasn't 
completed yet.
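
To double-check that reading of the trace, the overlap can be found
mechanically. The following is a minimal Python sketch, not the DTrace script
from earlier in the thread: it pairs each umount2 entry with its return by PID
and reports any umount2 that starts on a mountpoint while another process still
has one in flight there. The helper names (parse, find_overlaps) are made up
for this example, and MS_FORCE = 0x400 is an assumption about the Solaris
<sys/mount.h> value behind flag:1024.

```python
import re

# Trace records copied from this message, one record per line.
TRACE = """\
time:225428757327280 umount2-execname:zoneadmd mountpoint:/smb1_pool0/smb1_zone/root/dev flag:0 PID:4873 ParentPID:1
time:225428762957138 umount2-execname:zoneadmd return arg0:0 PID:4873 ParentPID:1
time:225428766242637 exec-execname:zoneadmd target:/bin/sh PID:7316 ParentPID:4873
time:225428833735669 exec-execname:ksh93 target:/usr/sbin/umount PID:7329 ParentPID:7316
time:225428837002545 exec-execname:umount target:/usr/lib/fs/zfs/umount PID:7329 ParentPID:7316
time:225428847206624 umount2-execname:zfs mountpoint:/smb1_pool0/smb1_zone/root flag:0 PID:7329 ParentPID:7316
time:225432170675815 umount2-execname:hastorageplus_po mountpoint:/smb1_pool0/smb1_zone/root flag:1024 PID:7450 ParentPID:1179
time:225435468361047 umount2-execname:zfs return arg0:0 PID:7329 ParentPID:7316
time:225435468446546 umount2-execname:hastorageplus_po return arg0:-1 PID:7450 ParentPID:1179
time:225435475257693 umount2-execname:zoneadmd mountpoint:/var/run/zones/smb1.zoneadmd_door flag:0 PID:4873 ParentPID:1
time:225435483475900 umount2-execname:zoneadmd return arg0:0 PID:4873 ParentPID:1
"""

# Assumption: 0x400 is MS_FORCE in Solaris <sys/mount.h>; verify on your system.
MS_FORCE = 0x400


def parse(trace):
    """Split the trace into records; each record starts at a 'time:' token."""
    records = []
    for chunk in re.findall(r"time:\d+.*?(?=time:\d|\Z)", trace, flags=re.S):
        tokens = chunk.split()
        rec = {"is_return": "return" in tokens}
        for tok in tokens:
            key, sep, value = tok.partition(":")
            if sep:  # keep only key:value tokens
                rec[key] = value
        records.append(rec)
    return records


def find_overlaps(records):
    """Return (earlier, later) pairs of umount2 entries on the same
    mountpoint where the later call starts before the earlier one returns."""
    in_flight = {}  # PID -> umount2 entry record that has not returned yet
    overlaps = []
    for rec in sorted(records, key=lambda r: int(r["time"])):
        if "umount2-execname" not in rec:
            continue  # exec records are not needed here
        if rec["is_return"]:
            in_flight.pop(rec["PID"], None)
            continue
        for other in in_flight.values():
            if other["mountpoint"] == rec["mountpoint"]:
                overlaps.append((other, rec))
        in_flight[rec["PID"]] = rec
    return overlaps


overlaps = find_overlaps(parse(TRACE))
for first, second in overlaps:
    forced = bool(int(second["flag"]) & MS_FORCE)
    print(second["umount2-execname"], "started umount2 of", second["mountpoint"],
          "while", first["umount2-execname"], "PID", first["PID"],
          "was still in flight; force flag set:", forced,
          "; time delta:", int(second["time"]) - int(first["time"]))
```

On the records above this reports a single overlap: the hastorageplus_po
umount2 of /smb1_pool0/smb1_zone/root starts while the zfs umount2 (PID 7329)
is still in flight. The delta is printed in the raw units of the time field,
since the DTrace script that produced it is not reproduced in this message.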

> 
> Detlef
> 
> Tundra Slosek wrote:
> >> Hi Tundra,
> >>
> >> The reasoning behind it is that the root directory is a property of
> >> Solaris, and placing something in it might have some impact. It could
> >> have been that the zoneadm halt tried to unmount the root fs without
> >> success, because the gds is sitting on it.
> >>
> >
> > As a recap: sometimes stop (no matter the source) works correctly,
> > sometimes it doesn't. When it doesn't, it is because zoneadm issues a zfs
> > umount against the root directory and that is still lingering when the
> > underlying zpool's hastorageplus tries to export the zpool. What I have
> > noticed is that when the timing is right (i.e. zfs umount completes
> > first), the zpool export happens without the 'FORCE' flag set, but when
> > the timing is wrong (and zfs umount has not yet completed), the 'FORCE'
> > flag is set on the zpool export (and it fails because the device is in
> > use, and then immediately after, the zfs umount completes).
> >
> > I do not understand why the hastorageplus begins its 'stop' before the
> > zone is completely stopped. What seems to happen is that the zone stops
> > and then issues a zoneadm request to unmount the zonepath; however, the
> > gds returns to the rgm with success before zoneadm is actually finished.
> >
> >   
> >> Anyway, silly question: you do have a dependency between the sczbt
> >> resource and the HAStoragePlus resource?
> >>
> >
> > No question is silly. If I understand the output of clrs here, then the
> > dependency is set.
> >
> > root@mltstore1:~# clrs show -v smb1_zone | grep smb1
> > Resource:                                     smb1_zone
> >                smb1_rg
> > pendencies:                           smb1_lhname smb1_zpool
> >   Start_command:                              opt/SUNWsczone/sczbt/bin/start_sczbt -R smb1_zone -G smb1_rg -P /smb1_pool0/parameters
> >   Stop_command:                               opt/SUNWsczone/sczbt/bin/stop_sczbt -R smb1_zone -G smb1_rg -P /smb1_pool0/parameters
> >   Probe_command:                              opt/SUNWsczone/sczbt/bin/probe_sczbt -R smb1_zone -G smb1_rg -P /smb1_pool0/parameters
> >   Network_resources_used:                     smb1_lhname
> >> Tundra Slosek wrote:
> >>
> >>>> Hi Tundra,
> >>>>
> >>>> One thing which you should never do is move the parameter directory
> >>>> into the root file system for the zone. This is what might cause the
> >>>> headache, because the sczbt resource accesses the parameter directory
> >>>> and calls zoneadm halt, which tries to remove the mount, and this
> >>>> might not work.
> >>>>
> >>>> I would suggest moving the parameters directory to:
> >>>> /smb1_pool0/parameters
> >>>>
> >>> I'm not sure I understand why a file open in
> >>> /smb1_pool0/smb1_zone/parameters/ would prevent zfs unmounting of
> >>> /smb1_pool0/smb1_zone/root. However, it's easy enough to test, I don't
> >>> see any harm in the suggested change, and I remain open to the
> >>> possibility that there is something fundamental I'm misunderstanding.
> >>>
> >>> Done (created the directory above, copied the existing contents of
> >>> parameters, and changed the clrs Start_command, Stop_command and
> >>> Probe_command to point at /smb1_pool0/parameters instead of
> >>> /smb1_pool0/smb1_zone/parameters). However, the exact same behavior
> >>> exists, i.e. overlap between the zfs unmount of
> >>> /smb1_pool0/smb1_zone/root and hastorageplus attempting to export the
> >>> smb1_pool0 zpool. DTrace log available as per prior efforts, if anyone
> >>> thinks it will be helpful; however, it doesn't seem different to me.
> >>>
> >> -- 
> >>
> >> *****************************************************************************
> >>  Detlef Ulherr
> >>  Staff Engineer                 Tel: (++49 6103) 752-248
> >>  Availability Engineering       Fax: (++49 6103) 752-167
> >>  Sun Microsystems GmbH
> >>  Amperestr. 6
> >>                                 mailto:detlef.ulherr at sun.com
> >>                                 http://www.sun.de/
> >> ************************************************************
> >>
> >> Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551
> >> Kirchheim-Heimstetten
> >> Amtsgericht München (Munich District Court): HRB 161028
> >> Managing directors: Thomas Schröder, Wolfgang Engels, Wolf Frenkel
> >> Chairman of the supervisory board: Martin Häring
> >>
> >> *****************************************************************************
> >>
> >> _______________________________________________
> >> ha-clusters-discuss mailing list
> >> ha-clusters-discuss at opensolaris.org
> >> http://mail.opensolaris.org/mailman/listinfo/ha-clusters-discuss
> >>
> 
-- 
This message posted from opensolaris.org
