I suggest using the force option for the export:

    zpool export -f <pool>

If that is failing, then it should be a Solaris bug, as the "force" option is supposed to export the pool even when there are current users on it [according to the zpool(1M) man page].
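In case it helps, this is roughly what I would run by hand in the global zone of the node that still owns the pool (pool name taken from your log below; please verify the exact behaviour against zpool(1M) on your release):

    # force-export the pool even if datasets appear busy
    zpool export -f common_pool0

    # the pool should no longer show up here once the export succeeds
    zpool list

    # and it should show up as available for import again
    zpool import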
-Venku

On 12/03/09 17:53, Tundra Slosek wrote:
> To manually force the export - 'zpool export common_pool0' or something
> else?
>
> One item that I'm curious about - is it possible that zfs-auto-snapshot
> is conflicting with this? I presumed that it's smart enough to be able
> to handle that (export being atomic as well as snapshot being atomic,
> they shouldn't step on each other). And should zfs-auto-snapshot be
> running in the specific zone or in the global zone?
>
> The bug that Amit mentioned - would that trigger with TWO zpools in the
> resource?
>
> Thanks to all for their ideas, suggestions and knowledge.
>
> On Thu, Dec 3, 2009 at 6:42 AM, Venkateswarlu Tella
> <Venkateswarlu.Tella at sun.com> wrote:
>
> Hi,
> HAStoragePlus uses force export and force unmount on the pool and
> its file systems respectively. So the unmount has to succeed even if
> someone is using the file system. Not sure if something is broken in
> ZFS umount semantics.
>
> It would be a good thing to test the force export manually, as Hartmut
> suggested.
>
> Thanks
> -Venku
>
>
> On 12/03/09 14:11, Hartmut Streppel wrote:
>
> Hi,
> hard to diagnose. Your dependencies are correct. The messages
> indicate that the zone is down before the HAStoragePlus resource
> tries to export the zpool.
> The only possibility left in my mind is that there is some other
> process, running in the global zone, using the zpool to be exported.
>
> In this situation, are you able to export the zpool manually? If
> not, it is not a cluster problem. You could try to find out
> which processes are using a file or directory on that zpool.
>
> Regards
> Hartmut
>
>
> On 12/02/09 23:24, Tundra Slosek wrote:
>
> Is there a concise way to dump the pertinent details of a group?
>
> If I understand correctly, this shows that the resource
> 'common_zone' (the gds resource created by sczbt_register)
> depends on 'common_lhname' (the logicalhostname resource)
> and 'common_zpool' (the HAStoragePlus resource). I am
> certainly open to being enlightened...
>
> root at mltproc1:~# /usr/cluster/bin/clresource show -y Resource_dependencies common_zone
>
> === Resources ===
>
> Resource:                 common_zone
>   Resource_dependencies:  common_lhname common_zpool
>
> Also, a more complete snippet of the log file going back
> further in time - does the first log entry at 11:43:18 show
> that the zone actually stopped, or that the cluster
> incorrectly thinks the zone is stopped when it isn't?:
>
> Dec 2 11:43:17 mltproc1 SC[SUNWsczone.stop_sczbt]:common_shares:common_zone: [ID 567783 daemon.notice] stop_command rc<0> - Changing to init state 0 - please wait
> Dec 2 11:43:17 mltproc1 SC[SUNWsczone.stop_sczbt]:common_shares:common_zone: [ID 567783 daemon.notice] stop_command rc<0> - Shutdown started. Wed Dec 2 11:40:30 EST 2009
> Dec 2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <gds_svc_stop> completed successfully for resource <common_zone>, resource group <common_shares>, node <mltproc1>, time used: 56% of timeout <300 seconds>
> Dec 2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hafoip_stop> for resource <common_lhname>, resource group <common_shares>, node <mltproc1>, timeout <300> seconds
> Dec 2 11:43:18 mltproc1 ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 192.168.011.005:0, remote = 000.000.000.000:0, start = -2, end = 6
> Dec 2 11:43:18 mltproc1 ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 0 connection
> Dec 2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hafoip_stop> completed successfully for resource <common_lhname>, resource group <common_shares>, node <mltproc1>, time used: 0% of timeout <300 seconds>
> Dec 2 11:43:19 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <personal_pool>, resource group <common_shares>, node <mltproc1>, timeout <1800> seconds
> Dec 2 11:43:19 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <common_zpool>, resource group <common_shares>, node <mltproc1>, timeout <1800> seconds
> Dec 2 11:43:22 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <personal_pool>, resource group <common_shares>, node <mltproc1>, time used: 0% of timeout <1800 seconds>
> Dec 2 11:43:22 mltproc1 SC[,SUNW.HAStoragePlus:8,common_shares,common_zpool,hastorageplus_postnet_stop]: [ID 471757 daemon.error] cannot unmount '/common_pool0/common_zone/root' : Device busy
> Dec 2 11:43:22 mltproc1 SC[,SUNW.HAStoragePlus:8,common_shares,common_zpool,hastorageplus_postnet_stop]: [ID 952001 daemon.error] Failed to export :common_pool0
> Dec 2 11:43:22 mltproc1 Cluster.RGM.global.rgmd: [ID 938318 daemon.error] Method <hastorageplus_postnet_stop> failed on resource <common_zpool> in resource group <common_shares> [exit code <1>, time used: 0% of timeout <1800 seconds>]
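PS: given the "cannot unmount '/common_pool0/common_zone/root' : Device busy" message above, it may also be worth checking what in the global zone still has files open under the zone root at the time of the switchover. A rough sketch (the per-PID follow-up is illustrative; please check fuser(1M) and pfiles(1) on your release):

    # list PIDs with files open in the file system mounted at the zone root
    fuser -c /common_pool0/common_zone/root

    # for each PID reported, identify the process and see its open files
    ps -fp <pid>
    pfiles <pid>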