[ha-clusters-discuss] HAStoragePlus resource with a zone on top, unable to migrate

Venkateswarlu Tella Thu, 03 Dec 2009 17:12:44 +0530

Hi,
HAStoragePlus uses force export on the pool and force unmount on its file 
systems. So the umount should succeed even if someone is using the 
file system. I am not sure whether something is broken in the ZFS umount 
semantics.


It would be a good thing to test the force export manually, as Hartmut suggested.
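
A rough sketch of such a manual test, run in the global zone on the node 
that currently holds the pool, is below. The pool name and mount point are 
taken from the log further down. The RUN variable and the check_export 
function are only illustrative helpers, not part of the cluster software. 
By default the script just prints the commands; run it with RUN set to an 
empty string to actually execute them.

```shell
#!/bin/sh
# Sketch only: pool name and mount point come from the quoted log;
# RUN and check_export are hypothetical helpers for this example.
POOL=${POOL-common_pool0}
MNT=${MNT-/common_pool0/common_zone/root}
RUN=${RUN-echo}   # default: dry run (print commands); RUN= executes them

check_export() {
    # Confirm the zone really is down (not running or shutting down).
    $RUN zoneadm list -cv
    # List processes (and their owners) holding files on the file system.
    $RUN fuser -cu "$MNT"
    # Try the same forced operations that HAStoragePlus performs.
    $RUN zfs unmount -f "$MNT"
    $RUN zpool export -f "$POOL"
}

check_export
```

If the forced unmount still reports "Device busy" here, the problem is 
reproducible outside the cluster framework, and the fuser output should 
show which process is holding the file system.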

Thanks
-Venku

On 12/03/09 14:11, Hartmut Streppel wrote:
> Hi,
> hard to diagnose. Your dependencies are correct. The messages indicate 
> that the zone is down before the HAStoragePlus resource tries to export 
> the zpool.
> The only possibility left in my mind is that there is some other 
> process, running in the global zone, using the zpool to be exported.
> 
> In this situation, are you able to export the zpool manually? If not, it 
> is not a cluster problem. You could try to find out which processes are 
> using a file or directory on that zpool.
> 
> Regards
> Hartmut
> 
> 
> On 12/02/09 23:24, Tundra Slosek wrote:
>> Is there a concise way to dump the pertinent details of a group?
>>
>> If I understand correctly, this shows that the resource 'common_zone' 
>> (the gds resource created by sczbt_register) depends on 
>> 'common_lhname' (the logicalhostname resource) and 'common_zpool' (the 
>> HAStoragePlus resource). I am certainly open to being enlightened...
>> root@mltproc1:~# /usr/cluster/bin/clresource show -y Resource_dependencies common_zone
>>
>> === Resources ===
>> Resource:                                       common_zone
>>   Resource_dependencies:                           common_lhname common_zpool
>>
>>
>> Also, a more complete snippet of the log file going back further in 
>> time - does the first log entry at 11:43:18 show that the zone 
>> actually stopped, or that the cluster incorrectly thinks the zone is 
>> stopped when it isn't?:
>>
>> Dec  2 11:43:17 mltproc1 
>> SC[SUNWsczone.stop_sczbt]:common_shares:common_zone: [ID 567783 
>> daemon.notice] stop_command rc<0>
>>  - Changing to init state 0 - please wait
>> Dec  2 11:43:17 mltproc1 
>> SC[SUNWsczone.stop_sczbt]:common_shares:common_zone: [ID 567783 
>> daemon.notice] stop_command rc<0>
>>  - Shutdown started.    Wed Dec  2 11:40:30 EST 2009
>> Dec  2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 
>> daemon.notice] method <gds_svc_stop> completed successfully 
>> for resource <common_zone>, resource group <common_shares>, node 
>> <mltproc1>, time used: 56% of timeout <300 seconds>
>> Dec  2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 
>> daemon.notice] launching method <hafoip_stop> for resource 
>> <common_lhname>, resource group <common_shares>, node <mltproc1>, 
>> timeout <300> seconds
>> Dec  2 11:43:18 mltproc1 ip: [ID 678092 kern.notice] 
>> TCP_IOC_ABORT_CONN: local = 192.168.011.005:0, remote = 
>> 000.000.000.000:0, start = -2, end = 6
>> Dec  2 11:43:18 mltproc1 ip: [ID 302654 kern.notice] 
>> TCP_IOC_ABORT_CONN: aborted 0 connection
>> Dec  2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 
>> daemon.notice] method <hafoip_stop> completed successfully 
>> for resource <common_lhname>, resource group <common_shares>, node 
>> <mltproc1>, time used: 0% of timeout <300 seconds>
>> Dec  2 11:43:19 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 
>> daemon.notice] launching method <hastorageplus_postnet_stop> for 
>> resource <personal_pool>, resource group <common_shares>, node 
>> <mltproc1>, timeout <1800> seconds
>> Dec  2 11:43:19 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 
>> daemon.notice] launching method <hastorageplus_postnet_stop> for 
>> resource <common_zpool>, resource group <common_shares>, node 
>> <mltproc1>, timeout <1800> seconds
>> Dec  2 11:43:22 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 
>> daemon.notice] method <hastorageplus_postnet_stop> completed 
>> successfully for resource <personal_pool>, resource group 
>> <common_shares>, node <mltproc1>, time used: 0% of timeout 
>> <1800 seconds>
>> Dec  2 11:43:22 mltproc1 
>> SC[,SUNW.HAStoragePlus:8,common_shares,common_zpool,hastorageplus_postnet_stop]:
>>  
>> [ID 471757 daemon.error] cannot unmount 
>> '/common_pool0/common_zone/root' : Device busy
>> Dec  2 11:43:22 mltproc1 
>> SC[,SUNW.HAStoragePlus:8,common_shares,common_zpool,hastorageplus_postnet_stop]:
>>  
>> [ID 952001 daemon.error] Failed to export :common_pool0
>> Dec  2 11:43:22 mltproc1 Cluster.RGM.global.rgmd: [ID 938318 
>> daemon.error] Method <hastorageplus_postnet_stop> failed on 
>> resource <common_zpool> in resource group <common_shares> [exit 
>> code <1>, time used: 0% of timeout <1800 seconds>]
>>   
>
