[ha-clusters-discuss] HAStoragePlus resource with a zone on top, unable to migrate

Tundra Slosek Mon, 07 Dec 2009 08:32:38 PST

> Hi Tundra,
> I see two problems in your configuration:
> 1. Keeping the dependencies, which you have already answered:
>     common_zone -> personal pool -> common pool
> This ensures proper start and stop, and does the mounting
> and unmounting of the file system in the ZFS pool.
> > <zone name="common" zonepath="/common_pool0/common_zone" autoboot="false"
> >   brand="ipkg" limitpriv="default,sys_smb">
> >   <dataset name="personal_pool0/personal"/>
> > </zone>
> 
> In general, it is not recommended to use zonecfg(1M) to add a ZFS
> pool dataset that is controlled by HAStoragePlus to a zone. The
> reason is that when the pool is imported on another physical cluster
> node as part of a failover/switchover, booting the zone on the current
> node will have a problem, because the dataset is not available.
> 
> I suggest removing the dataset from the zone configuration with
> zonecfg(1M) and also turning off the zoned property of that file system.
> 
> Thanks
> -Venku
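Venku's two suggestions can be sketched as a short script. This is a minimal sketch, not a tested procedure: the `run` wrapper, the `detach_dataset` function and the `DRY_RUN` variable are hypothetical additions that only print the commands unless `DRY_RUN=0` is set. The zonecfg step has to be repeated on every node that hosts the zone, since each node keeps its own copy of the zone configuration under /etc/zones; the zfs step only needs to run on the node where the pool is currently imported.

```shell
#!/bin/sh
# Hypothetical sketch of Venku's suggestion. By default (DRY_RUN unset or 1)
# the commands are only printed; set DRY_RUN=0 to actually run them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 = zone name, $2 = HAStoragePlus-managed ZFS dataset
detach_dataset() {
    zone=$1
    dataset=$2
    # Remove the dataset resource from this node's zone configuration.
    run zonecfg -z "$zone" remove dataset name="$dataset"
    # Clear the zoned property so the global zone can mount the file system.
    run zfs set zoned=off "$dataset"
}

detach_dataset common personal_pool0/personal
```

With the default setting it just prints the two commands, which makes it easy to review them before rerunning with `DRY_RUN=0`.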


Venku, thanks for your help so far; however, I am still doing something wrong.
This is what I've done:

First, on all nodes, to remove the personal_pool0/personal dataset from the
zone named 'common', I issued:

zonecfg -z common remove dataset

Which leaves the following in /etc/zones/common.xml:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE zone PUBLIC "-//Sun Microsystems Inc//DTD Zones//EN" "file:///usr/share/lib/xml/dtd/zonecfg.dtd.1">
<!--
    DO NOT EDIT THIS FILE.  Use zonecfg(1M) instead.
-->
<zone name="common" zonepath="/common_pool0/common_zone" autoboot="false" brand="ipkg" limitpriv="default,sys_smb"/>
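To confirm on each node that the dataset is really gone, `zonecfg -z common info dataset` should print nothing. As an alternative, a hypothetical file-based check against the XML shown above is sketched below; the function name `zone_has_dataset` is an assumption, and the script only reads the file, in line with the warning not to edit it.

```shell
#!/bin/sh
# Hypothetical check: succeeds if a zone's XML configuration file
# still declares a <dataset> resource. Read-only.
zone_has_dataset() {
    grep -q '<dataset ' "$1"
}

f=/etc/zones/common.xml
if [ ! -r "$f" ]; then
    echo "cannot read $f"
elif zone_has_dataset "$f"; then
    echo "$f still declares a dataset"
else
    echo "$f declares no dataset"
fi
```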

Then I turned off the zoned property, pointed the mountpoint inside the zone
root, and set the dependencies:

root@mltproc1:~# zfs get zoned,mountpoint personal_pool0/personal
NAME                     PROPERTY    VALUE                     SOURCE
personal_pool0/personal  zoned       on                        local
personal_pool0/personal  mountpoint  /personal_pool0/personal  default
root@mltproc1:~# zfs set zoned=off personal_pool0/personal
root@mltproc1:~# zfs set mountpoint=/common_pool0/common_zone/root/personal_pool0/personal personal_pool0/personal
root@mltproc1:~# zfs get zoned,mountpoint personal_pool0/personal
NAME                     PROPERTY    VALUE                                                   SOURCE
personal_pool0/personal  zoned       off                                                     local
personal_pool0/personal  mountpoint  /common_pool0/common_zone/root/personal_pool0/personal  local
root@mltstore0:~# clrs set -p Resource_dependencies+=common_zpool personal_pool
root@mltstore0:~# clrs set -p Resource_dependencies+=personal_pool common_zone
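The two clrs commands above build the chain common_zone -> personal_pool -> common_zpool. With Resource_dependencies (a strong dependency), the RGM should not start a resource until the resource it depends on is online, and should stop it first on the way down. The helper below is a hypothetical illustration, not a Sun Cluster tool: it walks a linear chain of "dependent:dependee" pairs and prints the start order that chain implies. The configured values themselves can be checked with `clrs show -p Resource_dependencies common_zone`.

```shell
#!/bin/sh
# Hypothetical helper: walk a linear chain of "dependent:dependee" pairs
# and print the implied start order, dependees first.
# The stop order is the reverse.
start_order() {
    res=$1
    shift
    order=$res
    i=0
    # At most one hop per pair, so a cyclic input cannot loop forever.
    while [ "$i" -lt "$#" ]; do
        next=""
        for pair in "$@"; do
            case $pair in
                "$res":*) next=${pair#*:} ;;
            esac
        done
        [ -z "$next" ] && break
        order="$next $order"
        res=$next
        i=$((i + 1))
    done
    echo "$order"
}

# The chain created by the two clrs commands above.
start_order common_zone common_zone:personal_pool personal_pool:common_zpool
```

For this chain it prints `common_zpool personal_pool common_zone`, i.e. the zone resource is the last one started and the first one stopped.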

I didn't get any errors in this process.

Now, when I attempt to switch or start the 'common_shares' resource group, it
just keeps migrating from node to node and never gets out of 'Pending online'.
On the cluster's logging host I see the following. It looks to me like the zone
simply doesn't start (the start method times out at 11:08:40; 192.168.11.21 and
192.168.11.22 appear to be about 2 seconds apart in time sync), but I don't see
any details of why:
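The timing arithmetic can be checked with a small sketch. The helpers `to_secs` and `time_diff` are hypothetical and assume both timestamps fall on the same day. The skew compares the timeout as logged by .21 (11:08:40) with the resulting R_START_FAILED state change as logged by .22 (11:08:38). The second figure, about 303 seconds from the first gds_svc_start message to the timeout, is consistent with a start method timing out after roughly 300 seconds, although the excerpt does not show the start timeout value itself.

```shell
#!/bin/sh
# Convert an HH:MM:SS syslog time to seconds since midnight.
# awk treats "08" as decimal, unlike $(( )) which would read it as octal.
to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds from time $1 to time $2 (same day assumed).
time_diff() {
    echo $(( $(to_secs "$2") - $(to_secs "$1") ))
}

echo "apparent skew between .21 and .22: $(time_diff 11:08:38 11:08:40) s"
echo "first start message to timeout:    $(time_diff 11:03:37 11:08:40) s"
```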

Dec  7 11:03:37 [192.168.11.21.214.62] 
SC[,SUNW.gds:6,common_shares,common_zone,gds_svc_start]: [ID 661560 
daemon.info] All the SUNW.HAStoragePlus resources that this resource depends on 
are online on the local node. Proceeding with the checks for the existence and 
permissions of the start/stop/probe commands.
Dec  7 11:03:37 [192.168.11.21.214.62] 
SC[,SUNW.gds:6,common_shares,common_zone,gds_svc_start]: [ID 268646 
daemon.info] Extension property <network_aware> has a value of <1>
Dec  7 11:03:37 [192.168.11.21.214.62] 
SC[,SUNW.LogicalHostname:3,common_shares,common_lhname,hafoip_monitor_start]: 
[ID 211198 daemon.info] Completed successfully.
Dec  7 11:03:37 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 515159 
daemon.notice] method <hafoip_monitor_start> completed successfully for 
resource <common_lhname>, resource group <common_shares>, node <mltstore1>, 
time used: 0% of timeout <300 seconds>
Dec  7 11:03:35 [192.168.11.22.236.224] Cluster.RGM.global.rgmd: [ID 443746 
daemon.notice] resource common_lhname state on node mltstore1 change to R_ONLINE
Dec  7 11:03:37 [192.168.11.21.214.62] 
SC[,SUNW.gds:6,common_shares,common_zone,gds_svc_start]: [ID 887138 
daemon.info] Extension property <Child_mon_level> has a value of <-1>
Dec  7 11:03:37 [192.168.11.21.214.62] 
SC[,SUNW.gds:6,common_shares,common_zone,gds_svc_start]: [ID 833212 
daemon.info] Attempting to start the data service under process monitor 
facility.
Dec  7 11:03:37 [192.168.11.21.214.62] 
SC[,SUNW.gds:6,common_shares,common_zone,gds_svc_start]: [ID 569559 
daemon.info] Start of /opt/SUNWsczone/sczbt/bin/start_sczbt -R common_zone -G 
common_shares -P /common_pool0/common_zone/parameters  completed successfully.
Dec  7 11:03:37 [192.168.11.21.214.62] 
SC[,SUNW.gds:6,common_shares,common_zone,gds_svc_start]: [ID 268646 
daemon.info] Extension property <network_aware> has a value of <1>
Dec  7 11:03:38 [192.168.11.21.214.62] genunix: [ID 408114 kern.info] 
/pseudo/zconsnex@1/zcons@1 (zcons1) online
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 764140 
daemon.error] Method <gds_svc_start> on resource <common_zone>, resource group 
<common_shares>, node <mltstore1>: Timeout.
Dec  7 11:08:38 [192.168.11.22.236.224] Cluster.RGM.global.rgmd: [ID 443746 
daemon.error] resource common_zone state on node mltstore1 change to 
R_START_FAILED
Dec  7 11:08:38 [192.168.11.22.236.224] Cluster.RGM.global.rgmd: [ID 784560 
daemon.notice] resource common_zone status on node mltstore1 change to 
R_FM_FAULTED
Dec  7 11:08:38 [192.168.11.22.236.224] Cluster.RGM.global.rgmd: [ID 922363 
daemon.notice] resource common_zone status msg on node mltstore1 change to <>
Dec  7 11:08:38 [192.168.11.22.236.224] Cluster.RGM.global.rgmd: [ID 529407 
daemon.error] resource group common_shares state on node mltstore1 change to 
RG_PENDING_OFF_START_FAILED
Dec  7 11:08:38 [192.168.11.22.236.224] Cluster.RGM.global.rgmd: [ID 784560 
daemon.notice] resource common_zone status on node mltstore1 change to 
R_FM_UNKNOWN
Dec  7 11:08:38 [192.168.11.22.236.224] Cluster.RGM.global.rgmd: [ID 922363 
daemon.notice] resource common_zone status msg on node mltstore1 change to 
<Stopping>
Dec  7 11:08:38 [192.168.11.22.236.224] Cluster.RGM.global.rgmd: [ID 443746 
daemon.notice] resource common_zone state on node mltstore1 change to R_STOPPING
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 224900 
daemon.notice] launching method <hastorageplus_monitor_stop> for resource 
<personal_pool>, resource group <common_shares>, node <mltstore1>, timeout <90> 
seconds
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 224900 
daemon.notice] launching method <hafoip_monitor_stop> for resource 
<common_lhname>, resource group <common_shares>, node <mltstore1>, timeout 
<300> seconds
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 224900 
daemon.notice] launching method <hastorageplus_monitor_stop> for resource 
<common_zpool>, resource group <common_shares>, node <mltstore1>, timeout <90> 
seconds
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 224900 
daemon.notice] launching method <gds_svc_stop> for resource <common_zone>, 
resource group <common_shares>, node <mltstore1>, timeout <300> seconds
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 669833 
daemon.debug] 68 fe_rpc_command: 
cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_monitor_stop>:tag=<common_shares.personal_pool.8>:
 Calling security_clnt_connect(..., host=<mltstore1>, sec_type {0:WEAK, 
1:STRONG, 2:DES} =<1>, ...)
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 653003 
daemon.debug] 73 fe_rpc_command: 
cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hastorageplus/hastorageplus_monitor_stop>:tag=<common_shares.common_zpool.8>:
 Calling security_clnt_connect(..., host=<mltstore1>, sec_type {0:WEAK, 
1:STRONG, 2:DES} =<1>, ...)
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 846460 
daemon.debug] 65 fe_rpc_command: 
cmd_type(enum):<1>:cmd=</usr/cluster/lib/rgm/rt/hafoip/hafoip_monitor_stop>:tag=<common_shares.common_lhname.8>:
 Calling security_clnt_connect(..., host=<mltstore1>, sec_type {0:WEAK, 
1:STRONG, 2:DES} =<1>, ...)
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 170767 
daemon.debug] 71 fe_rpc_command: 
cmd_type(enum):<1>:cmd=</opt/SUNWscgds/bin/gds_svc_stop>:tag=<common_shares.common_zone.1>:
 Calling security_clnt_connect(..., host=<mltstore1>, sec_type {0:WEAK, 
1:STRONG, 2:DES} =<1>, ...)
Dec  7 11:08:40 [192.168.11.21.214.62] Cluster.RGM.global.rgmd: [ID 515159 
daemon.notice] method <hastorageplus_monitor_stop> completed successfully for 
resource <personal_pool>, resource group <common_shares>, node <mltstore1>, 
time used: 0% of timeout <90 seconds>
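To follow only the state changes of one resource through an excerpt like this, a hypothetical filter can pull out the "change to" values in log order. It assumes the raw syslog file on the logging host, where each entry is assumed to be a single line (the wrapping above presumably comes from the mail). The function name `rgm_transitions` is illustrative, and the log file path depends on the logging host's syslog configuration.

```shell
#!/bin/sh
# Hypothetical filter: print the RGM state/status transitions logged for
# one resource, in log order. $1 = resource name, $2 = syslog file.
# Matches entries like "resource common_zone state on node ... change to X".
rgm_transitions() {
    res=$1
    log=$2
    grep "resource $res " "$log" | grep 'change to' | sed 's/.* change to //'
}
```

Run against the entries above, `rgm_transitions common_zone <logfile>` should print R_START_FAILED, R_FM_FAULTED, <>, R_FM_UNKNOWN, <Stopping> and R_STOPPING, one per line. This shows that the only failure recorded is the start-method timeout, with no earlier error for common_zone.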
-- 
This message posted from opensolaris.org
