I suggest using the force option for the export:

    zpool export -f <pool>

If that is failing, then it should be a Solaris bug, as the "force" option is supposed to export the pool even when there are current users on it [according to the zpool(1M) man page].
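In case it helps, this is roughly what I would run by hand in the global zone of the node that still owns the pool (pool name taken from your log below; please verify the exact behaviour against zpool(1M) on your release):

    # force-export the pool even if datasets appear busy
    zpool export -f common_pool0

    # the pool should no longer show up here once the export succeeds
    zpool list

    # and it should show up as available for import again
    zpool import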
-Venku

On 12/03/09 17:53, Tundra Slosek wrote:
> To manually force the export - 'zpool export common_pool0' or something
> else?
>
> One item that I'm curious about - is it possible that zfs-auto-snapshot
> is conflicting with this? I presumed that it's smart enough to be able
> to handle that (export being atomic as well as snapshot being atomic,
> they shouldn't step on each other). And should zfs-auto-snapshot be
> running in the specific zone or in the global zone?
>
> The bug that Amit mentioned - would that trigger with TWO zpools in the
> resource?
>
> Thanks to all for their ideas, suggestions and knowledge.
>
> On Thu, Dec 3, 2009 at 6:42 AM, Venkateswarlu Tella
> <Venkateswarlu.Tella at sun.com> wrote:
>
> Hi,
> HAStoragePlus uses force export and force unmount on the pool and
> its file systems respectively. So the unmount has to succeed even if
> someone is using the file system. Not sure if something is broken in
> ZFS umount semantics.
>
> It would be a good thing to test the force export manually, as Hartmut
> suggested.
>
> Thanks
> -Venku
>
>
> On 12/03/09 14:11, Hartmut Streppel wrote:
>
> Hi,
> hard to diagnose. Your dependencies are correct. The messages
> indicate that the zone is down before the HAStoragePlus resource
> tries to export the zpool.
> The only possibility left in my mind is that there is some other
> process, running in the global zone, using the zpool to be exported.
>
> In this situation, are you able to export the zpool manually? If
> not, it is not a cluster problem. You could try to find out
> which processes are using a file or directory on that zpool.
>
> Regards
> Hartmut
>
>
> On 12/02/09 23:24, Tundra Slosek wrote:
>
> Is there a concise way to dump the pertinent details of a group?
>
> If I understand correctly, this shows that the resource
> 'common_zone' (the gds resource created by sczbt_register)
> depends on 'common_lhname' (the logicalhostname resource)
> and 'common_zpool' (the HAStoragePlus resource). I am
> certainly open to being enlightened...
>
> root at mltproc1:~# /usr/cluster/bin/clresource show -y Resource_dependencies common_zone
>
> === Resources ===
>
> Resource:                 common_zone
>   Resource_dependencies:  common_lhname common_zpool
>
> Also, a more complete snippet of the log file going back
> further in time - does the first log entry at 11:43:18 show
> that the zone actually stopped, or that the cluster
> incorrectly thinks the zone is stopped when it isn't?:
>
> Dec 2 11:43:17 mltproc1 SC[SUNWsczone.stop_sczbt]:common_shares:common_zone: [ID 567783 daemon.notice] stop_command rc<0> - Changing to init state 0 - please wait
> Dec 2 11:43:17 mltproc1 SC[SUNWsczone.stop_sczbt]:common_shares:common_zone: [ID 567783 daemon.notice] stop_command rc<0> - Shutdown started. Wed Dec 2 11:40:30 EST 2009
> Dec 2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <gds_svc_stop> completed successfully for resource <common_zone>, resource group <common_shares>, node <mltproc1>, time used: 56% of timeout <300 seconds>
> Dec 2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hafoip_stop> for resource <common_lhname>, resource group <common_shares>, node <mltproc1>, timeout <300> seconds
> Dec 2 11:43:18 mltproc1 ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 192.168.011.005:0, remote = 000.000.000.000:0, start = -2, end = 6
> Dec 2 11:43:18 mltproc1 ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 0 connection
> Dec 2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hafoip_stop> completed successfully for resource <common_lhname>, resource group <common_shares>, node <mltproc1>, time used: 0% of timeout <300 seconds>
> Dec 2 11:43:19 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <personal_pool>, resource group <common_shares>, node <mltproc1>, timeout <1800> seconds
> Dec 2 11:43:19 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <common_zpool>, resource group <common_shares>, node <mltproc1>, timeout <1800> seconds
> Dec 2 11:43:22 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <personal_pool>, resource group <common_shares>, node <mltproc1>, time used: 0% of timeout <1800 seconds>
> Dec 2 11:43:22 mltproc1 SC[,SUNW.HAStoragePlus:8,common_shares,common_zpool,hastorageplus_postnet_stop]: [ID 471757 daemon.error] cannot unmount '/common_pool0/common_zone/root' : Device busy
> Dec 2 11:43:22 mltproc1 SC[,SUNW.HAStoragePlus:8,common_shares,common_zpool,hastorageplus_postnet_stop]: [ID 952001 daemon.error] Failed to export :common_pool0
> Dec 2 11:43:22 mltproc1 Cluster.RGM.global.rgmd: [ID 938318 daemon.error] Method <hastorageplus_postnet_stop> failed on resource <common_zpool> in resource group <common_shares> [exit code <1>, time used: 0% of timeout <1800 seconds>]
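PS: given the "cannot unmount '/common_pool0/common_zone/root' : Device busy" message above, it may also be worth checking what in the global zone still has files open under the zone root at the time of the switchover. A rough sketch (the per-PID follow-up is illustrative; please check fuser(1M) and pfiles(1) on your release):

    # list PIDs with files open in the file system mounted at the zone root
    fuser -c /common_pool0/common_zone/root

    # for each PID reported, identify the process and see its open files
    ps -fp <pid>
    pfiles <pid>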