To manually force the export, is that just 'zpool export common_pool0', or something else?
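To be concrete, here is what I was planning to try by hand from the global zone (the pool name and zone root path are just what I see in my logs below, and I don't know whether HAStoragePlus does anything beyond this under the hood, so please correct me if the test should look different):

  # zpool export -f common_pool0

and, if that still reports a busy dataset, forcing the unmount of the zone root first:

  # zfs unmount -f /common_pool0/common_zone/root
  # zpool export -f common_pool0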
One item that I'm curious about - is it possible that zfs-auto-snapshot is conflicting with this? I presumed that it's smart enough to handle that (the export being atomic as well as the snapshot being atomic, they shouldn't step on each other). And should zfs-auto-snapshot be running in the specific zone, or in the global zone?

The bug that Amit mentioned - would that trigger with TWO zpools in the resource?

Thanks to all for their ideas, suggestions and knowledge. (Below the quoted thread I've also put the manual checks I plan to run the next time the export fails, as suggested.)

On Thu, Dec 3, 2009 at 6:42 AM, Venkateswarlu Tella <Venkateswarlu.Tella at sun.com> wrote:

> Hi,
>
> HAStoragePlus uses force export and force unmount on the pool and its
> file systems respectively. So the umount has to succeed even if someone
> is using the file system. Not sure if something is broken in ZFS umount
> semantics.
>
> It would be a good thing to test the force export manually, as Hartmut
> suggested.
>
> Thanks
> -Venku
>
>
> On 12/03/09 14:11, Hartmut Streppel wrote:
>
>> Hi,
>> Hard to diagnose. Your dependencies are correct. The messages indicate
>> that the zone is down before the HAStoragePlus resource tries to export
>> the zpool. The only possibility left in my mind is that there is some
>> other process, running in the global zone, using the zpool to be
>> exported.
>>
>> In this situation, are you able to export the zpool manually? If not,
>> it is not a cluster problem. You could try to find out which processes
>> are using a file or directory on that zpool.
>>
>> Regards
>> Hartmut
>>
>>
>> On 12/02/09 23:24, Tundra Slosek wrote:
>>
>>> Is there a concise way to dump the pertinent details of a group?
>>>
>>> If I understand correctly, this shows that the resource 'common_zone'
>>> (the gds resource created by sczbt_register) depends on 'common_lhname'
>>> (the logicalhostname resource) and 'common_zpool' (the HAStoragePlus
>>> resource). I am certainly open to being enlightened...
>>>
>>> root@mltproc1:~# /usr/cluster/bin/clresource show -y Resource_dependencies common_zone
>>>
>>> === Resources ===
>>>
>>> Resource:                  common_zone
>>>   Resource_dependencies:   common_lhname common_zpool
>>>
>>> Also, a more complete snippet of the log file going back further in
>>> time - does the first log entry at 11:43:18 show that the zone actually
>>> stopped, or that the cluster incorrectly thinks the zone is stopped
>>> when it isn't?
>>>
>>> Dec 2 11:43:17 mltproc1 SC[SUNWsczone.stop_sczbt]:common_shares:common_zone: [ID 567783 daemon.notice] stop_command rc<0> - Changing to init state 0 - please wait
>>> Dec 2 11:43:17 mltproc1 SC[SUNWsczone.stop_sczbt]:common_shares:common_zone: [ID 567783 daemon.notice] stop_command rc<0> - Shutdown started. Wed Dec 2 11:40:30 EST 2009
>>> Dec 2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <gds_svc_stop> completed successfully for resource <common_zone>, resource group <common_shares>, node <mltproc1>, time used: 56% of timeout <300 seconds>
>>> Dec 2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hafoip_stop> for resource <common_lhname>, resource group <common_shares>, node <mltproc1>, timeout <300> seconds
>>> Dec 2 11:43:18 mltproc1 ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 192.168.011.005:0, remote = 000.000.000.000:0, start = -2, end = 6
>>> Dec 2 11:43:18 mltproc1 ip: [ID 302654 kern.notice] TCP_IOC_ABORT_CONN: aborted 0 connection
>>> Dec 2 11:43:18 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hafoip_stop> completed successfully for resource <common_lhname>, resource group <common_shares>, node <mltproc1>, time used: 0% of timeout <300 seconds>
>>> Dec 2 11:43:19 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <personal_pool>, resource group <common_shares>, node <mltproc1>, timeout <1800> seconds
>>> Dec 2 11:43:19 mltproc1 Cluster.RGM.global.rgmd: [ID 224900 daemon.notice] launching method <hastorageplus_postnet_stop> for resource <common_zpool>, resource group <common_shares>, node <mltproc1>, timeout <1800> seconds
>>> Dec 2 11:43:22 mltproc1 Cluster.RGM.global.rgmd: [ID 515159 daemon.notice] method <hastorageplus_postnet_stop> completed successfully for resource <personal_pool>, resource group <common_shares>, node <mltproc1>, time used: 0% of timeout <1800 seconds>
>>> Dec 2 11:43:22 mltproc1 SC[,SUNW.HAStoragePlus:8,common_shares,common_zpool,hastorageplus_postnet_stop]: [ID 471757 daemon.error] cannot unmount '/common_pool0/common_zone/root' : Device busy
>>> Dec 2 11:43:22 mltproc1 SC[,SUNW.HAStoragePlus:8,common_shares,common_zpool,hastorageplus_postnet_stop]: [ID 952001 daemon.error] Failed to export :common_pool0
>>> Dec 2 11:43:22 mltproc1 Cluster.RGM.global.rgmd: [ID 938318 daemon.error] Method <hastorageplus_postnet_stop> failed on resource <common_zpool> in resource group <common_shares> [exit code <1>, time used: 0% of timeout <1800 seconds>]
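
Following up on Hartmut's suggestion to find out which processes are still using the zpool: the checks below are what I plan to run from the global zone the next time hastorageplus_postnet_stop fails with 'Device busy'. Treat this as a sketch - the path comes from the log above and the exact fuser/ps invocations are from memory:

  # fuser -c /common_pool0/common_zone/root
  # ps -o pid,zone,args -p <PIDs reported by fuser>

If fuser turns up a zfs-auto-snapshot (or zfs send/receive) process, that would answer my own question above; if it turns up nothing and 'zpool export -f' still fails, I'll assume the problem is in ZFS rather than in the cluster framework.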