Hi Tundra,

Based on the timestamps I have sorted the events, and the problem seems to be with the zfs executable that is doing the umount.
When a pool is being exported, ZFS checks which of the pool's file systems are mounted (by reading mnttab) and tries to unmount them.

FAIL CASE:
execname:zfs  mountpoint:/smb1_pool0/smb1_zone/root  flag:0  PID:892
execname:hastorageplus_po  mountpoint:/smb1_pool0/smb1_zone/root  flag:1024  PID:905
execname:hastorageplus_po  return arg0:-1  PID:905
execname:zfs  return arg0:0  PID:892
execname:zoneadmd  mountpoint:/var/run/zones/smb1.zoneadmd_door  flag:0  PID:86
execname:zoneadmd  return arg0:0  PID:86

In the failed case, we see that zfs started unmounting the /smb1_pool0/smb1_zone/root file system, and before it completed, HASP tried to unmount the same file system and thus failed with EBUSY.

SUCCESS CASE:
execname:zfs  mountpoint:/smb1_pool0/smb1_zone/root  flag:0  PID:1731
execname:zfs  return arg0:0  PID:1731
execname:zoneadmd  mountpoint:/var/run/zones/smb1.zoneadmd_door  flag:0  PID:1140
execname:zoneadmd  return arg0:0  PID:1140
execname:hastorageplus_po  mountpoint:/smb1_pool0/smb1_zone  flag:1024  PID:1745
execname:hastorageplus_po  return arg0:0  PID:1745
execname:hastorageplus_po  mountpoint:/smb1_pool0  flag:1024  PID:1745
execname:hastorageplus_po  return arg0:0  PID:1745

In the success case, zfs had already unmounted /smb1_pool0/smb1_zone/root, and HASP later unmounted the remaining file systems of the pool successfully.

My guess is that sczbt might be using the zfs command to unmount the legacy file systems it controls, and that it returns before the unmount of those file systems has actually completed. Once it has returned, HASP tries to unmount the same file system and runs into trouble. (A variant of your umount2.d that flags overlapping unmounts of the same mountpoint directly is sketched after the quoted logs below.)

Thanks,
-Venku

On 12/10/09 01:30, Tundra Slosek wrote:
> My first stab at using dtrace to figure something out.
>
> First of all, the dtrace script file I created:
>
> -- start of umount2.d --
> #!/usr/sbin/dtrace -s
>
> syscall::umount2:entry
> {
>     printf("time:%d\t", timestamp);
>     printf("execname:%s\t", execname);
>     printf("mountpoint:%s\t", copyinstr(arg0, 240));
>     printf("flag:%d\t", arg1);
>     printf("PID:%d\t", pid);
>     printf("\n");
> }
>
> syscall::umount2:return
> {
>     printf("time:%d\t", timestamp);
>     printf("execname:%s\t", execname);
>     printf("return arg0:%d\t", arg0);
>     printf("PID:%d\t", pid);
>     printf("\n");
> }
> -- end of umount2.d --
>
> I ran this as follows (on the node where the resource group is currently
> running):
>
>     dtrace -q -s umount2.d
>
> and then initiated a switch on another node: 'clrg switch -n mltstore0 smb1_rg'
>
> For a failed attempt, I cleaned up (by doing zpool export, clrs clear, clrg
> switch back to the node it was on originally), and then ran it again and got
> a successful migration.
>
> I included the timestamp because I wasn't certain whether dtrace probes print
> their output in the same order as their triggering actions occur, but it
> appears this may not have been needed.
>
> Looking at the logs that follow, it appears to me that 'zfs' is sometimes
> trying to umount2 the zone's root before zoneadmd is done calling umount2. I
> presume that zoneadmd is responding to the stop_sczbt which is part of the
> 'smb1_zone' resource. I further presume that 'hastorageplus_po' is a
> truncated form of the name of the real executable
> 'hastorageplus_postnet_stop', which is responsible for stopping the resource
> 'smb1_zpool'. What I don't understand is who is calling 'zfs' at all and how
> the timing for this is determined (the timing/sequencing for this definitely
> seems wrong).
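
To see what is actually spawning those zfs commands during the switchover (and whether it really is the sczbt stop method), an execsnoop-style probe along the lines of the sketch below could be run next to umount2.d. This is only a sketch I have not run here; it assumes the proc provider's exec-success probe and curpsinfo->pr_psargs are usable on this Solaris release, and the file name zfscaller.d is just a placeholder.

-- start of zfscaller.d (sketch) --
#!/usr/sbin/dtrace -s

/*
 * Print every successful exec of a 'zfs' command, together with its
 * full argument list and the PID of the parent that spawned it.
 */
proc:::exec-success
/execname == "zfs"/
{
    printf("time:%d\tpid:%d\tppid:%d\targs:%s\n",
        timestamp, pid, ppid, curpsinfo->pr_psargs);
}
-- end of zfscaller.d (sketch) --

Running it as 'dtrace -q -s zfscaller.d' during a switch and mapping the reported ppid to a process name (for example with ps -fp <ppid>, while the caller is still alive) should show whether the 'zfs umount' of the zone root comes from the sczbt stop script or from somewhere else.
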
> I understand hastorageplus_postnet_stop needs to umount2 all the mountpoints
> that are under smb1_pool0.
>
> So, first the dtrace output for the failed migration:
>
> -- start of FAIL.log --
> time:81317653165914  execname:syslogd  mountpoint:/var/run/syslog_door  flag:0  PID:401
> time:81317656727485  execname:syslogd  return arg0:0  PID:401
> time:81318217888051  execname:umount  mountpoint:/home  flag:0  PID:818
> time:81318221571531  execname:umount  return arg0:0  PID:818
> time:81318217998677  execname:umount  mountpoint:/net  flag:0  PID:819
> time:81318221558069  execname:umount  return arg0:0  PID:819
> time:81318773405941  execname:kcfd  mountpoint:/etc/svc/volatile/kcfd_door  flag:0  PID:232
> time:81318776733762  execname:kcfd  return arg0:0  PID:232
> time:81323492465856  execname:umount  mountpoint:/lib/libc.so.1  flag:0  PID:852
> time:81323495696843  execname:umount  return arg0:0  PID:852
> time:81323670232146  execname:umount  mountpoint:/usr  flag:0  PID:856
> time:81323670246733  execname:umount  return arg0:-1  PID:856
> time:81323670254188  execname:umount  mountpoint:/usr  flag:0  PID:856
> time:81323670260945  execname:umount  return arg0:-1  PID:856
> time:81324927558568  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/var/run/name_service_door  flag:0  PID:86
> time:81324930826999  execname:zoneadmd  return arg0:0  PID:86
> time:81324930840281  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/var/run/ldap_cache_door  flag:0  PID:86
> time:81324933609233  execname:zoneadmd  return arg0:0  PID:86
> time:81324933619861  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/var/run  flag:0  PID:86
> time:81324936260424  execname:zoneadmd  return arg0:0  PID:86
> time:81324936271119  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/tmp  flag:0  PID:86
> time:81324938939368  execname:zoneadmd  return arg0:0  PID:86
> time:81324938950239  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/dev/fd  flag:0  PID:86
> time:81324941496873  execname:zoneadmd  return arg0:0  PID:86
> time:81324941507586  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/etc/svc/volatile/repository_door  flag:0  PID:86
> time:81324943991871  execname:zoneadmd  return arg0:0  PID:86
> time:81324944001875  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/etc/svc/volatile  flag:0  PID:86
> time:81324946581706  execname:zoneadmd  return arg0:0  PID:86
> time:81324946592490  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/system/object  flag:0  PID:86
> time:81324949089847  execname:zoneadmd  return arg0:0  PID:86
> time:81324949100002  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/etc/mnttab  flag:0  PID:86
> time:81325038747430  execname:zfs  mountpoint:/smb1_pool0/smb1_zone/root  flag:0  PID:892
> time:81324951856084  execname:zoneadmd  return arg0:0  PID:86
> time:81324951910502  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/system/contract  flag:0  PID:86
> time:81324954592338  execname:zoneadmd  return arg0:0  PID:86
> time:81324954602845  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/proc  flag:0  PID:86
> time:81324957107873  execname:zoneadmd  return arg0:0  PID:86
> time:81324957118283  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/dev  flag:0  PID:86
> time:81324959628415  execname:zoneadmd  return arg0:0  PID:86
> time:81325478095434  execname:hastorageplus_po  mountpoint:/smb1_pool0/smb1_zone/root  flag:1024  PID:905
> time:81325478107989  execname:hastorageplus_po  return arg0:-1  PID:905
> time:81327204073037  execname:zfs  return arg0:0  PID:892
> time:81327208669581  execname:zoneadmd  mountpoint:/var/run/zones/smb1.zoneadmd_door  flag:0  PID:86
> time:81327212327392  execname:zoneadmd  return arg0:0  PID:86
> -- end of FAIL.log --
>
> And then the dtrace log for when it succeeds:
>
> -- start of SUCCESS.log --
> time:81435262388640  execname:syslogd  mountpoint:/var/run/syslog_door  flag:0  PID:1502
> time:81435266048810  execname:syslogd  return arg0:0  PID:1502
> time:81435914751569  execname:umount  mountpoint:/home  flag:0  PID:1657
> time:81435918363189  execname:umount  return arg0:0  PID:1657
> time:81435914936865  execname:umount  mountpoint:/net  flag:0  PID:1658
> time:81435919573910  execname:umount  return arg0:0  PID:1658
> time:81436587686476  execname:kcfd  mountpoint:/etc/svc/volatile/kcfd_door  flag:0  PID:1287
> time:81436591221746  execname:kcfd  return arg0:0  PID:1287
> time:81441301660986  execname:umount  mountpoint:/lib/libc.so.1  flag:0  PID:1697
> time:81441304881617  execname:umount  return arg0:0  PID:1697
> time:81441479738380  execname:umount  mountpoint:/usr  flag:0  PID:1699
> time:81441479752738  execname:umount  return arg0:-1  PID:1699
> time:81441479760046  execname:umount  mountpoint:/usr  flag:0  PID:1699
> time:81441479766628  execname:umount  return arg0:-1  PID:1699
> time:81442665546961  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/var/run/name_service_door  flag:0  PID:1140
> time:81442668789598  execname:zoneadmd  return arg0:0  PID:1140
> time:81442668803334  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/var/run/ldap_cache_door  flag:0  PID:1140
> time:81442671605309  execname:zoneadmd  return arg0:0  PID:1140
> time:81442671616829  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/var/run  flag:0  PID:1140
> time:81442674270114  execname:zoneadmd  return arg0:0  PID:1140
> time:81442674280955  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/tmp  flag:0  PID:1140
> time:81442676873354  execname:zoneadmd  return arg0:0  PID:1140
> time:81442676884067  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/dev/fd  flag:0  PID:1140
> time:81442679377023  execname:zoneadmd  return arg0:0  PID:1140
> time:81442679387837  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/etc/svc/volatile/repository_door  flag:0  PID:1140
> time:81442681906124  execname:zoneadmd  return arg0:0  PID:1140
> time:81442681916691  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/etc/svc/volatile  flag:0  PID:1140
> time:81442684532650  execname:zoneadmd  return arg0:0  PID:1140
> time:81442684543401  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/system/object  flag:0  PID:1140
> time:81442687080347  execname:zoneadmd  return arg0:0  PID:1140
> time:81442687090314  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/etc/mnttab  flag:0  PID:1140
> time:81442689544665  execname:zoneadmd  return arg0:0  PID:1140
> time:81442689554602  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/system/contract  flag:0  PID:1140
> time:81442692027816  execname:zoneadmd  return arg0:0  PID:1140
> time:81442692037715  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/proc  flag:0  PID:1140
> time:81442694508472  execname:zoneadmd  return arg0:0  PID:1140
> time:81442694517974  execname:zoneadmd  mountpoint:/smb1_pool0/smb1_zone/root/dev  flag:0  PID:1140
> time:81442697032882  execname:zoneadmd  return arg0:0  PID:1140
> time:81442769405120  execname:zfs  mountpoint:/smb1_pool0/smb1_zone/root  flag:0  PID:1731
> time:81445650144773  execname:zoneadmd  mountpoint:/var/run/zones/smb1.zoneadmd_door  flag:0  PID:1140
> time:81445653806133  execname:zoneadmd  return arg0:0  PID:1140
> time:81445645327783  execname:zfs  return arg0:0  PID:1731
> time:81446875296840  execname:hastorageplus_po  mountpoint:/smb1_pool0/smb1_zone  flag:1024  PID:1745
> time:81448383192577  execname:hastorageplus_po  return arg0:0  PID:1745
> time:81448383206061  execname:hastorageplus_po  mountpoint:/smb1_pool0  flag:1024  PID:1745
> time:81448388802294  execname:hastorageplus_po  return arg0:0  PID:1745
> -- end of SUCCESS.log --
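
Here is the variant of umount2.d mentioned above. It keeps a small table of which PID is currently inside umount2() for each mountpoint and prints a warning as soon as a second process calls umount2() on a mountpoint whose earlier unmount has not returned yet, which is exactly the overlap seen in the failed case. It is only a sketch I have not run on the cluster; the 240-byte copyinstr limit is carried over from umount2.d, and the file name umount2race.d is just a placeholder.

-- start of umount2race.d (sketch) --
#!/usr/sbin/dtrace -s

/* Remember which mountpoint this thread is unmounting. */
syscall::umount2:entry
{
    self->mp = copyinstr(arg0, 240);
    self->in = 1;
}

/* Another PID is still inside umount2() for the same mountpoint. */
syscall::umount2:entry
/busy[self->mp] != 0/
{
    printf("OVERLAP: %s (PID %d) called umount2(%s) while PID %d had not returned yet\n",
        execname, pid, self->mp, busy[self->mp]);
}

/* Mark this mountpoint as having an unmount in flight. */
syscall::umount2:entry
{
    busy[self->mp] = pid;
}

/* Report the result and clear the in-flight marker. */
syscall::umount2:return
/self->in/
{
    printf("%s (PID %d) umount2(%s) returned %d\n",
        execname, pid, self->mp, arg0);
    busy[self->mp] = 0;
    self->in = 0;
}
-- end of umount2race.d (sketch) --

Run the same way as umount2.d ('dtrace -q -s umount2race.d') while switching the resource group; in the failed case it should print an OVERLAP line naming hastorageplus_po and the zfs PID that is still unmounting the zone root.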