Hi Armin,

Thanks for reproducing the problem.
I cannot see any obvious reason why multiple nodes would import the same ZFS pool. One thing worth mentioning: HASP intentionally imports the pool with an alternate root (zpool import -R / <poolname>) so that the pool configuration is not cached by Solaris. This prevents the node that imported the pool from auto-importing it again when it boots, which is exactly what is needed to avoid a multiple-node import during failovers (a short sketch of the difference is appended below the quoted mail).

Thanks & Regards
-Venku

On 10/29/08 19:04, Armin Ollig wrote:
> Hi Venku,
>
> Thanks for your input. Meanwhile I cloned the systems with lucreate and
> updated them to the ce 09/08 release. I removed and recreated the cluster.
> The failure that both nodes mount the ZFS concurrently happens after I
> trigger a failover of the HASP resource (by rebooting the active node).
> This leads to a kernel panic; thereafter I manually import another pool
> and reboot. Finally both nodes mount the HASP ZFS. Here are the commands
> and logs involved:
>
> Remove the old HASP resource:
>
> siegfried# clresource disable vb1-storage
> siegfried# clresource delete vb1-storage
> siegfried# clresourcegroup delete vb1
> siegfried# zpool destroy vb1
> siegfried# zpool create UNUSED___ /dev/did/dsk/d5s0   # overwrite the zfs labels
> siegfried# rm /etc/zfs/zpool.cache
> voelsung# rm /etc/zfs/zpool.cache
> voelsung# scshutdown -y -g0   # end episode 1
>
> Re-create the resource:
>
> siegfried# zpool create -f vb1 /dev/did/dsk/d5s0
> siegfried# zfs create vb1/vb1
> siegfried# zpool status vb1
>   pool: vb1
>  state: ONLINE
>  scrub: none requested
> config:
>         NAME                 STATE     READ WRITE CKSUM
>         vb1                  ONLINE       0     0     0
>           /dev/did/dsk/d5s0  ONLINE       0     0     0
> errors: No known data errors
> siegfried# zfs create vb1/vb1
> siegfried# clresourcegroup create vb1
> siegfried# clresourcegroup manage vb1
> siegfried# clrg online vb1
> siegfried# clresource create -t SUNW.HAStoragePlus \
>> -g vb1 \
>> -p Zpools=vb1 \
>> -p AffinityOn=True vb1-storage
> siegfried# clresource status vb1-storage
>
> === Cluster Resources ===
>
> Resource Name   Node Name   State     Status Message
> -------------   ---------   -----     --------------
> vb1-storage     voelsung    Online    Online
>                 siegfried   Offline   Offline
>
> voelsung# mount | grep vb1
> /vb1 on vb1 read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90002 on Wed Oct 29 13:30:57 2008
> /vb1/vb1 on vb1/vb1 read/write/setuid/devices/nonbmand/exec/xattr/atime/dev=2d90003 on Wed Oct 29 13:30:57 2008
> siegfried# mount | grep vb1
> siegfried#
> # ...no problem until here...
>
> Things go wrong (voelsung is rebooted, siegfried panics):
>
> voelsung# reboot
> Oct 29 13:38:47 siegfried genunix: NOTICE: clcomm: Path siegfried:e1000g2 - voelsung:e1000g2 being drained
> Oct 29 13:38:47 siegfried genunix: NOTICE: clcomm: Path siegfried:e1000g1 - voelsung:e1000g1 being drained
> Oct 29 13:38:47 siegfried ip: TCP_IOC_ABORT_CONN: aborted 0 connection
> Oct 29 13:38:49 siegfried genunix: NOTICE: CMM: Node voelsung (nodeid = 1) is down.
> Oct 29 13:38:49 siegfried genunix: NOTICE: CMM: Cluster members: siegfried.
> Oct 29 13:38:49 siegfried genunix: NOTICE: CMM: node reconfiguration #3 completed.
> Notifying cluster that this node is panicking
>
> panic[cpu3]/thread=ffffff000f631c80: Reservation Conflict
> Disk: /scsi_vhci/disk@g600d02300000000000888275cd1cc500
>
> ffffff000f631a00 sd:sd_panic_for_res_conflict+4f ()
> ffffff000f631a40 sd:sd_pkt_status_reservation_conflict+a8 ()
> ffffff000f631a90 sd:sdintr+44e ()
> ffffff000f631b30 scsi_vhci:vhci_intr+6ac ()
> ffffff000f631b50 fcp:fcp_post_callback+1e ()
> ffffff000f631b90 fcp:fcp_cmd_callback+4b ()
> ffffff000f631bd0 emlxs:emlxs_iodone+b1 ()
> ffffff000f631c20 emlxs:emlxs_iodone_server+15d ()
> ffffff000f631c60 emlxs:emlxs_thread+15e ()
> ffffff000f631c70 unix:thread_start+8 ()
>
> syncing file systems... 6 4 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 done (not all i/o completed)
> dumping to /dev/dsk/c4t600D0230000000000088824BC4228803d0s1, offset 1719074816, content: kernel
> 100% done: 178044 pages dumped, compression ratio 5.30, dump succeeded
> rebooting...
>
> I import a pool "dummypool" manually on siegfried (this zpool is not under
> HASP control and is imported manually only on node siegfried):
>
> siegfried# zpool import dummypool
> siegfried# zpool status dummypool
>   pool: dummypool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME                                       STATE     READ WRITE CKSUM
>         dummypool                                  ONLINE       0     0     0
>           c4t600D02300000000000888275CD1CC500d0s0  ONLINE       0     0     0
>
> errors: No known data errors
> siegfried# reboot
>
> Kernel errors while booting siegfried:
>
> Hostname: siegfried
> WARNING: /scsi_vhci/disk@g600d02300000000000888275cd1cc500 (sd15):
>         reservation conflict
> WARNING: /scsi_vhci/disk@g600d0230000000000088824bc4228807 (sd13):
>         reservation conflict
>
> Here, finally, is the error: both nodes mount the ZFS concurrently.
>
> siegfried# zpool status
>   pool: dummypool
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME                                       STATE     READ WRITE CKSUM
>         dummypool                                  ONLINE       0     0     0
>           c4t600D02300000000000888275CD1CC500d0s0  ONLINE       0     0     0
>
> errors: No known data errors
>
>   pool: vb1
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME                                       STATE     READ WRITE CKSUM
>         vb1                                        ONLINE       0     0     0
>           c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0
>
> errors: No known data errors
>
> voelsung# zpool status
>   pool: vb1
>  state: ONLINE
>  scrub: none requested
> config:
>
>         NAME                                       STATE     READ WRITE CKSUM
>         vb1                                        ONLINE       0     0     0
>           c4t600D0230000000000088824BC4228807d0s0  ONLINE       0     0     0
>
> errors: No known data errors
> voelsung# clresource status vb1-storage
>
> === Cluster Resources ===
>
> Resource Name   Node Name   State     Status Message
> -------------   ---------   -----     --------------
> vb1-storage     voelsung    Online    Online
>                 siegfried   Offline   Offline
>
> Best wishes,
> Armin
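P.S. Here is the sketch mentioned above: a minimal illustration of the difference
between a plain import and the alternate-root import that HASP uses. The pool
name vb1 is just the one from your setup, the cachefile property needs a
reasonably recent zpool version, and these are plain ZFS commands rather than
anything HASP-specific:

  # zpool import vb1
  # zpool get altroot,cachefile vb1
        (default cachefile: the pool is recorded in /etc/zfs/zpool.cache,
         so this node would auto-import it again on the next boot)
  # zpool export vb1

  # zpool import -R / vb1
  # zpool get altroot,cachefile vb1
        (altroot=/, cachefile=none: nothing is written to
         /etc/zfs/zpool.cache, so the node will not auto-import the pool
         at boot; only HASP imports it again during a failover)

"zpool import -R <root>" is documented as equivalent to
"-o cachefile=none -o altroot=<root>", which is why the pool never reaches
the cache file.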