Hi,

Hope this is of some use!
bacall root 40> mdb unix.8 vmcore.8
Loading modules: [ unix krtld genunix ip s1394 usba logindmux ptm cpc sppp ipc random nca ]
> $C
000002a101e7c941 __1cLschema_list4nTInterfaceDescriptor__Lfind_locked6MkpI_pn0A__+4(30001cdcf50, ffffffb8, 20, 6, 0, 30000047288)
000002a101e7c9f1 __1cLschema_list4nTInterfaceDescriptor__Efind6MkpI_pn0A__+0x14(30001cdcf50, ffffffb8, 1400000, 2a10007dd20, 0, 30001cdcf50)
000002a101e7caa1 __1cIOxSchemaKdescriptor6MkpIpnHhandler__pnTInterfaceDescriptor__+8(30001cdcf50, ffffffb8, 0, 783c6000, 783c6, 30001cdcf50)
000002a101e7cbe1 __1cTInterfaceDescriptorOsizeof_methods6MI_I_+0x74(78396d58, cf, cf0, 0, ffff, 7856f528)
000002a101e7cc91 __1cOremote_handlerQget_index_method6MrnHservice_pnTInterfaceDescriptor_rI_pFpv2_v_+0x24(30003cbf580, 2a101e7d7f0, 78396d58, 2a101e7d62c, 78396000, 30006e58000)
000002a101e7cd51 __1cOremote_handlerUhandle_incoming_call6MrnHservice__v_+0x34(30003cbf580, 2a101e7d7f0, 78336df4, 30003cbf578, 784ce920, 78356ddc)
000002a101e7ce31 __1cGrxdoorVhandle_request_common6FrnHID_node_rnHservice_pnSrxdoor_invo_header_C_v_+0x3fc(30006e580d0, 2a101e7d7f0, 2a101e7d840, 8, 30003e58f28, 2a101e7d848)
000002a101e7cf41 __1cGrxdoorNhandle_twoway6FpnJrecstream__v_+0xf0(30006e58000, 78000, 30006e58030, 0, 30006e580d0, 2a101e7d7f0)
000002a101e7d091 __1cTthreadpool_worker_tVdeferred_task_handler6M_v_+0x114(78145118, 30001ba0c78, 30006e58000, 30000a1e8d8, 1, 30000a1e8d8)
000002a101e7d141 __1cKthreadpoolOthread_handler6FpnTthreadpool_worker_t__v_+0x1c(30001ba0c78, 1, 30001bdd7e8, 1, 783c8000, 783c8)
000002a101e7d1f1 cllwpwrapper+0x10c(2a101e7db80, 78372a84, 0, 0, 783dc000, 783dc)
000002a101e7d2d1 thread_start+4(2a101e7db80, 18, 0, 0, 0, 0)
>

This is from the /var/adm/messages of the "bad" machine bacall. The good machines are dalle and dietrich:

bacall cl_runtime: [ID 965873 kern.notice] NOTICE: CMM: Node dietrich (nodeid = 1) with votecount = 1 added.
Dec 13 13:47:39 bacall cl_runtime: [ID 965873 kern.notice] NOTICE: CMM: Node dalle (nodeid = 2) with votecount = 1 added.
Dec 13 13:47:39 bacall cl_runtime: [ID 965873 kern.notice] NOTICE: CMM: Node bacall (nodeid = 3) with votecount = 1 added.
Dec 13 13:47:40 bacall cl_runtime: [ID 884114 kern.notice] NOTICE: clcomm: Adapter ce2 constructed
Dec 13 13:47:40 bacall cl_runtime: [ID 884114 kern.notice] NOTICE: clcomm: Adapter ce1 constructed
Dec 13 13:47:40 bacall cl_runtime: [ID 843983 kern.notice] NOTICE: CMM: Node bacall: attempting to join cluster.
Dec 13 13:47:40 bacall cl_runtime: [ID 537175 kern.notice] NOTICE: CMM: Node dalle (nodeid: 2, incarnation #: 1197553098) has become reachable.
Dec 13 13:47:40 bacall cl_runtime: [ID 537175 kern.notice] NOTICE: CMM: Node dietrich (nodeid: 1, incarnation #: 1197553257) has become reachable.
Dec 13 13:47:40 bacall cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path bacall:ce1 - dalle:bge1 online
Dec 13 13:47:40 bacall cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path bacall:ce1 - dietrich:bge1 online
Dec 13 13:47:40 bacall cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path bacall:ce2 - dietrich:bge2 online
Dec 13 13:47:40 bacall cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path bacall:ce2 - dalle:bge2 online
Dec 13 13:47:40 bacall cl_runtime: [ID 525628 kern.notice] NOTICE: CMM: Cluster has reached quorum.
Dec 13 13:47:40 bacall cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node dietrich (nodeid = 1) is up; new incarnation number = 1197553257.
Dec 13 13:47:40 bacall cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node dalle (nodeid = 2) is up; new incarnation number = 1197553098.
Dec 13 13:47:40 bacall cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node bacall (nodeid = 3) is up; new incarnation number = 1197553659.
Dec 13 13:47:40 bacall cl_runtime: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: dietrich dalle bacall.
Dec 13 13:47:40 bacall cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
Dec 13 13:47:40 bacall unix: [ID 836849 kern.notice]
Dec 13 13:47:40 bacall ^Mpanic[cpu0]/thread=2a10007dd20:
Dec 13 13:47:40 bacall unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=2a10007c350 addr=8 mmu_fsr=0 occurred in module "cl_orb" due to a NULL pointer dereference
Dec 13 13:47:40 bacall unix: [ID 100000 kern.notice]
Dec 13 13:47:40 bacall unix: [ID 839527 kern.notice] sched:
Dec 13 13:47:40 bacall unix: [ID 520581 kern.notice] trap type = 0x31
Dec 13 13:47:40 bacall unix: [ID 381800 kern.notice] addr=0x8

And on one of the "good" machines:

Dec 13 13:49:15 dietrich cl_runtime: [ID 537175 kern.notice] NOTICE: CMM: Node bacall (nodeid: 3, incarnation #: 1197553659) has become reachable.
Dec 13 13:49:15 dietrich cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path dietrich:bge1 - bacall:ce1 online
Dec 13 13:49:15 dietrich cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path dietrich:bge2 - bacall:ce2 online
Dec 13 13:49:15 dietrich cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node bacall (nodeid = 3) is up; new incarnation number = 1197553659.
Dec 13 13:49:15 dietrich cl_runtime: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: dietrich dalle bacall.
Dec 13 13:49:15 dietrich Cluster.Framework: [ID 801593 daemon.notice] stdout: releasing reservations for scsi-2 disks shared with bacall
Dec 13 13:49:16 dietrich cl_runtime: [ID 489438 kern.notice] NOTICE: clcomm: Path dietrich:bge1 - bacall:ce1 being drained
Dec 13 13:49:16 dietrich ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 000.000.000.000:0, remote = 172.016.004.003:0, start = -2, end = 6

And on the other "good" machine:

... D_ID=10400, PWWN=210000e08b0e56e5 reappeared in fabric
Dec 13 13:49:15 dalle cl_runtime: [ID 537175 kern.notice] NOTICE: CMM: Node bacall (nodeid: 3, incarnation #: 1197553659) has become reachable.
Dec 13 13:49:15 dalle cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path dalle:bge1 - bacall:ce1 online
Dec 13 13:49:15 dalle cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node bacall (nodeid = 3) is up; new incarnation number = 1197553659.
Dec 13 13:49:15 dalle cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path dalle:bge2 - bacall:ce2 online
Dec 13 13:49:15 dalle cl_runtime: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: dietrich dalle bacall.
Dec 13 13:49:16 dalle Cluster.Framework: [ID 801593 daemon.notice] stdout: releasing reservations for scsi-2 disks shared with bacall
Dec 13 13:49:16 dalle cl_runtime: [ID 489438 kern.notice] NOTICE: clcomm: Path dalle:bge2 - bacall:ce2 being drained
Dec 13 13:49:16 dalle cl_runtime: [ID 489438 kern.notice] NOTICE: clcomm: Path dalle:bge1 - bacall:ce1 being drained

It is a DotHill-R/Evo2730-2R-J100 array, connected to the three nodes via Fibre Channel and QLogic switches. All machines can see the disk in format.

Thanks in advance for any suggestions. I have to go home soon but will be back on the case first thing in the morning!

Cheers,
David

-----Original Message-----
From: Thorsten.Frueauf at Sun.COM
Sent: 13 December 2007 16:27
To: ALLEN, David
Cc: ha-clusters-discuss at opensolaris.org
Subject: Re: [ha-clusters-discuss] Establising quorums?

Hi David,

Could you provide more specific error messages? The panic string would help. If a crash dump was written, the stack trace would be interesting, for example:

  # mdb unix.0 vmcore.0
  > $C

Please also include any related messages from /var/adm/messages on all nodes from the time the third node tried to join.

Can you also describe how your interconnect is set up?

Greets
Thorsten

ALLEN, David wrote:
> Hi,
>
> I am trying to set up a 3.2 cluster containing 3 Sun boxes accessing
> one fibre disk. All three machines can see the disk correctly. Two of
> the boxes are working great.
> However, when the third joins the cluster it promptly panics and
> crashes, which suggests a quorum problem?
> "cluster show" shows quorum_vote = 1 for the two working machines and
> quorum_vote = 0 for the faulty machine. Not much else I can tell you,
> other than that all machines are patched to the hilt. Everything else
> looks OK.
>
> Any suggestions, anyone? I am beginning to panic as well!
>
> Thanks in advance
>
> David
>
> *CONFIDENTIALITY NOTICE* The information contained in this e-mail is
> intended only for the confidential use of the above named recipient.
> If you are not the intended recipient or person responsible for
> delivering it to the intended recipient, you have received this
> communication in error and must not distribute or copy it. Please
> accept the sender's apologies, notify the sender immediately by return
> e-mail and delete this communication. Thank you.
>
> ------------------------------------------------------------------------

-- 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Registered office: Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
Register court: Amtsgericht München, HRB 161028
Managing Directors: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer
Chairman of the Supervisory Board: Martin Häring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster
Type: application/octet-stream
Size: 8775 bytes
Desc: cluster
URL: <http://mail.opensolaris.org/pipermail/ha-clusters-discuss/attachments/20071213/a0b2082e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: status
Type: application/octet-stream
Size: 2614 bytes
Desc: status
URL: <http://mail.opensolaris.org/pipermail/ha-clusters-discuss/attachments/20071213/a0b2082e/attachment-0001.obj>
