Hi,

Hope this is of some use!


bacall root 40> mdb unix.8 vmcore.8
Loading modules: [ unix krtld genunix ip s1394 usba logindmux ptm cpc sppp ipc random nca ]
> $C
000002a101e7c941 __1cLschema_list4nTInterfaceDescriptor__Lfind_locked6MkpI_pn0A__+4(30001cdcf50, ffffffb8, 20, 6, 0, 30000047288)
000002a101e7c9f1 __1cLschema_list4nTInterfaceDescriptor__Efind6MkpI_pn0A__+0x14(30001cdcf50, ffffffb8, 1400000, 2a10007dd20, 0, 30001cdcf50)
000002a101e7caa1 __1cIOxSchemaKdescriptor6MkpIpnHhandler__pnTInterfaceDescriptor__+8(30001cdcf50, ffffffb8, 0, 783c6000, 783c6, 30001cdcf50)
000002a101e7cbe1 __1cTInterfaceDescriptorOsizeof_methods6MI_I_+0x74(78396d58, cf, cf0, 0, ffff, 7856f528)
000002a101e7cc91 __1cOremote_handlerQget_index_method6MrnHservice_pnTInterfaceDescriptor_rI_pFpv2_v_+0x24(30003cbf580, 2a101e7d7f0, 78396d58, 2a101e7d62c, 78396000, 30006e58000)
000002a101e7cd51 __1cOremote_handlerUhandle_incoming_call6MrnHservice__v_+0x34(30003cbf580, 2a101e7d7f0, 78336df4, 30003cbf578, 784ce920, 78356ddc)
000002a101e7ce31 __1cGrxdoorVhandle_request_common6FrnHID_node_rnHservice_pnSrxdoor_invo_header_C_v_+0x3fc(30006e580d0, 2a101e7d7f0, 2a101e7d840, 8, 30003e58f28, 2a101e7d848)
000002a101e7cf41 __1cGrxdoorNhandle_twoway6FpnJrecstream__v_+0xf0(30006e58000, 78000, 30006e58030, 0, 30006e580d0, 2a101e7d7f0)
000002a101e7d091 __1cTthreadpool_worker_tVdeferred_task_handler6M_v_+0x114(78145118, 30001ba0c78, 30006e58000, 30000a1e8d8, 1, 30000a1e8d8)
000002a101e7d141 __1cKthreadpoolOthread_handler6FpnTthreadpool_worker_t__v_+0x1c(30001ba0c78, 1, 30001bdd7e8, 1, 783c8000, 783c8)
000002a101e7d1f1 cllwpwrapper+0x10c(2a101e7db80, 78372a84, 0, 0, 783dc000, 783dc)
000002a101e7d2d1 thread_start+4(2a101e7db80, 18, 0, 0, 0, 0)
>




This is from /var/adm/messages on the "bad" machine, bacall. The good 
machines are dalle and dietrich:

bacall cl_runtime: [ID 965873 kern.notice] NOTICE: CMM: Node dietrich (nodeid = 1) with votecount = 1 added.
Dec 13 13:47:39 bacall cl_runtime: [ID 965873 kern.notice] NOTICE: CMM: Node dalle (nodeid = 2) with votecount = 1 added.
Dec 13 13:47:39 bacall cl_runtime: [ID 965873 kern.notice] NOTICE: CMM: Node bacall (nodeid = 3) with votecount = 1 added.
Dec 13 13:47:40 bacall cl_runtime: [ID 884114 kern.notice] NOTICE: clcomm: Adapter ce2 constructed
Dec 13 13:47:40 bacall cl_runtime: [ID 884114 kern.notice] NOTICE: clcomm: Adapter ce1 constructed
Dec 13 13:47:40 bacall cl_runtime: [ID 843983 kern.notice] NOTICE: CMM: Node bacall: attempting to join cluster.
Dec 13 13:47:40 bacall cl_runtime: [ID 537175 kern.notice] NOTICE: CMM: Node dalle (nodeid: 2, incarnation #: 1197553098) has become reachable.
Dec 13 13:47:40 bacall cl_runtime: [ID 537175 kern.notice] NOTICE: CMM: Node dietrich (nodeid: 1, incarnation #: 1197553257) has become reachable.
Dec 13 13:47:40 bacall cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path bacall:ce1 - dalle:bge1 online
Dec 13 13:47:40 bacall cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path bacall:ce1 - dietrich:bge1 online
Dec 13 13:47:40 bacall cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path bacall:ce2 - dietrich:bge2 online
Dec 13 13:47:40 bacall cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path bacall:ce2 - dalle:bge2 online
Dec 13 13:47:40 bacall cl_runtime: [ID 525628 kern.notice] NOTICE: CMM: Cluster has reached quorum.
Dec 13 13:47:40 bacall cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node dietrich (nodeid = 1) is up; new incarnation number = 1197553257.
Dec 13 13:47:40 bacall cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node dalle (nodeid = 2) is up; new incarnation number = 1197553098.
Dec 13 13:47:40 bacall cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node bacall (nodeid = 3) is up; new incarnation number = 1197553659.
Dec 13 13:47:40 bacall cl_runtime: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: dietrich dalle bacall.
Dec 13 13:47:40 bacall cl_dlpitrans: [ID 624622 kern.notice] Notifying cluster that this node is panicking
Dec 13 13:47:40 bacall unix: [ID 836849 kern.notice]
Dec 13 13:47:40 bacall ^Mpanic[cpu0]/thread=2a10007dd20:
Dec 13 13:47:40 bacall unix: [ID 340138 kern.notice] BAD TRAP: type=31 rp=2a10007c350 addr=8 mmu_fsr=0 occurred in module "cl_orb" due to a NULL pointer dereference
Dec 13 13:47:40 bacall unix: [ID 100000 kern.notice]
Dec 13 13:47:40 bacall unix: [ID 839527 kern.notice] sched:
Dec 13 13:47:40 bacall unix: [ID 520581 kern.notice] trap type = 0x31
Dec 13 13:47:40 bacall unix: [ID 381800 kern.notice] addr=0x8



And on one of the "good" machines, dietrich:

Dec 13 13:49:15 dietrich cl_runtime: [ID 537175 kern.notice] NOTICE: CMM: Node bacall (nodeid: 3, incarnation #: 1197553659) has become reachable.
Dec 13 13:49:15 dietrich cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path dietrich:bge1 - bacall:ce1 online
Dec 13 13:49:15 dietrich cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path dietrich:bge2 - bacall:ce2 online
Dec 13 13:49:15 dietrich cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node bacall (nodeid = 3) is up; new incarnation number = 1197553659.
Dec 13 13:49:15 dietrich cl_runtime: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: dietrich dalle bacall.
Dec 13 13:49:15 dietrich Cluster.Framework: [ID 801593 daemon.notice] stdout: releasing reservations for scsi-2 disks shared with bacall
Dec 13 13:49:16 dietrich cl_runtime: [ID 489438 kern.notice] NOTICE: clcomm: Path dietrich:bge1 - bacall:ce1 being drained
Dec 13 13:49:16 dietrich ip: [ID 678092 kern.notice] TCP_IOC_ABORT_CONN: local = 000.000.000.000:0, remote = 172.016.004.003:0, start = -2, end = 6


And on the other good machine, dalle:

h D_ID=10400, PWWN=210000e08b0e56e5 reappeared in fabric
Dec 13 13:49:15 dalle cl_runtime: [ID 537175 kern.notice] NOTICE: CMM: Node bacall (nodeid: 3, incarnation #: 1197553659) has become reachable.
Dec 13 13:49:15 dalle cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path dalle:bge1 - bacall:ce1 online
Dec 13 13:49:15 dalle cl_runtime: [ID 377347 kern.notice] NOTICE: CMM: Node bacall (nodeid = 3) is up; new incarnation number = 1197553659.
Dec 13 13:49:15 dalle cl_runtime: [ID 387288 kern.notice] NOTICE: clcomm: Path dalle:bge2 - bacall:ce2 online
Dec 13 13:49:15 dalle cl_runtime: [ID 108990 kern.notice] NOTICE: CMM: Cluster members: dietrich dalle bacall.
Dec 13 13:49:16 dalle Cluster.Framework: [ID 801593 daemon.notice] stdout: releasing reservations for scsi-2 disks shared with bacall
Dec 13 13:49:16 dalle cl_runtime: [ID 489438 kern.notice] NOTICE: clcomm: Path dalle:bge2 - bacall:ce2 being drained
Dec 13 13:49:16 dalle cl_runtime: [ID 489438 kern.notice] NOTICE: clcomm: Path dalle:bge1 - bacall:ce1 being drained

It is a DotHill-R/Evo2730-2R-J100 array, connected to the three nodes via Fibre 
Channel and QLogic switches. All three machines can see the disk in format.

Thanks in advance for any suggestions. I have to go home soon but will be back 
on the case first thing in the morning!

Cheers,

David


-----Original Message-----
From: Thorsten.Frueauf at Sun.COM
Sent: 13 December 2007 16:27
To: ALLEN, David
Cc: ha-clusters-discuss at opensolaris.org
Subject: Re: [ha-clusters-discuss] Establising quorums?

Hi David,

Could you provide more specific error messages?

The panic string would help. If a crash dump was written, the stack trace 
would also be interesting, for example:

# mdb unix.0 vmcore.0
 > $C

Please also send any related messages from /var/adm/messages on all nodes 
from the time the third node tried to join.

Can you also describe how your interconnect is set up?

Greets
       Thorsten

ALLEN, David wrote:
> Hi,
> 
> I am trying to set up a 3.2 cluster containing three Sun boxes accessing 
> one fibre disk. All three machines can see the disk correctly. Two of 
> the boxes are working great. However, when the third joins the cluster 
> it promptly panics and crashes. Could this be a quorum problem? 
> "cluster show" shows quorum_vote = 1 for the two working machines and 
> quorum_vote = 0 for the faulty machine. There is not much else I can tell 
> you, other than that all machines are patched to the hilt. Everything else looks OK.
> 
> Any suggestions anyone? I am beginning to panic as well!
> 
> Thanks in advance
> 
> David
> 
> *CONFIDENTIALITY NOTICE* The information contained in this e-mail is 
> intended only for the confidential use of the above named recipient. 
> If you are not the intended recipient or person responsible for 
> delivering it to the intended recipient, you have received this 
> communication in error and must not distribute or copy it. Please 
> accept the sender's apologies, notify the sender immediately by return 
> e-mail and delete this communication. Thank you.
> 
> 
> ------------------------------------------------------------------------

-- 
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    Registered office:
     Sun Microsystems GmbH, Sonnenallee 1, D-85551 Kirchheim-Heimstetten
     Local court (Amtsgericht) München: HRB 161028
     Managing Directors: Thomas Schröder, Wolfgang Engels, Dr. Roland Bömer
     Chairman of the Supervisory Board: Martin Häring
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    NOTICE:  This email message is for the sole use of the intended
    recipient(s) and may contain confidential and privileged
    information.  Any unauthorized review, use, disclosure or
    distribution is prohibited.  If you are not the intended
    recipient, please contact the sender by reply email and destroy
    all copies of the original message.
   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cluster
Type: application/octet-stream
Size: 8775 bytes
Desc: cluster
URL: <http://mail.opensolaris.org/pipermail/ha-clusters-discuss/attachments/20071213/a0b2082e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: status
Type: application/octet-stream
Size: 2614 bytes
Desc: status
URL: <http://mail.opensolaris.org/pipermail/ha-clusters-discuss/attachments/20071213/a0b2082e/attachment-0001.obj>
