Re: [Linux-HA] Error running corosync

Andrew Beekhof Sun, 13 Nov 2011 16:45:07 -0800

On Mon, Nov 14, 2011 at 11:12 AM, Nick Khamis <[email protected]> wrote:
> Hello Andrew,
>
> Thank you so much for your response. I am using ocfs-tools 1.6, and it only
> includes the pcmk and cman builds of ocfs2_controld:
>
> ocfs2_controld.cman  ocfs2_controld.pcmk  ocfs2_hb_ctl
>
> Which stack provides the standard ocfs2_controld?


If you're running cman, use the cman one (ocfs2_controld.cman).
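
To make that rule concrete, here is a rough sketch, not something shipped with ocfs2-tools or Pacemaker. The function name pick_ocfs2_controld is made up for illustration, and it assumes the status text contains a "Stack:" line like the one in the crm_mon output quoted further down. Only the cman case is mapped, because that is the only pairing confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): read crm_mon-style status
# text on stdin and print the ocfs2_controld binary that matches the stack.
pick_ocfs2_controld() {
    # Take the first word after "Stack:" on the first matching line.
    stack=$(sed -n 's/^Stack:[[:space:]]*\([^[:space:]]*\).*$/\1/p' | head -n 1)
    case "$stack" in
        cman)
            echo "ocfs2_controld.cman"
            ;;
        *)
            # Other stacks are deliberately left unmapped in this sketch.
            echo "no mapping for stack: ${stack:-none}" >&2
            return 1
            ;;
    esac
}
```

Something like `crm_mon -1 | pick_ocfs2_controld` would then print the daemon name on a cman cluster and fail loudly on anything else.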

>
> Thanks for Everything!
>
> Nick.
>
> If it's cman
>
> On Sun, Nov 13, 2011 at 6:49 PM, Andrew Beekhof <[email protected]> wrote:
>> On Sat, Nov 12, 2011 at 12:06 AM, Nick Khamis <[email protected]> wrote:
>>> Hello Andrew,
>>>
>>> I do apologize for this, and really appreciate how far I have gotten with
>>> this project thanks to everyone's help. Just as a quick summary:
>>>
>>> the patch that you suggested did in fact fix the following (ais.c:346):
>>>
>>> ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: crm_abort:
>>> send_ais_text: Triggered assert at ais.c:346 : dest != crm_msg_ais
>>> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
>>> ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: send_ais_text:
>>> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
>>> ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: crm_abort:
>>> send_ais_text: Triggered assert at ais.c:346 : dest != crm_msg_ais
>>> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
>>> ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: send_ais_text:
>>> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
>>> 1320247939 setup_stack@170: Cluster connection established.  Local node id: 1
>>> 1320247939 setup_stack@174: Added Pacemaker as client 1 with fd -1
>>>
>>> The run-time error I am getting now is in (corosync.c:352):
>>>
>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 info: crm_new_peer: Node 1
>>> is now known as astdrbd1
>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: crm_abort:
>>> send_ais_text: Triggered assert at corosync.c:352 : dest !=
>>> crm_msg_ais
>>> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: send_ais_text:
>>> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: crm_abort:
>>> send_ais_text: Triggered assert at corosync.c:352 : dest !=
>>> crm_msg_ais
>>> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: send_ais_text:
>>> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
>>> 1320352460 setup_stack@170: Cluster connection established.  Local node id: 1
>>> 1320352460 setup_stack@174: Added Pacemaker as client 1 with fd -1
>>>
>>>
>>> * The controld RA is using the standard dlm_controld, and this is now working.
>>> * The o2cb RA is using ocfs2_controld.pcmk, and this is where I am running into
>>>   the runtime error with corosync.c
>>
>> As I mentioned in the last email, you're not supposed to use
>> ocfs2_controld.pcmk with cman.
>> You must use the standard ocfs2_controld
>>
>>>
>>>>
>>>> IMO (and as Florian alluded to in another message), you'd probably save
>>>> yourself a lot of trouble taking prebuilt packages from a distro where
>>>> the pieces you need are known to work together.
>>>
>>>> Indeed.
>>>
>>> There is no resenting that! But I am so close. Actually, I do have things
>>> working without the o2cb primitive, i.e., pcmk is starting the dual primary
>>> drbd, cloned dlm, and mounting the cloned ocfs2 filesystem:
>>>
>>> root@astdrbd1:~# /etc/init.d/cman start
>>> Starting cluster:
>>>   Checking if cluster has been disabled at boot... [  OK  ]
>>>   Checking Network Manager... [  OK  ]
>>>   Global setup... [  OK  ]
>>>   Loading kernel modules... [  OK  ]
>>>   Mounting configfs... [  OK  ]
>>>   Starting cman... [  OK  ]
>>>   Waiting for quorum... [  OK  ]
>>>   Starting fenced... [  OK  ]
>>>   Starting dlm_controld... [  OK  ]
>>>   Unfencing self... [  OK  ]
>>>   Joining fence domain... [  OK  ]
>>>
>>> root@astdrbd1:~# /etc/init.d/pacemaker start
>>> Starting Pacemaker Cluster Manager: touch: missing file operand
>>> Try `touch --help' for more information.
>>> [  OK  ]
>>>
>>>
>>> ============
>>> Last updated: Fri Nov 11 07:36:11 2011
>>> Last change: Fri Nov 11 07:33:06 2011 via crmd on astdrbd1
>>> Stack: cman
>>> Current DC: astdrbd1 - partition with quorum
>>> Version: 1.1.6-2d8fad5
>>> 2 Nodes configured, 2 expected votes
>>> 7 Resources configured.
>>> ============
>>>
>>> Online: [ astdrbd1 astdrbd2 ]
>>>
>>> astIP   (ocf::heartbeat:IPaddr2):       Started astdrbd1
>>>  Master/Slave Set: msASTDRBD [astDRBD]
>>>     Masters: [ astdrbd2 astdrbd1 ]
>>>  Clone Set: astDLMClone [astDLM]
>>>     Started: [ astdrbd2 astdrbd1 ]
>>>  Clone Set: astFilesystemClone [astFilesystem]
>>>     Started: [ astdrbd2 astdrbd1 ]
>>>
>>>
>>> Of course, o2cb is not pcmk cluster aware right now and needs to be
>>> started manually.
>>>
>>> Vladislav, if you are getting this, I can test for the kernel bug that slows
>>> down ocfs2, which you reported earlier. Is there any test you would like me
>>> to perform?
>>>
>>>
>>> Kind Regards,
>>>
>>> Nick.
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>>
