Hello Andrew,
I do apologize for this, and I really appreciate how far I have gotten into
this project thanks to everyone's help. A quick summary:
the patch you suggested did in fact fix the following assert (ais.c:346):
ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: crm_abort:
send_ais_text: Triggered assert at ais.c:346 : dest != crm_msg_ais
Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: send_ais_text:
Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: crm_abort:
send_ais_text: Triggered assert at ais.c:346 : dest != crm_msg_ais
Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: send_ais_text:
Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
1320247939 setup_stack@170: Cluster connection established. Local node id: 1
1320247939 setup_stack@174: Added Pacemaker as client 1 with fd -1
The run-time error I am getting now is at corosync.c:352:
ocfs2_controld[6883]: 2011/11/03_16:34:20 info: crm_new_peer: Node 1
is now known as astdrbd1
ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: crm_abort:
send_ais_text: Triggered assert at corosync.c:352 : dest !=
crm_msg_ais
Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: send_ais_text:
Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: crm_abort:
send_ais_text: Triggered assert at corosync.c:352 : dest !=
crm_msg_ais
Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: send_ais_text:
Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
1320352460 setup_stack@170: Cluster connection established. Local node id: 1
1320352460 setup_stack@174: Added Pacemaker as client 1 with fd -1
* The controld RA is using the standard dlm_controld, and this is now working.
* The o2cb RA is using ocfs2_controld.pcmk, and this is where I am running
into the runtime error in corosync.c.
>
> IMO (and as Florian alluded to in another message), you'd probably save
> yourself a lot of trouble taking prebuilt packages from a distro where
> the pieces you need are known to work together.
> Indeed.
There is no contesting that! But I am so close. Actually, I do have things
working without the o2cb primitive, i.e., Pacemaker is starting the dual-primary
DRBD, the cloned DLM, and mounting the cloned OCFS2 filesystem:
root@astdrbd1:~# /etc/init.d/cman start
Starting cluster:
Checking if cluster has been disabled at boot... [ OK ]
Checking Network Manager... [ OK ]
Global setup... [ OK ]
Loading kernel modules... [ OK ]
Mounting configfs... [ OK ]
Starting cman... [ OK ]
Waiting for quorum... [ OK ]
Starting fenced... [ OK ]
Starting dlm_controld... [ OK ]
Unfencing self... [ OK ]
Joining fence domain... [ OK ]
root@astdrbd1:~# /etc/init.d/pacemaker start
Starting Pacemaker Cluster Manager: touch: missing file operand
Try `touch --help' for more information.
[ OK ]
============
Last updated: Fri Nov 11 07:36:11 2011
Last change: Fri Nov 11 07:33:06 2011 via crmd on astdrbd1
Stack: cman
Current DC: astdrbd1 - partition with quorum
Version: 1.1.6-2d8fad5
2 Nodes configured, 2 expected votes
7 Resources configured.
============
Online: [ astdrbd1 astdrbd2 ]
astIP (ocf::heartbeat:IPaddr2): Started astdrbd1
Master/Slave Set: msASTDRBD [astDRBD]
Masters: [ astdrbd2 astdrbd1 ]
Clone Set: astDLMClone [astDLM]
Started: [ astdrbd2 astdrbd1 ]
Clone Set: astFilesystemClone [astFilesystem]
Started: [ astdrbd2 astdrbd1 ]
Of course, o2cb is not Pacemaker cluster-aware right now and needs to be
started manually.
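For reference, the piece I still cannot add looks roughly like this. This is only a sketch: the resource and constraint names (astO2CB, astO2CBClone, etc.) are placeholders I made up to match my naming above, and it assumes the ocf:ocfs2:o2cb RA were actually working:

```shell
# Hypothetical crm shell config for the missing o2cb piece.
# Resource/clone/constraint names are placeholders, not anything
# from my current CIB.
crm configure primitive astO2CB ocf:ocfs2:o2cb \
    op monitor interval="10s"
crm configure clone astO2CBClone astO2CB meta interleave="true"
# o2cb has to start after the DLM and before the filesystem mounts:
crm configure order o2cb-after-dlm inf: astDLMClone astO2CBClone
crm configure order fs-after-o2cb inf: astO2CBClone astFilesystemClone
crm configure colocation o2cb-with-dlm inf: astO2CBClone astDLMClone
```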
Vladislav, if you are reading this: I can now test for the kernel bug that
slows down OCFS2, which you reported earlier. Is there any test you would
like me to perform?
Kind Regards,
Nick.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems