On Tue, Nov 15, 2011 at 3:29 AM, Nick Khamis <[email protected]> wrote:
> Hello Andrew,
>
> Thank you so much for your response. I wanted to clarify, I am running the
> pacemaker stack,

If you are running pacemaker on top of cman, do not use ocfs2_controld.pcmk.
Is that clearer?

> and experiencing errors with ocf:pacemaker:o2cb and
> ocfs2_controld.pcmk. Tracking some of the o2cb processes, I wanted to say that:
>
> * aisexec does contain:
>
> export
> COROSYNC_DEFAULT_CONFIG_IFACE="openaisserviceenableexperimental:corosync_parser"
> corosync "$@"
>
> And when issuing ocfs2_controld.pcmk -D, I am receiving the following error:
>
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next:
> Processing additional service options...
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_config_opt: Found
> 'corosync_quorum' for option: name
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next:
> Processing additional service options...
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_config_opt: Found
> 'corosync_cman' for option: name
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next:
> Processing additional service options...
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_config_opt: Found
> 'openais_clm' for option: name
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next:
> Processing additional service options...
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_config_opt: Found
> 'openais_evt' for option: name
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next:
> Processing additional service options...
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_config_opt: Found
> 'openais_ckpt' for option: name
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next:
> Processing additional service options...
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_config_opt: Found
> 'openais_msg' for option: name
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next:
> Processing additional service options...
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_config_opt: Found
> 'openais_lck' for option: name
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next:
> Processing additional service options...
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_config_opt: Found
> 'openais_tmr' for option: name
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next: No
> additional configuration supplied for: service
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: config_find_next:
> Processing additional quorum options...
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_config_opt: Found
> 'quorum_cman' for option: provider
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_cluster_type:
> Detected an active 'cman' cluster
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: get_local_node_name:
> Using CMAN node name: astdrbd1
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info:
> init_ais_connection_once: Connection to 'cman': established
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: crm_new_peer: Node
> astdrbd1 now has id: 1
> ocfs2_controld[10601]: 2011/11/14_11:26:22 info: crm_new_peer: Node 1
> is now known as astdrbd1
> ocfs2_controld[10601]: 2011/11/14_11:26:22 ERROR: crm_abort:
> send_ais_text: Triggered assert at corosync.c:352 : dest !=
> crm_msg_ais
> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
> ocfs2_controld[10601]: 2011/11/14_11:26:22 ERROR: send_ais_text:
> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
> ocfs2_controld[10601]: 2011/11/14_11:26:22 ERROR: crm_abort:
> send_ais_text: Triggered assert at corosync.c:352 : dest !=
> crm_msg_ais
> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
> ocfs2_controld[10601]: 2011/11/14_11:26:22 ERROR: send_ais_text:
> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
> 1321287982 setup_stack@170: Cluster connection established.
> Local node id: 1
> 1321287982 setup_stack@174: Added Pacemaker as client 1 with fd -1
>
> Thanks in Advance,
>
> Nick.
>
> On Sun, Nov 13, 2011 at 7:44 PM, Andrew Beekhof <[email protected]> wrote:
>> On Mon, Nov 14, 2011 at 11:12 AM, Nick Khamis <[email protected]> wrote:
>>> Hello Andrew,
>>>
>>> Thank you so much for your response. I am using ocfs-tools 1.6.and it only
>>> includes pcmk and cman ocfs2 controld:
>>>
>>> ocfs2_controld.cman ocfs2_controld.pcmk ocfs2_hb_ctl
>>>
>>> Which stack provides the standard ocfs2_controld?
>>
>> If you're running cman, use the cman one
>>
>>>
>>> Thanks for Everything!
>>>
>>> Nick.
>>>
>>> If it's cman
>>>
>>> On Sun, Nov 13, 2011 at 6:49 PM, Andrew Beekhof <[email protected]> wrote:
>>>> On Sat, Nov 12, 2011 at 12:06 AM, Nick Khamis <[email protected]> wrote:
>>>>> Hello Andrew,
>>>>>
>>>>> I do apologize for this, and really appreciate how far I have got into
>>>>> this project thanks to everyone's help. Just as a quick summary:
>>>>>
>>>>> the patch that you suggested did in fact fix the following (ais.c:346):
>>>>>
>>>>> ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: crm_abort:
>>>>> send_ais_text: Triggered assert at ais.c:346 : dest != crm_msg_ais
>>>>> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
>>>>> ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: send_ais_text:
>>>>> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
>>>>> ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: crm_abort:
>>>>> send_ais_text: Triggered assert at ais.c:346 : dest != crm_msg_ais
>>>>> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
>>>>> ocfs2_controld[14698]: 2011/11/02_11:32:19 ERROR: send_ais_text:
>>>>> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
>>>>> 1320247939 setup_stack@170: Cluster connection established.
>>>>> Local node
>>>>> id: 1
>>>>> 1320247939 setup_stack@174: Added Pacemaker as client 1 with fd -1
>>>>>
>>>>> The run-time error I am getting now is in (corosync.c:352):
>>>>>
>>>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 info: crm_new_peer: Node 1
>>>>> is now known as astdrbd1
>>>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: crm_abort:
>>>>> send_ais_text: Triggered assert at corosync.c:352 : dest !=
>>>>> crm_msg_ais
>>>>> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
>>>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: send_ais_text:
>>>>> Sending message 0 via cpg: FAILED (rc=22): Message error: Success (0)
>>>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: crm_abort:
>>>>> send_ais_text: Triggered assert at corosync.c:352 : dest !=
>>>>> crm_msg_ais
>>>>> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
>>>>> ocfs2_controld[6883]: 2011/11/03_16:34:20 ERROR: send_ais_text:
>>>>> Sending message 1 via cpg: FAILED (rc=22): Message error: Success (0)
>>>>> 1320352460 setup_stack@170: Cluster connection established. Local node
>>>>> id: 1
>>>>> 1320352460 setup_stack@174: Added Pacemaker as client 1 with fd -1
>>>>>
>>>>>
>>>>> * The controld RA is using the standard dlm_controld, and this is now
>>>>> working.
>>>>> * The o2cb RA is using ocfs2_controld.pcmk, and this is where I am
>>>>> running into
>>>>> the runtime error with corosync.c
>>>>
>>>> As I mentioned in the last email, you're not supposed to use
>>>> ocfs2_controld.pcmk with cman.
>>>> You must use the standard ocfs2_controld
>>>>
>>>>>
>>>>>>
>>>>>> IMO (and as Florian alluded to in another message), you'd probably save
>>>>>> yourself a lot of trouble taking prebuilt packages from a distro where
>>>>>> the pieces you need are known to work together.
>>>>>
>>>>>> Indeed.
>>>>>
>>>>> There is no resenting that! But I am so close.
>>>>> Actually, I do have things
>>>>> working without the o2cb primitive, i.e., pcmk is starting the dual
>>>>> primary
>>>>> drbd, cloned dlm, and mounting the cloned ocfs2 filesystem:
>>>>>
>>>>> root@astdrbd1:~# /etc/init.d/cman start
>>>>> Starting cluster:
>>>>> Checking if cluster has been disabled at boot... [ OK ]
>>>>> Checking Network Manager... [ OK ]
>>>>> Global setup... [ OK ]
>>>>> Loading kernel modules... [ OK ]
>>>>> Mounting configfs... [ OK ]
>>>>> Starting cman... [ OK ]
>>>>> Waiting for quorum... [ OK ]
>>>>> Starting fenced... [ OK ]
>>>>> Starting dlm_controld... [ OK ]
>>>>> Unfencing self... [ OK ]
>>>>> Joining fence domain... [ OK ]
>>>>>
>>>>> root@astdrbd1:~# /etc/init.d/pacemaker start
>>>>> Starting Pacemaker Cluster Manager: touch: missing file operand
>>>>> Try `touch --help' for more information.
>>>>> [ OK ]
>>>>>
>>>>>
>>>>> ============
>>>>> Last updated: Fri Nov 11 07:36:11 2011
>>>>> Last change: Fri Nov 11 07:33:06 2011 via crmd on astdrbd1
>>>>> Stack: cman
>>>>> Current DC: astdrbd1 - partition with quorum
>>>>> Version: 1.1.6-2d8fad5
>>>>> 2 Nodes configured, 2 expected votes
>>>>> 7 Resources configured.
>>>>> ============
>>>>>
>>>>> Online: [ astdrbd1 astdrbd2 ]
>>>>>
>>>>> astIP (ocf::heartbeat:IPaddr2): Started astdrbd1
>>>>> Master/Slave Set: msASTDRBD [astDRBD]
>>>>>     Masters: [ astdrbd2 astdrbd1 ]
>>>>> Clone Set: astDLMClone [astDLM]
>>>>>     Started: [ astdrbd2 astdrbd1 ]
>>>>> Clone Set: astFilesystemClone [astFilesystem]
>>>>>     Started: [ astdrbd2 astdrbd1 ]
>>>>>
>>>>>
>>>>> Of course, o2cb is not pcmk cluster aware right now and needs to be
>>>>> started manually.
>>>>>
>>>>> Vladislav, if you are getting this, I can test for the kernel bug that
>>>>> slows down ocfs2, which you reported earlier. Is there any test you
>>>>> would like me to perform?
>>>>>
>>>>>
>>>>> Kind Regards,
>>>>>
>>>>> Nick.
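
For what it's worth, the rule stated in this thread (no ocfs2_controld.pcmk on
a cman stack) can be written down as a small shell check. This is only a
sketch: the function names stack_from_status and check_controld are made up
for illustration, and it assumes the status text carries a "Stack:" line like
the one quoted in this thread. It only encodes the cman case discussed here
and says nothing about other stacks.

```shell
#!/bin/sh
# Hypothetical helpers, not part of pacemaker or ocfs2-tools.

# Read cluster status text on stdin and print the value of the first
# "Stack:" line (for example "cman").
stack_from_status() {
    sed -n 's/^Stack:[[:space:]]*//p' | head -n 1
}

# check_controld STACK DAEMON
# Return non-zero for the one combination this thread rules out:
# ocfs2_controld.pcmk on top of cman. Everything else is left alone.
check_controld() {
    stack=$1
    daemon=$2
    if [ "$stack" = "cman" ] && [ "$daemon" = "ocfs2_controld.pcmk" ]; then
        echo "do not use $daemon on a cman stack; use ocfs2_controld.cman" >&2
        return 1
    fi
    return 0
}
```

One way to use it, assuming the status output comes from a one-shot crm_mon
run: crm_mon -1 | stack_from_status, then pass the result and the daemon name
to check_controld before starting anything.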
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
