Thanks for the fix :)
I've checked out http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/05c8b63cbca7
Now ports are open.
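
To double-check from a remote machine such as farm that the ports accept connections, here is a minimal sketch, assuming Python 3 is available there. `port_is_open` is a hypothetical helper, not a Pacemaker tool, and it only tests TCP reachability, not whether the CIB answers requests correctly:

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts and name resolution failures
        return False
```

With the values from this thread, the calls would be `port_is_open("box1.cluster", 1234)` for the plain port and `port_is_open("box1.cluster", 12345)` for the TLS port.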

I've run into another problem, though.
I have two boxes in the cluster, box1 and box2. A third box, which is not in the cluster, is named farm.
All of them run Gentoo. The cluster stack is openais-1.0.1.

Trying to run cibadmin -Q from farm against port 1234 (the plain port):
CIB_server=box1.cluster CIB_port=1234 cibadmin -Q
Password:
cibadmin: Connection to box1.cluster:1234 failed:
Signon to CIB failed:
Init failed, could not perform requested operations
and it exits immediately.

Against port 12345 (the TLS port):
CIB_server=box1.cluster CIB_port=12345 cibadmin -Q
Password:


and it freezes. In the logs on box1 I can see the following:
Sep 28 18:03:30 box1 cib: [3342]: ERROR: crm_xml_err: XML Error: Entity: line 1: parsererror : Start tag expected, '<' not found
Sep 28 18:03:30 box1 cib: [3342]: ERROR: crm_xml_err: XML Error:
Sep 28 18:03:30 box1 cib: [3342]: ERROR: crm_xml_err: XML Error: ^
Sep 28 18:03:30 box1 cib: [3342]: WARN: string2xml: Parsing failed (domain=1, level=3, code=4): Start tag expected, '<' not found
Sep 28 18:03:30 box1 cib: [3342]: ERROR: string2xml: Couldn't parse 3 chars:
Sep 28 18:03:30 box1 cib: [3342]: ERROR: cib_recv_remote_msg: Couldn't parse: ''

After that, on box1 I can run neither crm_mon (it prints "Attempting connection to the cluster...") nor cibadmin -Q, which waits for a while and then prints the following:
 Signon to CIB failed: reply failed
Init failed, could not perform requested operations

On box2 crm_mon runs, but it doesn't reflect changes in the cluster. Running cibadmin -Q there waits for a while, then shows the following:
Call cib_query failed (-41): Remote node did not respond
<null>

Finally, a few minutes later I found errors in the logs (I think they were caused by my attempt to connect to the cluster remotely), so I'm attaching them.

Thanks.

On Sep 21, 2009, at 13:53, Andrew Beekhof wrote:

I had a look at this, and basically I broke the initialization.
I'll fix this today for 1.0.6

On Thu, Sep 10, 2009 at 10:05 PM, Andrew Beekhof <[email protected]> wrote:
Strange. I'll take a look on Monday (after my vacation).

On Wed, Sep 9, 2009 at 10:30 AM, Alexander Bodnarashik <[email protected]> wrote:

On Sep 08, 2009, at 09:26, Andrew Beekhof wrote:

On Fri, Sep 4, 2009 at 5:35 PM, Alexander Bodnarashik <[email protected]> wrote:

Hi. I'm trying to enable remote connections to the cluster, but with no
luck: netstat does not show those ports as open, and the logs tell me
nothing either.

Were those port values in the CIB when the cluster started? If not,
restart the cluster software.
Otherwise, check if TLS support was enabled when you built pacemaker.

Both port values were set before the cluster started.

I didn't find any TLS-related options in pacemaker's "./configure", but TLS was
found on the system during the configure script run:
...
checking gnutls/gnutls.h usability... yes
checking gnutls/gnutls.h presence... yes
checking for gnutls/gnutls.h... yes
checking for security/pam_appl.h... (cached) yes
checking for pam/pam_appl.h... (cached) no
checking for libgnutls-config... /usr/bin/libgnutls-config
checking for gnutls header flags... -I/usr/include
checking for gnutls library flags... -L/usr/lib -lgnutls -lgcrypt -lgpg-error
...

Also, cibadmin is linked against gnutls:
 ldd `which cibadmin`|grep tls
       libgnutls.so.26 => /usr/lib/libgnutls.so.26 (0xb7fc5000)
So I suppose that TLS is enabled.

I'm also attaching the logs, the corosync config and the CIB.
Thanks.



_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Sep 28 18:17:56 box1 crmd: [3346]: info: crm_timer_popped: PEngine Recheck 
Timer (I_PE_CALC) just popped!
Sep 28 18:17:56 box1 crmd: [3346]: info: do_state_transition: State transition 
S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED 
origin=crm_timer_popped ]
Sep 28 18:17:56 box1 crmd: [3346]: WARN: do_state_transition: Progressed to 
state S_POLICY_ENGINE after C_TIMER_POPPED
Sep 28 18:17:56 box1 crmd: [3346]: info: do_state_transition: All 2 cluster 
nodes are eligible to run resources.
Sep 28 18:17:56 box1 crmd: [3346]: info: do_pe_invoke: Query 84: Requesting the 
current CIB: S_POLICY_ENGINE
Sep 28 18:18:56 box1 corosync[3327]:   [pcmk  ] info: pcmk_ipc_exit: Client 
crmd (conn=0x9fff640, async-conn=0x9fff640) left
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_pe_invoke_callback: Cant retrieve 
the CIB: Remote node did not respond
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_log: FSA: Input I_ERROR from 
do_pe_invoke_callback() received in state S_POLICY_ENGINE
Sep 28 18:18:56 box1 crmd: [3346]: info: do_state_transition: State transition 
S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL 
origin=do_pe_invoke_callback ]
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_recover: Action A_RECOVER 
(0000000001000000) not supported
Sep 28 18:18:56 box1 crmd: [3346]: WARN: do_election_vote: Not voting in 
election, we're in state S_RECOVERY
Sep 28 18:18:56 box1 crmd: [3346]: info: do_dc_release: DC role released
Sep 28 18:18:56 box1 crmd: [3346]: info: pe_connection_destroy: Connection to 
the Policy Engine released
Sep 28 18:18:56 box1 crmd: [3346]: info: do_te_control: Transitioner is now 
inactive
Sep 28 18:18:56 box1 crmd: [3346]: info: do_te_control: Disconnecting STONITH...
Sep 28 18:18:56 box1 crmd: [3346]: info: tengine_stonith_connection_destroy: 
Fencing daemon disconnected
Sep 28 18:18:56 box1 crmd: [3346]: notice: Not currently connected.
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_log: FSA: Input I_TERMINATE from 
do_recover() received in state S_RECOVERY
Sep 28 18:18:56 box1 crmd: [3346]: info: do_state_transition: State transition 
S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL 
origin=do_recover ]
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource ip_mysql was 
active at shutdown.  You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: notice: ghash_print_pending_for_rsc: 
Recurring action ip_mysql:19 (ip_mysql_monitor_5000) incomplete at shutdown
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource pingd:1 was 
active at shutdown.  You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: notice: ghash_print_pending_for_rsc: 
Recurring action pingd:1:10 (pingd:1_monitor_10000) incomplete at shutdown
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource mysql was 
active at shutdown.  You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: notice: ghash_print_pending_for_rsc: 
Recurring action mysql:23 (mysql_monitor_5000) incomplete at shutdown
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource drbd:1 was 
active at shutdown.  You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource fs_r0 was 
active at shutdown.  You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: info: do_lrm_control: Disconnected from the 
LRM
Sep 28 18:18:56 box1 crmd: [3346]: info: do_ha_control: Disconnected from 
OpenAIS
Sep 28 18:18:56 box1 crmd: [3346]: info: do_cib_control: Disconnecting CIB
Sep 28 18:18:56 box1 crmd: [3346]: info: crmd_cib_connection_destroy: 
Connection to the CIB terminated...
Sep 28 18:18:56 box1 crmd: [3346]: info: do_exit: Performing A_EXIT_0 - 
gracefully exiting the CRMd
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_exit: Could not recover from 
internal error
Sep 28 18:18:56 box1 crmd: [3346]: info: free_mem: Dropping I_PENDING: [ 
state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_election_vote ]
Sep 28 18:18:56 box1 crmd: [3346]: info: free_mem: Dropping I_RELEASE_SUCCESS: 
[ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_dc_release ]
Sep 28 18:18:56 box1 crmd: [3346]: info: free_mem: Dropping I_TERMINATE: [ 
state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Sep 28 18:18:56 box1 crmd: [3346]: info: do_exit: [crmd] stopped (2)
Sep 28 18:18:57 box1 corosync[3327]:   [pcmk  ] ERROR: pcmk_wait_dispatch: 
Child process crmd exited (pid=3346, rc=2)
Sep 28 18:18:57 box1 corosync[3327]:   [pcmk  ] notice: pcmk_wait_dispatch: 
Respawning failed child process: crmd
Sep 28 18:18:57 box1 corosync[3327]:   [pcmk  ] info: spawn_child: Forked child 
12914 for process crmd
Sep 28 18:18:57 box1 crmd: [12914]: info: Invoked: /usr/lib/heartbeat/crmd 
Sep 28 18:18:57 box1 crmd: [12914]: info: main: CRM Hg Version: 
05c8b63cbca7ce95182bb41881b3c5677f20bd5c
Sep 28 18:18:57 box1 crmd: [12914]: info: crmd_init: Starting crmd
Sep 28 18:18:57 box1 crmd: [12914]: info: G_main_add_SignalHandler: Added 
signal handler for signal 17
Sep 28 18:20:01 box1 cibadmin: [12915]: info: Invoked: cibadmin -Q 
Sep 28 18:20:02 box1 cron[12917]: (root) CMD (test -x /usr/sbin/run-crons && 
/usr/sbin/run-crons )
Sep 28 18:20:57 box1 crmd: [12914]: WARN: xmlfromIPC: No message received in 
the required interval (120s)
Sep 28 18:20:57 box1 crmd: [12914]: ERROR: get_channel_token: No reply message 
- empty
Sep 28 18:22:58 box1 crmd: [12914]: WARN: xmlfromIPC: No message received in 
the required interval (120s)
Sep 28 18:22:58 box1 crmd: [12914]: ERROR: get_channel_token: No reply message 
- empty
Sep 28 18:22:58 box1 crmd: [12914]: info: do_cib_control: Could not connect to 
the CIB service: reply failed
Sep 28 18:22:58 box1 crmd: [12914]: WARN: do_cib_control: Couldn't complete CIB 
registration 1 times... pause and retry
Sep 28 18:22:58 box1 crmd: [12914]: info: crmd_init: Starting crmd's mainloop
Sep 28 18:23:00 box1 crmd: [12914]: info: crm_timer_popped: Wait Timer (I_NULL) 
just popped!
Sep 28 18:25:00 box1 crmd: [12914]: WARN: xmlfromIPC: No message received in 
the required interval (120s)
Sep 28 18:25:00 box1 crmd: [12914]: ERROR: get_channel_token: No reply message 
- empty
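
The attached log above is hard-wrapped, so a single record such as the do_exit error spans two lines. Here is a minimal sketch, assuming Python 3 and a syslog-style timestamp at the start of every record, that rejoins the wrapped lines and keeps only the ERROR and WARN records. The function names are illustrative and not part of Pacemaker:

```python
import re

# A record starts with a syslog timestamp such as "Sep 28 18:18:56 "
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} ")
# Severity tags as they appear in the log above, e.g. "ERROR:" or "WARN:"
SEVERITY = re.compile(r"\b(ERROR|WARN):")


def join_wrapped(lines):
    """Merge continuation lines (no leading timestamp) into the previous record."""
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        if TIMESTAMP.match(line) or not records:
            records.append(line.rstrip())
        else:
            records[-1] = records[-1] + " " + line.strip()
    return records


def problems(records):
    """Keep only the records tagged ERROR or WARN."""
    return [r for r in records if SEVERITY.search(r)]
```

Feeding it the lines of the log file, for example `problems(join_wrapped(open(path)))` with `path` pointing at a saved copy of the attachment, gives one line per error or warning.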