Thanks for the fix :)
I've checked out http://hg.clusterlabs.org/pacemaker/stable-1.0/rev/05c8b63cbca7
The ports are open now.
I've run into another problem, though.
I have two boxes in the cluster, box1 and box2. A third box, which is
not in the cluster, is named farm.
All of them run Gentoo. The cluster stack is openais-1.0.1.
I'm trying to run cibadmin -Q from farm:
Port 1234 (plain):
CIB_server=box1.cluster CIB_port=1234 cibadmin -Q
Password:
cibadmin: Connection to box1.cluster:1234 failed:
Signon to CIB failed:
Init failed, could not perform requested operations
and it exits immediately.
Port 12345 (TLS):
CIB_server=box1.cluster CIB_port=12345 cibadmin -Q
Password:
and it hangs. In the logs on box1 I can see the following:
Sep 28 18:03:30 box1 cib: [3342]: ERROR: crm_xml_err: XML Error:
Entity: line 1: parsererror : Start tag expected, '<' not found
Sep 28 18:03:30 box1 cib: [3342]: ERROR: crm_xml_err: XML Error:
Sep 28 18:03:30 box1 cib: [3342]: ERROR: crm_xml_err: XML Error: ^
Sep 28 18:03:30 box1 cib: [3342]: WARN: string2xml: Parsing failed
(domain=1, level=3, code=4): Start tag expected, '<' not found
Sep 28 18:03:30 box1 cib: [3342]: ERROR: string2xml: Couldn't parse
3 chars:
Sep 28 18:03:30 box1 cib: [3342]: ERROR: cib_recv_remote_msg:
Couldn't parse: ''
After that, on box1 I can run neither crm_mon (it prints
"Attempting connection to the cluster...") nor cibadmin -Q, which waits
for a while and then prints the following:
Signon to CIB failed: reply failed
Init failed, could not perform requested operations
On box2 crm_mon runs, but it doesn't reflect changes in the cluster.
Running cibadmin -Q there waits for a while and then shows the following:
Call cib_query failed (-41): Remote node did not respond
<null>
Finally, a few minutes later I found errors in the logs (I think they
were caused by my attempt to connect to the cluster remotely), so I'm
attaching them.
Thanks.
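For anyone reproducing this, here is a minimal sketch of the two
invocations with the encryption setting spelled out. It assumes the
client honours the CIB_encrypted environment variable as described in
the Pacemaker documentation (where it defaults to true); whether that
holds for this particular build is not confirmed in this thread. The
helper name remote_cib_cmd is made up for illustration, and it only
prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a remote cibadmin query with
# the encryption choice made explicit.
# Assumption: CIB_encrypted is read by the client as documented for Pacemaker.
remote_cib_cmd() {
    server=$1
    port=$2
    encrypted=$3   # "true" for the TLS port, "false" for the plain port
    printf 'CIB_server=%s CIB_port=%s CIB_encrypted=%s cibadmin -Q\n' \
        "$server" "$port" "$encrypted"
}

# Ports taken from the report above: 1234 is plain, 12345 is TLS.
remote_cib_cmd box1.cluster 1234 false
remote_cib_cmd box1.cluster 12345 true
```

Running the printed lines on farm would show whether the plain-port
failure changes once encryption is explicitly switched off.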
On Sep 21, 2009, at 13:53, Andrew Beekhof wrote:
I had a look at this, and basically I broke the initialization.
I'll fix this today for 1.0.6
On Thu, Sep 10, 2009 at 10:05 PM, Andrew Beekhof
<[email protected]> wrote:
Strange. I'll take a look on Monday (after my vacation).
On Wed, Sep 9, 2009 at 10:30 AM, Alexander Bodnarashik <[email protected]> wrote:
On Sep 08, 2009, at 09:26, Andrew Beekhof wrote:
On Fri, Sep 4, 2009 at 5:35 PM, Alexander Bodnarashik <[email protected]> wrote:
Hi. I'm trying to enable remote connections to the cluster, but with no
luck: netstat does not show those ports as open, and the logs tell me
nothing either.
Were those port values in the CIB when the cluster started? If not,
restart the cluster software.
Otherwise, check if TLS support was enabled when you built pacemaker.
Both port values were set before the cluster started.
I didn't find any TLS-related options in pacemaker's "./configure", but
gnutls was found on the system during the configure script run:
...
checking gnutls/gnutls.h usability... yes
checking gnutls/gnutls.h presence... yes
checking for gnutls/gnutls.h... yes
checking for security/pam_appl.h... (cached) yes
checking for pam/pam_appl.h... (cached) no
checking for libgnutls-config... /usr/bin/libgnutls-config
checking for gnutls header flags... -I/usr/include
checking for gnutls library flags... -L/usr/lib -lgnutls -lgcrypt -lgpg-error
...
Also, cibadmin is linked against gnutls:
ldd `which cibadmin`|grep tls
libgnutls.so.26 => /usr/lib/libgnutls.so.26 (0xb7fc5000)
So I suppose that TLS is enabled.
I'm also attaching logs, corosync config and cib.
Thanks.
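The netstat check mentioned above can be scripted roughly as follows.
This is only a sketch: the helper name port_is_listening is made up,
the sample line is illustrative rather than captured from box1, and it
assumes the usual Linux netstat -tln column layout (local address in
field 4, state in field 6).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given TCP port appears in LISTEN state
# in `netstat -tln`-style text read from stdin.
port_is_listening() {
    awk -v p="$1" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")      # local address is field 4
            if (parts[n] == p) found = 1   # last ":"-separated piece is the port
        }
        END { exit !found }
    '
}

# Made-up sample line, only to show the expected input shape.
sample='tcp        0      0 0.0.0.0:1234            0.0.0.0:*               LISTEN'
if printf '%s\n' "$sample" | port_is_listening 1234; then
    echo "port 1234 is listening"
fi
```

On a real node the input would come from netstat itself, for example
netstat -tln | port_is_listening 12345.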
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Sep 28 18:17:56 box1 crmd: [3346]: info: crm_timer_popped: PEngine Recheck
Timer (I_PE_CALC) just popped!
Sep 28 18:17:56 box1 crmd: [3346]: info: do_state_transition: State transition
S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_TIMER_POPPED
origin=crm_timer_popped ]
Sep 28 18:17:56 box1 crmd: [3346]: WARN: do_state_transition: Progressed to
state S_POLICY_ENGINE after C_TIMER_POPPED
Sep 28 18:17:56 box1 crmd: [3346]: info: do_state_transition: All 2 cluster
nodes are eligible to run resources.
Sep 28 18:17:56 box1 crmd: [3346]: info: do_pe_invoke: Query 84: Requesting the
current CIB: S_POLICY_ENGINE
Sep 28 18:18:56 box1 corosync[3327]: [pcmk ] info: pcmk_ipc_exit: Client
crmd (conn=0x9fff640, async-conn=0x9fff640) left
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_pe_invoke_callback: Cant retrieve
the CIB: Remote node did not respond
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_log: FSA: Input I_ERROR from
do_pe_invoke_callback() received in state S_POLICY_ENGINE
Sep 28 18:18:56 box1 crmd: [3346]: info: do_state_transition: State transition
S_POLICY_ENGINE -> S_RECOVERY [ input=I_ERROR cause=C_FSA_INTERNAL
origin=do_pe_invoke_callback ]
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_recover: Action A_RECOVER
(0000000001000000) not supported
Sep 28 18:18:56 box1 crmd: [3346]: WARN: do_election_vote: Not voting in
election, we're in state S_RECOVERY
Sep 28 18:18:56 box1 crmd: [3346]: info: do_dc_release: DC role released
Sep 28 18:18:56 box1 crmd: [3346]: info: pe_connection_destroy: Connection to
the Policy Engine released
Sep 28 18:18:56 box1 crmd: [3346]: info: do_te_control: Transitioner is now
inactive
Sep 28 18:18:56 box1 crmd: [3346]: info: do_te_control: Disconnecting STONITH...
Sep 28 18:18:56 box1 crmd: [3346]: info: tengine_stonith_connection_destroy:
Fencing daemon disconnected
Sep 28 18:18:56 box1 crmd: [3346]: notice: Not currently connected.
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_log: FSA: Input I_TERMINATE from
do_recover() received in state S_RECOVERY
Sep 28 18:18:56 box1 crmd: [3346]: info: do_state_transition: State transition
S_RECOVERY -> S_TERMINATE [ input=I_TERMINATE cause=C_FSA_INTERNAL
origin=do_recover ]
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource ip_mysql was
active at shutdown. You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: notice: ghash_print_pending_for_rsc:
Recurring action ip_mysql:19 (ip_mysql_monitor_5000) incomplete at shutdown
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource pingd:1 was
active at shutdown. You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: notice: ghash_print_pending_for_rsc:
Recurring action pingd:1:10 (pingd:1_monitor_10000) incomplete at shutdown
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource mysql was
active at shutdown. You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: notice: ghash_print_pending_for_rsc:
Recurring action mysql:23 (mysql_monitor_5000) incomplete at shutdown
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource drbd:1 was
active at shutdown. You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: verify_stopped: Resource fs_r0 was
active at shutdown. You may ignore this error if it is unmanaged.
Sep 28 18:18:56 box1 crmd: [3346]: info: do_lrm_control: Disconnected from the
LRM
Sep 28 18:18:56 box1 crmd: [3346]: info: do_ha_control: Disconnected from
OpenAIS
Sep 28 18:18:56 box1 crmd: [3346]: info: do_cib_control: Disconnecting CIB
Sep 28 18:18:56 box1 crmd: [3346]: info: crmd_cib_connection_destroy:
Connection to the CIB terminated...
Sep 28 18:18:56 box1 crmd: [3346]: info: do_exit: Performing A_EXIT_0 -
gracefully exiting the CRMd
Sep 28 18:18:56 box1 crmd: [3346]: ERROR: do_exit: Could not recover from
internal error
Sep 28 18:18:56 box1 crmd: [3346]: info: free_mem: Dropping I_PENDING: [
state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_election_vote ]
Sep 28 18:18:56 box1 crmd: [3346]: info: free_mem: Dropping I_RELEASE_SUCCESS:
[ state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_dc_release ]
Sep 28 18:18:56 box1 crmd: [3346]: info: free_mem: Dropping I_TERMINATE: [
state=S_TERMINATE cause=C_FSA_INTERNAL origin=do_stop ]
Sep 28 18:18:56 box1 crmd: [3346]: info: do_exit: [crmd] stopped (2)
Sep 28 18:18:57 box1 corosync[3327]: [pcmk ] ERROR: pcmk_wait_dispatch:
Child process crmd exited (pid=3346, rc=2)
Sep 28 18:18:57 box1 corosync[3327]: [pcmk ] notice: pcmk_wait_dispatch:
Respawning failed child process: crmd
Sep 28 18:18:57 box1 corosync[3327]: [pcmk ] info: spawn_child: Forked child
12914 for process crmd
Sep 28 18:18:57 box1 crmd: [12914]: info: Invoked: /usr/lib/heartbeat/crmd
Sep 28 18:18:57 box1 crmd: [12914]: info: main: CRM Hg Version:
05c8b63cbca7ce95182bb41881b3c5677f20bd5c
Sep 28 18:18:57 box1 crmd: [12914]: info: crmd_init: Starting crmd
Sep 28 18:18:57 box1 crmd: [12914]: info: G_main_add_SignalHandler: Added
signal handler for signal 17
Sep 28 18:20:01 box1 cibadmin: [12915]: info: Invoked: cibadmin -Q
Sep 28 18:20:02 box1 cron[12917]: (root) CMD (test -x /usr/sbin/run-crons &&
/usr/sbin/run-crons )
Sep 28 18:20:57 box1 crmd: [12914]: WARN: xmlfromIPC: No message received in
the required interval (120s)
Sep 28 18:20:57 box1 crmd: [12914]: ERROR: get_channel_token: No reply message
- empty
Sep 28 18:22:58 box1 crmd: [12914]: WARN: xmlfromIPC: No message received in
the required interval (120s)
Sep 28 18:22:58 box1 crmd: [12914]: ERROR: get_channel_token: No reply message
- empty
Sep 28 18:22:58 box1 crmd: [12914]: info: do_cib_control: Could not connect to
the CIB service: reply failed
Sep 28 18:22:58 box1 crmd: [12914]: WARN: do_cib_control: Couldn't complete CIB
registration 1 times... pause and retry
Sep 28 18:22:58 box1 crmd: [12914]: info: crmd_init: Starting crmd's mainloop
Sep 28 18:23:00 box1 crmd: [12914]: info: crm_timer_popped: Wait Timer (I_NULL)
just popped!
Sep 28 18:25:00 box1 crmd: [12914]: WARN: xmlfromIPC: No message received in
the required interval (120s)
Sep 28 18:25:00 box1 crmd: [12914]: ERROR: get_channel_token: No reply message
- empty