Hi Andrew.

> On 8/9/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
>> > Hi Andrew, Thank you for your reply.
>> >
>> >  > On 8/8/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
>> >  >> > Hi All.
>> >  >> >
>> >  >> > I installed Heartbeat 2.1.2 into my cluster and tried
>> >  >> > the new way to invoke a cluster recommended in the following URL.
>> >  >> >
>> >  >> > http://www.linux-ha.org/v2/faq/cib_changes_detected?highlight=%28v2/faq/%2
>> >  >> >
>> >  >> > It works well, so I think I had better adopt it as the
>> >  >> > formal procedure for starting the cluster that I am planning
>> >  >> > to test.
>> >  >> >
>> >  >> > To adopt the new way, I want to know the proper
>> >  >> > time to execute 'cibadmin -R -x cib.xml'.  In other words,
>> >  >> > I want to know how to detect that a cluster is ready to respond
>> >  >> > to requests from client commands.
>> >  >> >
>> >  >> > If there is some command that enables me to detect this,
>> >  >> > that would be best.
>> >  >> >
>> >  >> > I think 'crm_mon -s' might be what I want.
>> >  >> >
>> >  >> > If 'crm_mon -s' shows 'Ok' in the 1st field of its report,
>> >  >> > I suppose that is a sign that the cluster is ready for
>> >  >> > operators' requests.
>> >  >> >
>> >  >> > Am I right?
>> >  >
>> >  > the best way, is to run:
>> >  >    crmadmin -D   # find out which node is the DC
>> >  >    crmadmin -S {uname_of_dc} # find out what status it's in
>> >  >
>> >  > if it says S_IDLE, then now is a good time to make changes
>> >
>> > I tried your method on my 2-node cluster
>> > but found a behavior that is unfavorable for me.
>> >
>> > First, I ran 'crmadmin -D' before starting
>> > my cluster, and the command exited immediately with
>> > exit code 254.
>> >
>> > # crmadmin -D
>> > # echo $?
>> > 254
>> >
>> > It went just as I expected.
>> >
>> > Next, I started Heartbeat on both nodes of
>> > my cluster and ran the command before the DC node
>> > was elected.
>> >
>> > I expected the command to show a message
>> > meaning that no DC node was elected and to exit
>> > immediately.
>> >
>> > But 'crmadmin -D' actually paused for tens of seconds,
>> > then showed a message and exited with
>> > exit code 0.
>> >
>> > # crmadmin -D
>> > No messages received in 30 seconds.. aborting
>> > # echo $?
>> > 0
>
> I'll commit this patch shortly that should resolve this:
>
> diff -r 9355bd3d9af3 crm/admin/crmadmin.c
> --- a/crm/admin/crmadmin.c      Thu Aug 09 15:24:21 2007 +0200
> +++ b/crm/admin/crmadmin.c      Fri Aug 10 10:06:48 2007 +0200
> @@ -632,6 +632,7 @@ admin_message_timeout(gpointer data)
>                 (int)message_timeout_ms/1000);
>         crm_err("No messages received in %d seconds",
>                 (int)message_timeout_ms/1000);
> +       operation_status = -3;
>         g_main_quit(mainloop);
>         return FALSE;
>  }
>

I read your patch and the crmadmin source.

I understand your patch, and I found that crmadmin's
undocumented '-t' option is useful.

If I run 'crmadmin -D -t TIMEOUT-msec', it is certain to
finish within TIMEOUT-msec, so I can wait for the end of a DC election
at whatever precision I like. If 'crmadmin' times out and exits
with code 253, I only have to retry until the command
succeeds.
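
As a rough sketch, the retry loop I have in mind looks like the
following. It assumes that '-t' takes milliseconds and that a timeout
is reported as exit code 253 (operation_status = -3 from your patch).
The function name wait_for_dc, the CRMADMIN variable, and the default
values are only my own illustration.

```shell
#!/bin/sh
# Sketch only: poll for the DC until crmadmin stops timing out.
# Assumptions: -t is in milliseconds (undocumented option), and a
# timeout shows up as exit code 253 once the patch is applied.
CRMADMIN=${CRMADMIN:-crmadmin}   # hypothetical override, handy for testing

wait_for_dc() {
    timeout_ms=${1:-5000}   # per-attempt timeout passed to -t
    max_tries=${2:-60}      # give up after this many timeouts
    try=0
    while [ "$try" -lt "$max_tries" ]; do
        "$CRMADMIN" -D -t "$timeout_ms"
        rc=$?
        # Any code other than the timeout code ends the wait;
        # the caller decides what that code means.
        if [ "$rc" -ne 253 ]; then
            return "$rc"
        fi
        try=$((try + 1))
    done
    return 253
}
```

The function only retries on 253 and hands every other exit code
back to the caller unchanged, so the caller still has to decide how
to treat a non-zero code.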

But I found another problem.

'crmadmin -D' exits with code 1 even when it can
get and show the node name of the DC.

I found the following message in /var/log/messages after
the command finished.

Aug 14 14:51:43 it-gx2 crmadmin: [23056]: info: crmd_ipc_connection_destroy: Connection to CRMd was terminated

I think this message is related to the problem.

What do you think?

Sincerely.
--
Takenaka Kazuhiro <[EMAIL PROTECTED]>
