Hi Andrew.
On 8/14/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
> Hi Andrew.
>
> > On 8/9/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
> >> > Hi Andrew, Thank you for your reply.
> >> >
> >> > > On 8/8/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
> >> > >> > Hi All.
> >> > >> >
> >> > >> > I installed Heartbeat 2.1.2 on my cluster and tried
> >> > >> > the new way of starting a cluster recommended at the following URL.
> >> > >> >
> >> > >> > http://www.linux-ha.org/v2/faq/cib_changes_detected?highlight=%28v2/faq/%2
> >> > >> >
> >> > >> > It works well, so I think I should adopt it as the
> >> > >> > formal procedure for starting the cluster that I am planning
> >> > >> > to test.
> >> > >> >
> >> > >> > Before adopting the new way, I want to know the proper
> >> > >> > time to execute 'cibadmin -R -x cib.xml'. In other words,
> >> > >> > I want to know how to detect that a cluster is ready to respond
> >> > >> > to requests from client commands.
> >> > >> >
> >> > >> > If there is a command that can detect this,
> >> > >> > that would be best.
> >> > >> >
> >> > >> > I think 'crm_mon -s' might be what I want.
> >> > >> >
> >> > >> > If 'crm_mon -s' shows 'Ok' in the 1st field of its report,
> >> > >> > I suppose that is a sign that the cluster is ready for
> >> > >> > operator requests.
> >> > >> >
> >> > >> > Am I right?
> >> > >
> >> > > the best way is to run:
> >> > > crmadmin -D # find out which node is the DC
> >> > > crmadmin -S {uname_of_dc} # find out what status it's in
> >> > >
> >> > > if it says S_IDLE, then now is a good time to make changes
> >> >
> >> > I tried your method on my 2-node cluster
> >> > but found behavior that is a problem for me.
> >> >
> >> > First, I ran 'crmadmin -D' before starting
> >> > my cluster, and the command finished immediately with
> >> > exit code 254.
> >> >
> >> > # crmadmin -D
> >> > # echo $?
> >> > 254
> >> >
> >> > That is just what I expected.
> >> >
> >> > Next, I started Heartbeat on both nodes of
> >> > my cluster and ran the command before the DC node
> >> > was elected.
> >> >
> >> > I expected the command to show a message
> >> > saying that no DC node was elected and to
> >> > finish immediately.
> >> >
> >> > But 'crmadmin -D' actually paused for tens of seconds,
> >> > then showed a message and finished with
> >> > exit code 0.
> >> >
> >> > # crmadmin -D
> >> > No messages received in 30 seconds.. aborting
> >> > # echo $?
> >> > 0
> >
> > I'll commit this patch shortly that should resolve this:
> >
> > diff -r 9355bd3d9af3 crm/admin/crmadmin.c
> > --- a/crm/admin/crmadmin.c Thu Aug 09 15:24:21 2007 +0200
> > +++ b/crm/admin/crmadmin.c Fri Aug 10 10:06:48 2007 +0200
> > @@ -632,6 +632,7 @@ admin_message_timeout(gpointer data)
> > (int)message_timeout_ms/1000);
> > crm_err("No messages received in %d seconds",
> > (int)message_timeout_ms/1000);
> > + operation_status = -3;
> > g_main_quit(mainloop);
> > return FALSE;
> > }
> >
>
> I read your patch and the source of crmadmin.
>
> I understood your patch and found that crmadmin's undocumented
> option '-t' is useful.
>
> If I run 'crmadmin -D -t TIMEOUT-msec', it is certain to
> finish within TIMEOUT-msec, so I can wait for the end of a DC election
> with whatever precision I like. If 'crmadmin' fails
> with exit code 253, I only have to retry until the command
> succeeds.
>
> But I found another problem.
>
> 'crmadmin -D' finishes with exit code 1 even when it can
> get and show the node name of the DC.
>
> I found the following message in /var/log/messages after
> the command finished.
>
> Aug 14 14:51:43 it-gx2 crmadmin: [23056]: info: crmd_ipc_connection_destroy: Connection to CRMd was terminated
>
> I think this message is related to the problem.
>
> What do you think?
You're right.
Fixed in http://hg.beekhof.net/lha/crm-dev/rev/46f826ba9650

Thanks for your patches.
Now I can wait for the cluster to become ready
with the following shell function.

wait_cluster_ready()
{
    typeset dc

    # Poll until crmadmin can name the DC; -t limits each try to 1000 ms.
    while ! dc=`crmadmin -D -t 1000`; do
        echo "DC is not elected" 1>&2
        sleep 1
    done
    dc=${dc#*: }    # keep only the node name after ": "
    echo "DC is $dc" 1>&2

    typeset dc_status cmd_output cmd_status errcnt=0
    while true; do
        cmd_output=`crmadmin -S "$dc" -t 1000`
        cmd_status=$?
        case $cmd_status in
        0)  # succeeded in getting dc_status
            dc_status=${cmd_output#*: }
            dc_status=${dc_status% *}
            if [[ "$dc_status" = "S_IDLE" ]]; then
                echo "Cluster is now up" 1>&2
                return 0
            else
                # DC answered but is not idle yet
                echo "$cmd_output" 1>&2
                (( errcnt = errcnt + 1 ))
            fi
            ;;
        253)    # Connection timeout
            (( errcnt = errcnt + 1 ))
            ;;
        254)    # Unable to connect with the DC
            (( errcnt = errcnt + 1 ))
            ;;
        *)  # Unexpected error
            echo "Unexpected error : $cmd_status" 1>&2
            return 1
            ;;
        esac

        # Give up after more than 10 failed or non-idle polls.
        if (( errcnt > 10 )); then
            echo "Too many errors occurred" 1>&2
            return 1
        else
            sleep 1
        fi
    done
}
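
The function depends on two parameter expansions to pick the node name
and the state out of the crmadmin output. Here is a small sketch of just
that step. The two sample strings are my assumption of the general
"label: value" shape and were not captured from a real cluster, so
please check them against what your crmadmin prints.

```shell
# Assumed sample output lines (hypothetical, not taken from a real cluster).
dc_line="Designated Controller is: node1"
status_line="Status of crmd@node1: S_IDLE (ok)"

# Remove the shortest prefix ending in ": ", which leaves the node name.
dc=${dc_line#*: }

# Same prefix removal, then remove the shortest suffix starting at a space.
dc_status=${status_line#*: }
dc_status=${dc_status% *}

echo "dc=$dc status=$dc_status"
```

With that in place, 'wait_cluster_ready && cibadmin -R -x cib.xml'
should run the CIB replacement only after the DC reports S_IDLE.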
Sincerely.
--
Takenaka Kazuhiro <[EMAIL PROTECTED]>