On 8/14/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
> Hi Andrew.
>
> > On 8/9/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
> >> > Hi Andrew, Thank you for your reply.
> >> >
> >> > > On 8/8/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
> >> > >> > Hi All.
> >> > >> >
> >> > >> > I installed Heartbeat 2.1.2 into my cluster and tried
> >> > >> > the new way to start a cluster recommended at the following URL.
> >> > >> >
> >> > >> > > http://www.linux-ha.org/v2/faq/cib_changes_detected?highlight=%28v2/faq/%2
> >> > >> >
> >> > >> > It works sanely, so I think I should adopt it as the
> >> > >> > formal procedure for starting the cluster that I am
> >> > >> > planning to test.
> >> > >> >
> >> > >> > Before adopting the new way, I want to know the proper
> >> > >> > time to execute 'cibadmin -R -x cib.xml'. In other words,
> >> > >> > I want to know how to detect that a cluster is ready to
> >> > >> > respond to client commands.
> >> > >> >
> >> > >> > If there is a command that can detect this,
> >> > >> > that would be best.
> >> > >> >
> >> > >> > I think 'crm_mon -s' might be what I want.
> >> > >> >
> >> > >> > If 'crm_mon -s' shows 'Ok' in the first field of its report,
> >> > >> > I suppose that is a sign that the cluster is ready for
> >> > >> > operator requests.
> >> > >> >
> >> > >> > Am I right?
> >> > >
> >> > > the best way is to run:
> >> > >   crmadmin -D # find out which node is the DC
> >> > >   crmadmin -S {uname_of_dc} # find out what status it's in
> >> > >
> >> > > if it says S_IDLE, then now is a good time to make changes
> >> >
> >> > I tried your method on my 2-node cluster
> >> > but found a behavior that is inconvenient for me.
> >> >
> >> > First, I ran 'crmadmin -D' before starting
> >> > my cluster, and the command exited immediately with
> >> > exit code 254.
> >> >
> >> > # crmadmin -D
> >> > # echo $?
> >> > 254
> >> >
> >> > That was just what I expected.
> >> >
> >> > Next, I started Heartbeat on both nodes of
> >> > my cluster and ran the command before the DC node
> >> > was elected.
> >> >
> >> > I expected the command to show some message
> >> > meaning that no DC node had been elected and to
> >> > exit immediately.
> >> >
> >> > But 'crmadmin -D' actually paused for tens of seconds,
> >> > then showed a message and exited with
> >> > exit code 0.
> >> >
> >> > # crmadmin -D
> >> > No messages received in 30 seconds.. aborting
> >> > # echo $?
> >> > 0
> >
> > I'll commit this patch shortly that should resolve this:
> >
> > diff -r 9355bd3d9af3 crm/admin/crmadmin.c
> > --- a/crm/admin/crmadmin.c Thu Aug 09 15:24:21 2007 +0200
> > +++ b/crm/admin/crmadmin.c Fri Aug 10 10:06:48 2007 +0200
> > @@ -632,6 +632,7 @@ admin_message_timeout(gpointer data)
> >  		(int)message_timeout_ms/1000);
> >  	crm_err("No messages received in %d seconds",
> >  		(int)message_timeout_ms/1000);
> > +	operation_status = -3;
> >  	g_main_quit(mainloop);
> >  	return FALSE;
> >  }
> >
>
> I read your patch and the source of crmadmin.
>
> I understood your patch, and found that crmadmin's
> undocumented option '-t' is useful.
>
> If I run 'crmadmin -D -t TIMEOUT-msec', it is certain to
> return within TIMEOUT-msec, so I can wait for the end of a DC
> election at whatever precision I like. If 'crmadmin' fails
> with exit code 253, I only have to retry until the command
> succeeds.
>
> But I found another problem.
>
> 'crmadmin -D' exits with exit code 1 even when it
> gets and shows the node name of the DC.
>
> I found the following message in /var/log/messages after
> the command exited.
>
> Aug 14 14:51:43 it-gx2 crmadmin: [23056]: info: crmd_ipc_connection_destroy:
> Connection to CRMd was terminated
>
> I think this message is related to the problem.
>
> What do you think?

You're right. Fixed in http://hg.beekhof.net/lha/crm-dev/rev/46f826ba9650
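
For reference, the readiness check discussed in this thread could be scripted roughly as follows. This is only a sketch and not part of the original messages. The function names (find_dc, dc_is_idle, wait_for_idle) and the CRMADMIN and RETRY_DELAY variables are hypothetical. It assumes that 'crmadmin -D' reports the DC on a line ending in ": <uname>" and that 'crmadmin -S' output contains the state name (for example S_IDLE). It parses output instead of trusting exit codes, because the exit codes reported in this thread differ before and after the fixes.

```shell
#!/bin/sh
# Sketch: wait until the cluster has an elected DC in state S_IDLE.
# CRMADMIN may be overridden (e.g. with a stub for testing).
CRMADMIN=${CRMADMIN:-crmadmin}

# Print the DC's node name; return non-zero if none was reported.
# $1 is the timeout in milliseconds passed to the '-t' option
# mentioned in the thread.
# ASSUMPTION: the DC is reported on a line ending in ": <uname>".
find_dc() {
    out=$($CRMADMIN -D -t "${1:-5000}" 2>/dev/null)
    dc=$(printf '%s\n' "$out" | sed -n 's/^.*: *\([^ ]*\) *$/\1/p' | head -n 1)
    [ -n "$dc" ] && printf '%s\n' "$dc"
}

# Succeed if the given DC node reports S_IDLE.
# ASSUMPTION: the state name appears literally in the output.
dc_is_idle() {
    $CRMADMIN -S "$1" 2>/dev/null | grep -q 'S_IDLE'
}

# Retry up to $1 times (default 30), sleeping RETRY_DELAY seconds
# (default 1) between attempts. Returns 0 once the DC is idle.
wait_for_idle() {
    tries=${1:-30}
    while [ "$tries" -gt 0 ]; do
        dc=$(find_dc 5000) && dc_is_idle "$dc" && return 0
        tries=$((tries - 1))
        sleep "${RETRY_DELAY:-1}"
    done
    return 1
}
```

With these functions loaded, 'wait_for_idle 60 && cibadmin -R -x cib.xml' would load the CIB only after the DC reports S_IDLE, and give up after 60 attempts otherwise.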
