Hi Andrew.

> On 8/9/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
> > Hi Andrew, thank you for your reply.
> >
> > > On 8/8/07, Takenaka Kazuhiro <[EMAIL PROTECTED]> wrote:
> > > > Hi All.
> > > >
> > > > I installed Heartbeat 2.1.2 on my cluster and tried
> > > > the new way of starting a cluster recommended at the following URL.
> > > >
> > > > http://www.linux-ha.org/v2/faq/cib_changes_detected?highlight=%28v2/faq/%2
> > > >
> > > > It works well, so I think I should adopt it as the
> > > > formal procedure for starting the cluster that I am
> > > > planning to test.
> > > >
> > > > Having adopted the new way, I want to know the proper
> > > > time to execute 'cibadmin -R -x cib.xml'. In other words,
> > > > I want to know how to detect that a cluster is ready to
> > > > respond to requests from client commands.
> > > >
> > > > If there is a command that can detect this moment,
> > > > that would be best.
> > > >
> > > > I think 'crm_mon -s' might be what I want.
> > > >
> > > > If 'crm_mon -s' shows 'Ok' in the first field of its report,
> > > > I suppose that is a sign that the cluster is ready for
> > > > operators' requests.
> > > >
> > > > Am I right?
> > >
> > > the best way is to run:
> > >   crmadmin -D                # find out which node is the DC
> > >   crmadmin -S {uname_of_dc}  # find out what status it's in
> > >
> > > if it says S_IDLE, then now is a good time to make changes
> >
> > I tried your method on my 2-node cluster
> > but found a behavior that is unfavorable for me.
> >
> > First, I ran 'crmadmin -D' before starting
> > my cluster, and the command finished immediately with
> > exit code 254.
> >
> >   # crmadmin -D
> >   # echo $?
> >   254
> >
> > That was just what I expected.
> >
> > Next, I started Heartbeat on both nodes of
> > my cluster and ran the command before the DC node
> > was elected.
> >
> > I expected the command to print a message saying
> > that no DC node had been elected, and to finish
> > immediately.
> >
> > But 'crmadmin -D' actually paused for tens of seconds,
> > then printed a message and finished with
> > exit code 0.
> >
> >   # crmadmin -D
> >   No messages received in 30 seconds.. aborting
> >   # echo $?
> >   0
>
> I'll commit this patch shortly that should resolve this:
>
> diff -r 9355bd3d9af3 crm/admin/crmadmin.c
> --- a/crm/admin/crmadmin.c    Thu Aug 09 15:24:21 2007 +0200
> +++ b/crm/admin/crmadmin.c    Fri Aug 10 10:06:48 2007 +0200
> @@ -632,6 +632,7 @@ admin_message_timeout(gpointer data)
>                  (int)message_timeout_ms/1000);
>          crm_err("No messages received in %d seconds",
>                  (int)message_timeout_ms/1000);
> +        operation_status = -3;
>          g_main_quit(mainloop);
>          return FALSE;
>  }

I read your patch and the crmadmin source. I understand your
patch, and I found crmadmin's undocumented '-t' option useful.

If I run 'crmadmin -D -t TIMEOUT-msec', it is certain to finish
within TIMEOUT-msec, so I can wait for the end of a DC election
with whatever precision I like. If 'crmadmin' fails with exit
code 253, I only have to retry until the command succeeds.

But I found another problem. 'crmadmin -D' exits with code 1
even when it successfully gets and shows the name of the DC node.

I found the following message in /var/log/messages after the
command finished:

  Aug 14 14:51:43 it-gx2 crmadmin: [23056]: info: crmd_ipc_connection_destroy: Connection to CRMd was terminated

I think this message is related to the problem. What do you think?

Sincerely.
-- 
Takenaka Kazuhiro <[EMAIL PROTECTED]>
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
