Hi Dejan,

Thank you for your follow-up. We will ignore the ComponentFail error messages from now on.
As for atd, it runs by default on the Red Hat systems we use.

Frankly speaking, we do not know CTS well yet, but we are running it in the hope that some developers will pick up the errors. For the time being, we are learning CTS by running it and reporting the errors we find :)

Thanks,

Tadashiro Yoshida

At Wed, 28 Nov 2007 14:46:55 +0100, [EMAIL PROTECTED] wrote in <[EMAIL PROTECTED]>:
--------------------
> Hi,
>
> On Wed, Nov 28, 2007 at 09:05:10PM +0900, Tadashiro Yoshida wrote:
> > Hi,
> >
> > We detected some errors in the ComponentFail test while running CTS with a
> > dev version. It might be a CTS's problem with message handling.
> > Please check it if it happens something wrong.
> >
> > Dev version: 76c25be5c854
> > # python CTSlab.py -v2 -r -c
> > --facility local7 -L /var/log/ha-log-local7 500 2>&1 | tee cts.log
> >
> > -----------
> > Nov 26 19:22:09 x3650a CTS: Running test ComponentFail (x3650b) [16]
> > Nov 26 19:22:10 x3650b heartbeat: [27967]: WARN:
> > Managed /usr/lib64/heartbeat/stonithd process 27980
> > killed by signal 9 [SIGKILL - Kill, unblockable].
> > Nov 26 19:22:10 x3650b heartbeat: [27967]: ERROR:
> > Respawning client "/usr/lib64/heartbeat/stonithd":
> > Nov 26 19:22:10 x3650b heartbeat: [27967]: info:
> > Starting child client "/usr/lib64/heartbeat/stonithd"(0,0)
> > Nov 26 19:22:10 x3650b stonithd: [30753]: notice:
> > /usr/lib64/heartbeat/stonithd start up successfully.
> > :
> > Nov 26 19:32:41 x3650a CTS: Patterns not found:
> > ['x3650c crmd:.*LOST:.* x3650b ',
> > 'Updating node state to member for x3650b']
> > Nov 26 19:32:41 x3650a CTS: Test ComponentFail failed
> > [reason:Didn't find all expected patterns]
> > Nov 26 19:32:41 x3650a CTS: Test ComponentFail (x3650b) [FAILED]
> > -----------
>
> This is probably due to the latest code which does a node reboot
> on certain failures. AFAIK, the CTS was not updated.
>
> > Besides, it seems there are some failures in the stonithd testing, although
> > the final message says it was succeeded.
> >
> > -----------
> > Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:26 x3650b stonithd:
> > [26162]:
> > CRIT: command ssh -q -x -n -l root "x3650c" "echo 'sleep 2;
> > /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1" failed
> > Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:53 x3650b stonithd:
> > [23258]:
> > ERROR: Failed to STONITH the node x3650c: optype=RESET,
> > op_result=TIMEOUT
> > Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:53 x3650b tengine:
> > [26116]:
> > ERROR: tengine_stonith_callback: Stonith of x3650c failed (2)...
> > aborting transition.
> > -----------
>
> I noticed here that the second of two consecutive stonithd tests
> on the same node fails. Is that the case here? Also, do you have
> atd running on all nodes?
>
> Thanks,
>
> Dejan
>
> >
> > Thanks,
> >
> > Tadashiro Yoshida

-------------------------------------
Tadashiro Yoshida
Research Planning Department, OSS Center
NTT Shinagawa TWINS Building 11F, 1-9-1 Konan, Minato-ku, Tokyo 108-8019, Japan
phone: 03-5860-5135, fax: 03-5463-5490
-------------------------------------
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
