Tadashiro Yoshida wrote:
> Hi Dejan,
>
> Thank you for following up.
> We will ignore the ComponentFail errors we detected from now on.
>
> As for atd, it runs with the default settings on the Red Hat systems we use.
>
> Frankly speaking, we do not know CTS well yet, but we are running it
> in the hope that some developers will pick up the errors.
>
> We will be learning about CTS for a while by running it and reporting errors :)
>
> Thanks,
>
> Tadashiro Yoshida
>
>
> At Wed, 28 Nov 2007 14:46:55 +0100, [EMAIL PROTECTED] wrote in <[EMAIL PROTECTED]>:
> --------------------
>
>> Hi,
>>
>> On Wed, Nov 28, 2007 at 09:05:10PM +0900, Tadashiro Yoshida wrote:
>>
>>> Hi,
>>>
>>> We detected some errors in the ComponentFail test while running CTS with a
>>> dev version. It might be a problem with CTS's message handling.
>>> Please check whether something is wrong.
>>>
>>> Dev version: 76c25be5c854
>>> # python CTSlab.py -v2 -r -c
>>> --facility local7 -L /var/log/ha-log-local7 500 2>&1 | tee cts.log
>>>
>>> -----------
>>> Nov 26 19:22:09 x3650a CTS: Running test ComponentFail (x3650b) [16]
>>> Nov 26 19:22:10 x3650b heartbeat: [27967]: WARN:
>>> Managed /usr/lib64/heartbeat/stonithd process 27980
>>> killed by signal 9 [SIGKILL - Kill, unblockable].
>>> Nov 26 19:22:10 x3650b heartbeat: [27967]: ERROR:
>>> Respawning client "/usr/lib64/heartbeat/stonithd":
>>> Nov 26 19:22:10 x3650b heartbeat: [27967]: info:
>>> Starting child client "/usr/lib64/heartbeat/stonithd"(0,0)
>>> Nov 26 19:22:10 x3650b stonithd: [30753]: notice:
>>> /usr/lib64/heartbeat/stonithd start up successfully.
>>> :
>>> Nov 26 19:32:41 x3650a CTS: Patterns not found:
>>> ['x3650c crmd:.*LOST:.* x3650b ',
>>> 'Updating node state to member for x3650b']
>>> Nov 26 19:32:41 x3650a CTS: Test ComponentFail failed
>>> [reason:Didn't find all expected patterns]
>>> Nov 26 19:32:41 x3650a CTS: Test ComponentFail (x3650b) [FAILED]
>>> -----------
>>>
>> This is probably due to the latest code, which reboots the node
>> on certain failures. AFAIK, CTS has not been updated accordingly.
>>
>>
>>> Besides, it seems there are some failures in the stonithd testing, although
>>> the final message says it succeeded.
>>>
>>> -----------
>>> Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:26 x3650b stonithd:
>>> [26162]:
>>> CRIT: command ssh -q -x -n -l root "x3650c" "echo 'sleep 2;
>>> /sbin/reboot -nf' | SHELL=/bin/sh at now >/dev/null 2>&1" failed
>>> Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:53 x3650b stonithd:
>>> [23258]:
>>> ERROR: Failed to STONITH the node x3650c: optype=RESET,
>>> op_result=TIMEOUT
>>> Nov 26 19:54:11 x3650a CTS: BadNews: Nov 26 19:46:53 x3650b tengine:
>>> [26116]:
>>> ERROR: tengine_stonith_callback: Stonith of x3650c failed (2)...
>>> aborting transition.
>>> -----------
>>>
>> I noticed here that the second of two consecutive stonithd tests
>> on the same node fails. Is that the case here? Also, do you have
>> atd running on all nodes?
>>
>> Thanks,
>>
>> Dejan
>>
>>
>>> Thanks,
>>>
>>> Tadashiro Yoshida
>>>
>>>
>>> _______________________________________________________
>>> Linux-HA-Dev: [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>>> Home Page: http://linux-ha.org/
>>>
>
>
> -------------------------------------
> Tadashiro Yoshida
> OSS Center, Research and Development Planning Department
> NTT Shinagawa TWINS Building 11F
> 1-9-1 Konan, Minato-ku, Tokyo 108-8019
> phone: 03-5860-5135, fax: 03-5463-5490
> -------------------------------------
>
>
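This note is for readers unsure why CTS reports "Didn't find all expected patterns". The "Patterns not found" line suggests that the test looks for regular expressions in the collected logs and fails when some of them never appear.

The Python sketch below is a simplified, hypothetical illustration of that kind of check, not CTS's actual code. The helper name `missing_patterns` is made up for the example. The sketch tests the two reported patterns against only the stonithd log lines quoted in the report, because the rest of that log was elided. It therefore illustrates the mechanism rather than reproducing the real run.

```python
import re


def missing_patterns(patterns, log_lines):
    """Return the patterns that match none of the given log lines.

    Hypothetical helper for illustration; not taken from CTS.
    """
    return [p for p in patterns
            if not any(re.search(p, line) for line in log_lines)]


# The two patterns CTS reported as "not found" for ComponentFail (x3650b).
expected = [
    "x3650c crmd:.*LOST:.* x3650b ",
    "Updating node state to member for x3650b",
]

# Only the log lines quoted in the report; the rest of the log was elided.
log = [
    "Nov 26 19:22:10 x3650b heartbeat: [27967]: WARN: Managed "
    "/usr/lib64/heartbeat/stonithd process 27980 killed by signal 9 "
    "[SIGKILL - Kill, unblockable].",
    'Nov 26 19:22:10 x3650b heartbeat: [27967]: ERROR: Respawning client '
    '"/usr/lib64/heartbeat/stonithd":',
    'Nov 26 19:22:10 x3650b heartbeat: [27967]: info: Starting child client '
    '"/usr/lib64/heartbeat/stonithd"(0,0)',
    "Nov 26 19:22:10 x3650b stonithd: [30753]: notice: "
    "/usr/lib64/heartbeat/stonithd start up successfully.",
]

missing = missing_patterns(expected, log)
if missing:
    # Mirrors the shape of the CTS report, not its exact wording or format.
    print("Patterns not found:", missing)
```

Neither pattern matches these lines, so both are reported as missing. A real test harness would presumably keep reading new log lines until a timeout expires, whereas this sketch only checks a fixed list.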
