Hate replying to myself... There's more, and somewhere in here is the real problem:

Apr 4 11:27:50 test-2 tengine: [13668]: info: te_fence_node:actions.c
Executing reboot fencing operation (16) on test-1.domain (timeout=30000)
Apr 4 11:27:50 test-2 stonithd: [13658]: info: Broadcasting the message
succeeded: require others to stonith node test-1.domain.

In other words, test-2 doesn't think that it can stonith test-1, so it
broadcasts a message asking the other stonith daemons to do it. That
means the stonith agent on test-2, when asked for a host list, doesn't
include test-1.domain in it.

On Thu, Apr 19, 2007 at 08:01:20PM +0200, Dejan Muhamedagic wrote:
> On Tue, Apr 17, 2007 at 03:53:41PM -0400, Bjorn Oglefjorn wrote:
> > Here they are again.
>
> It looks like this
>
> Apr 4 11:28:20 test-2 stonithd: [13658]: info: Failed to STONITH the node
> test-1.domain: optype=1, op_result=2
>
> means that the stonith operation timed out. I'll fix the code to
> raise this to an error condition and include the descriptions.
>
> Before, we see:
>
> Apr 4 11:27:50 test-2 tengine: [13668]: info: te_fence_node:actions.c
> Executing reboot fencing operation (16) on test-1.domain (timeout=30000)
>
> Note the timeout: 30 secs. After some digging I found that it's
> transition_timeout. Is 30 seconds enough time for your stonith
> agent to perform the reset?
>
> Anyway, in the CIB (crm_verify doesn't complain) I find only
> these two timeouts:
>
> <nvpair id="cib-bootstrap-options-transition_idle_timeout"
> name="transition_idle_timeout" value="5min"/>
> ...
> <op id="test-1_DRAC_reset" name="reset" timeout="3min" prereq="nothing"/>
>
> 1. transition_timeout is not in the annotated CIB.
>
> 2. Should the user specify this timeout in the crm_config section and
> calculate the maximum value of all rsc operations' timeouts?
>
> 3. What's the difference between the transition_timeout and the
> transition_idle_timeout?
>
> Andrew, can you please take a look.
>
> Thanks.
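
As a quick cross-check of the timeout reading in the quoted message, the
gap between the fencing request and the failure can be compared with the
logged timeout. This is only a minimal Python sketch: the timestamps and
the timeout value are copied from the log lines quoted above, and the year
2007 is an assumption, since syslog lines don't carry one.

```python
from datetime import datetime

# Timestamps copied from the tengine and stonithd log lines quoted above.
# The year is an assumption: syslog timestamps do not include one.
FMT = "%Y %b %d %H:%M:%S"
requested = datetime.strptime("2007 Apr 4 11:27:50", FMT)  # "Executing reboot fencing operation"
failed = datetime.strptime("2007 Apr 4 11:28:20", FMT)     # "Failed to STONITH the node"

# Taken from "(timeout=30000)" in the tengine line; the unit is milliseconds.
timeout_ms = 30000

elapsed_ms = int((failed - requested).total_seconds() * 1000)
print(elapsed_ms, elapsed_ms == timeout_ms)  # prints "30000 True"
```

The failure is logged exactly 30 seconds after the request, which matches
timeout=30000 and is consistent with the operation timing out rather than
being rejected right away.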
>
> >
> > On 4/17/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > >
> > >On 4/17/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> > >> I know that my plugin is getting called because of the logging that the
> > >> plugin does.
> > >
> > >do we get to see that logging at all? preferably in the context of
> > >the other log messages
> > >
> > >> That said, I also know my plugin is not receiving any 'reset'
> > >> operation request from heartbeat. If you see below, request actions are
> > >> logged. The only actions logged when node failure is simulated are:
> > >> getconfignames, status, and gethosts, in that order. We should also see
> > >> getinfo-devid and reset operations logged, but they are never present.
> > >> --BO
> > >>
> > >> On 4/17/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > >> >
> > >> > On 4/17/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> > >> > > Yes, I most certainly have. The stonith command-line tool has no
> > >> > > problem at all with the plugin. The following was run from
> > >> > > test-1.domain. The indented log entries are from the debug log of
> > >> > > the stonith plugin:
> > >> >
> > >> > I'm no stonith expert, but the outputs certainly look plausible enough.
> > >> > You kept the same CIB?
> > >> > Are you sure your plugin is getting called?
> > >> >
> > >> > > root:~ # stonith -t external/drac4
> > >> > > DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_PASSWD=******** -lS
> > >> > > stonith: external/drac4 device OK.
> > >> > > test-2.drac.domain
> > >> > >
> > >> > >     [Tue Apr 17 09:57:20 2007] Requested Action for : getconfignames
> > >> > >     [Tue Apr 17 09:57:22 2007] Requested Action for test-2.drac.domain : status
> > >> > >     [Tue Apr 17 09:57:22 2007] Success: test-2.drac.domain is reachable
> > >> > >     [Tue Apr 17 09:57:23 2007] Requested Action for : getinfo-devid
> > >> > >     [Tue Apr 17 09:57:24 2007] Requested Action for test-2.drac.domain : gethosts
> > >> > >
> > >> > > root:~ # stonith -t external/drac4
> > >> > > DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_PASSWD=******** -T on
> > >> > > test-2.domain
> > >> > >
> > >> > >     [Tue Apr 17 09:57:28 2007] Requested Action for : getconfignames
> > >> > >     [Tue Apr 17 09:57:30 2007] Requested Action for test-2.drac.domain : status
> > >> > >     [Tue Apr 17 09:57:30 2007] Success: test-2.drac.domain is reachable
> > >> > >     [Tue Apr 17 09:57:31 2007] Requested Action for : getinfo-devid
> > >> > >     [Tue Apr 17 09:57:33 2007] Requested Action for test-2.drac.domain: on
> > >> > >     [Tue Apr 17 09:57:33 2007] test-2.drac.domain Initial Power Status = ON
> > >> > >     [Tue Apr 17 09:57:33 2007] Success: test-2.drac.domain Power Status = ON
> > >> > >
> > >> > > root:~ # stonith -t external/drac4
> > >> > > DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_PASSWD=******** -T reset
> > >> > > test-2.domain
> > >> > >
> > >> > >     [Tue Apr 17 09:57:46 2007] Requested Action for : getconfignames
> > >> > >     [Tue Apr 17 09:57:48 2007] Requested Action for test-2.drac.domain : status
> > >> > >     [Tue Apr 17 09:57:48 2007] Success: test-2.drac.domain is reachable
> > >> > >     [Tue Apr 17 09:57:49 2007] Requested Action for : getinfo-devid
> > >> > >     [Tue Apr 17 09:57:50 2007] Requested Action for test-2.drac.domain : reset
> > >> > >     [Tue Apr 17 09:57:50 2007] test-2.drac.domain Initial Power Status = ON
> > >> > >     [Tue Apr 17 09:57:58 2007] Success: test-2.drac.domain Power Status = RESET
> > >> > >
> > >> > > --BO
> > >> > >
> > >> > > On 4/17/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > >> > > >
> > >> > > > On 4/16/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> > >> > > > > No ideas?
> > >> > > >
> > >> > > > none at all - have you tried calling it manually using the stonith
> > >> > > > command-line tool to make sure it works?
> > >> > > >
> > >> > > > > On 4/9/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> > >> > > > > >
> > >> > > > > > I quickly put together a STONITH plugin for testing this. It
> > >> > > > > > conforms to the heartbeat spec and always lies to heartbeat
> > >> > > > > > returning success no matter what. With this plugin in place
> > >> > > > > > I'm still getting this error:
> > >> > > > > >
> > >> > > > > > Apr 9 15:40:47 test-2 stonithd: [8791]: info: Failed to STONITH the node
> > >> > > > > > test-1.domain: optype=1, op_result=2
> > >> > > > > > Apr 9 15:40:47 test-2 tengine: [8803]: info: tengine_stonith_callback:
> > >> > > > > > callbacks.c call=-4, optype=1, node_name= test-1.domain, result=2,
> > >> > > > > > node_list=, action=13;5:6eaeba12-87c3-465e-98f1-78585e71e495
> > >> > > > > > Apr 9 15:40:47 test-2 tengine: [8803]: ERROR: tengine_stonith_callback:
> > >> > > > > > callbacks.c Stonith of test-1.domain failed (2)... aborting transition.
> > >> > > > > >
> > >> > > > > > --BO
> > >> > > > _______________________________________________
> > >> > > > Linux-HA mailing list
> > >> > > > [email protected]
> > >> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > >> > > > See also: http://linux-ha.org/ReportingProblems

--
Dejan
