Hate replying to myself...

There's more, and somewhere in here is the real problem:

Apr  4 11:27:50 test-2 tengine: [13668]: info: te_fence_node:actions.c Executing reboot fencing operation (16) on test-1.domain (timeout=30000)
Apr  4 11:27:50 test-2 stonithd: [13658]: info: Broadcasting the message succeeded: require others to stonith node test-1.domain.

In other words, test-2 doesn't think that it can stonith test-1
and it sends a broadcast message to other stonith daemons to do
that.

So the stonith agent on test-2, when asked for a host list, does
not include test-1.domain in that list.
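
To make that concrete, here is a rough sketch of the decision as I
read it from the logs. It is not stonithd's actual code: the function
names are made up, and the case-insensitive exact match is an
assumption. The host list is the one printed by the "stonith ... -lS"
run quoted further down (for the device that fences test-2); that the
device for test-1 behaves the same way is also an assumption.

```python
def can_fence_locally(target, device_hosts):
    """Return True if the node name appears in the device's host list.

    Hypothetical helper: assumes a case-insensitive exact match on names.
    """
    wanted = target.strip().lower()
    return any(host.strip().lower() == wanted for host in device_hosts)


def plan_fencing(target, device_hosts):
    """Pick "local" if this node's device lists the target, else "broadcast"."""
    return "local" if can_fence_locally(target, device_hosts) else "broadcast"


# Host list as printed by the -lS run quoted in this thread.
hosts = ["test-2.drac.domain"]

print(plan_fencing("test-2.domain", hosts))       # broadcast
print(plan_fencing("test-2.drac.domain", hosts))  # local
```

If the plugin reports the DRAC address where the cluster expects the
node name, the lookup fails in exactly this way.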

On Thu, Apr 19, 2007 at 08:01:20PM +0200, Dejan Muhamedagic wrote:
> On Tue, Apr 17, 2007 at 03:53:41PM -0400, Bjorn Oglefjorn wrote:
> > Here they are again.
> 
> It looks like this
> 
> Apr  4 11:28:20 test-2 stonithd: [13658]: info: Failed to STONITH the node test-1.domain: optype=1, op_result=2
> 
> means that the stonith operation timed out. I'll fix the code to
> raise this to an error condition and include the descriptions.
> 
> Before, we see:
> 
> Apr  4 11:27:50 test-2 tengine: [13668]: info: te_fence_node:actions.c Executing reboot fencing operation (16) on test-1.domain (timeout=30000)
> 
> Note the timeout: 30 seconds. After some digging I found that it's
> transition_timeout. Is 30 seconds enough time for your stonith
> agent to perform the reset?
> 
> Anyway, in the CIB (crm_verify doesn't complain) I find only
> these two timeouts:
> 
> <nvpair id="cib-bootstrap-options-transition_idle_timeout" name="transition_idle_timeout" value="5min"/>
> ...
> <op id="test-1_DRAC_reset" name="reset" timeout="3min" prereq="nothing"/>
> 
> 1. transition_timeout is not in the annotated CIB.
> 
> 2. Should the user specify this timeout in the crm_config section and
> set it to the maximum of all rsc operations' timeouts?
> 
> 3. What's the difference between the transition_timeout and the
> transition_idle_timeout?
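
For question 2, the arithmetic would look roughly like the sketch
below. It only compares the numbers quoted in this message: the
timeout=30000 from the tengine log (taken to be milliseconds) and the
3min timeout on the reset op. The unit table and the function names
are assumptions for illustration, not the CRM's own parser.

```python
import re

# Assumed unit table for illustration; not taken from the heartbeat sources.
_UNIT_MS = {"ms": 1, "s": 1000, "sec": 1000, "m": 60_000, "min": 60_000, "h": 3_600_000}


def to_ms(value):
    """Convert a string such as '3min' or '30s' to milliseconds."""
    match = re.fullmatch(r"(\d+)\s*([a-z]+)", value.strip().lower())
    if match is None or match.group(2) not in _UNIT_MS:
        raise ValueError(f"unsupported timeout value: {value!r}")
    return int(match.group(1)) * _UNIT_MS[match.group(2)]


def longest_op_timeout_ms(op_timeouts):
    """Return the largest operation timeout, in milliseconds."""
    return max(to_ms(v) for v in op_timeouts.values())


# Values quoted in this message.
op_timeouts = {"test-1_DRAC_reset": "3min"}
fencing_timeout_ms = 30000  # timeout=30000 from the tengine log line

longest = longest_op_timeout_ms(op_timeouts)
print(f"longest op timeout: {longest} ms")
print(f"fencing timeout shorter than op timeout: {fencing_timeout_ms < longest}")
```

With these numbers the fencing operation is given 30 seconds, while
the reset op itself is allowed up to 3 minutes.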
> 
> Andrew, can you please take a look?
> 
> Thanks.
> 
> > 
> > On 4/17/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > >
> > >On 4/17/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> > >> I know that my plugin is getting called because of the logging that the
> > >> plugin does.
> > >
> > >do we get to see that logging at all?  preferably in the context of
> > >the other log messages
> > >
> > >> That said, I also know my plugin is not receiving any 'reset'
> > >> operation request from heartbeat.  If you see below, request actions are
> > >> logged.  The only actions logged when node failure is simulated are:
> > >> getconfignames, status, and gethosts, in that order.  We should also see
> > >> getinfo-devid and reset operations logged, but they are never present.
> > >> --BO
> > >>
> > >> On 4/17/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > >> >
> > >> > On 4/17/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> > >> > > Yes, I most certainly have.  The stonith command-line tool has no
> > >> > > problem at all with the plugin.  The following was run from
> > >> > > test-1.domain.  The indented log entries are from the debug log
> > >> > > of the stonith plugin:
> > >> >
> > >> > I'm no stonith expert, but the outputs certainly look plausible enough.
> > >> > You kept the same CIB?
> > >> > Are you sure your plugin is getting called?
> > >> >
> > >> > > root:~ # stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_PASSWD=******** -lS
> > >> > > stonith: external/drac4 device OK.
> > >> > > test-2.drac.domain
> > >> > >
> > >> > >   [Tue Apr 17 09:57:20 2007] Requested Action for : getconfignames
> > >> > >   [Tue Apr 17 09:57:22 2007] Requested Action for test-2.drac.domain : status
> > >> > >   [Tue Apr 17 09:57:22 2007] Success: test-2.drac.domain is reachable
> > >> > >   [Tue Apr 17 09:57:23 2007] Requested Action for : getinfo-devid
> > >> > >   [Tue Apr 17 09:57:24 2007] Requested Action for test-2.drac.domain : gethosts
> > >> > >
> > >> > > root:~ # stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_PASSWD=******** -T on test-2.domain
> > >> > >
> > >> > >   [Tue Apr 17 09:57:28 2007] Requested Action for : getconfignames
> > >> > >   [Tue Apr 17 09:57:30 2007] Requested Action for test-2.drac.domain : status
> > >> > >   [Tue Apr 17 09:57:30 2007] Success: test-2.drac.domain is reachable
> > >> > >   [Tue Apr 17 09:57:31 2007] Requested Action for : getinfo-devid
> > >> > >   [Tue Apr 17 09:57:33 2007] Requested Action for test-2.drac.domain: on
> > >> > >   [Tue Apr 17 09:57:33 2007] test-2.drac.domain Initial Power Status = ON
> > >> > >   [Tue Apr 17 09:57:33 2007] Success: test-2.drac.domain Power Status = ON
> > >> > >
> > >> > > root:~ # stonith -t external/drac4 DRAC_ADDR=test-2.drac.domain DRAC_LOGIN=root DRAC_PASSWD=******** -T reset test-2.domain
> > >> > >
> > >> > >   [Tue Apr 17 09:57:46 2007] Requested Action for : getconfignames
> > >> > >   [Tue Apr 17 09:57:48 2007] Requested Action for test-2.drac.domain : status
> > >> > >   [Tue Apr 17 09:57:48 2007] Success: test-2.drac.domain is reachable
> > >> > >   [Tue Apr 17 09:57:49 2007] Requested Action for : getinfo-devid
> > >> > >   [Tue Apr 17 09:57:50 2007] Requested Action for test-2.drac.domain : reset
> > >> > >   [Tue Apr 17 09:57:50 2007] test-2.drac.domain Initial Power Status = ON
> > >> > >   [Tue Apr 17 09:57:58 2007] Success: test-2.drac.domain Power Status = RESET
> > >> > >
> > >> > > --BO
> > >> > >
> > >> > > On 4/17/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > >> > > >
> > >> > > > On 4/16/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> > >> > > > > No ideas?
> > >> > > >
> > >> > > > none at all - have you tried calling it manually using the stonith
> > >> > > > command-line tool to make sure it works?
> > >> > > >
> > >> > > > > On 4/9/07, Bjorn Oglefjorn <[EMAIL PROTECTED]> wrote:
> > >> > > > > >
> > >> > > > > > I quickly put together a STONITH plugin for testing this.  It
> > >> > > > > > conforms to the heartbeat spec and always lies to heartbeat
> > >> > > > > > returning success no matter what.  With this plugin in place
> > >> > > > > > I'm still getting this error:
> > >> > > > > >
> > >> > > > > > Apr  9 15:40:47 test-2 stonithd: [8791]: info: Failed to STONITH the node test-1.domain: optype=1, op_result=2
> > >> > > > > > Apr  9 15:40:47 test-2 tengine: [8803]: info: tengine_stonith_callback: callbacks.c call=-4, optype=1, node_name= test-1.domain, result=2, node_list=, action=13;5:6eaeba12-87c3-465e-98f1-78585e71e495
> > >> > > > > > Apr  9 15:40:47 test-2 tengine: [8803]: ERROR: tengine_stonith_callback: callbacks.c Stonith of test-1.domain failed (2)... aborting transition.
> > >> > > > > >
> > >> > > > > > --BO
> > >> > > > _______________________________________________
> > >> > > > Linux-HA mailing list
> > >> > > > [email protected]
> > >> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > >> > > > See also: http://linux-ha.org/ReportingProblems
> 
> -- 
> Dejan

-- 
Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
