Hi,

On Tue, Dec 04, 2007 at 08:31:55PM -0500, Graham, Simon wrote:
> > On Mon, Dec 03, 2007 at 02:33:19PM -0500, Graham, Simon wrote:
> > > I've been seeing an occasional problem where resources are not
> > > restarted

Occasional? Meaning it sometimes works and sometimes doesn't?

> > > when the node running them is powered off; in the specific case in
> > > question, I have a 2-node cluster (with STONITH), the resources are
> > 
> > The only thing I can imagine is that the surviving
> > node tried to stonith the other but couldn't, in which case it'll
> > wait forever.
> > 
> 
> There was no attempt to stonith in the logs; maybe that's a clue -
> surely it should have scheduled the other node for a stonith before
> restarting the resources?

Yes, definitely. Once a node is assumed dead (no heartbeat) and
stonith is configured, the surviving node should fence it to make
sure that it really is dead.

> > > running on one node (node1) and the DC is on the other node (node0) -
> > > after power cycling node1, I see the attached in the ha log file and
> > > then nothing else until about 3 minutes later when the other node
> > > comes back.
> > 
> > Can you please give us full logs? For the whole 3 minutes.
> > 
> 
> Actually, I did -- there was nothing in the log at all until the other
> node came back to life. I've attached the full log in case it's useful -
> time frame of the failure is 13:40:03

There is no trace of a node trying to stonith the other one. That
is really strange.
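
In case it helps with checking the rest of the log, here is a minimal
sketch of the kind of filter I mean: it keeps only lines that look like
errors or that mention stonith. The line pattern is modelled on the
single log line quoted further down in this mail; the severity labels
(ERROR, CRIT, WARN) and the name interesting_lines are my own
assumptions, so adjust them to whatever your log actually contains.

```python
import re

# Pattern modelled on the one log line quoted in this thread, e.g.
# "Nov 30 12:32:11 node0 heartbeat: [19665]: info: heartbeat: version 2.1.0.99"
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<node>\S+) (?P<prog>[^:]+): \[(?P<pid>\d+)\]: "
    r"(?P<level>\w+): (?P<msg>.*)$"
)

# Assumed severity labels; change these if your log uses different ones.
SUSPECT_LEVELS = {"ERROR", "CRIT", "WARN"}


def interesting_lines(lines):
    """Yield (stamp, node, level, msg) for error-like lines and stonith mentions."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # skip lines that do not fit the assumed format
        level = m.group("level")
        msg = m.group("msg")
        if level.upper() in SUSPECT_LEVELS or "stonith" in msg.lower():
            yield m.group("stamp"), m.group("node"), level, msg
```

Pass it any iterable of log lines, for example an open file object for
your ha log, and print what it yields. An empty result for the window
around 13:40:03 would match what we both see: no errors and no stonith
attempt.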

> > > Any pointers to what to look at would be appreciated... it seems to
> > > me that it is related to the order in which the various events are
> > > processed but I can't quite follow the code...
> > 
> > As usual, look for errors in the logs. Hmm, don't try to follow
> > the code, it could take a while...
> > 
> 
> No errors that I can see.

Me neither. And this is basic functionality failing. I really
can't say why.

Another thing:

Nov 30 12:32:11 node0 heartbeat: [19665]: info: heartbeat: version 2.1.0.99

That version was never released. Where did you get it from? You
should try with 2.1.2 or one of Andrew's later interim builds.
The new release should also be out soon.

Thanks,

Dejan

> Thanks for your help,
> Simon


> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
