On Fri, Sep 05, 2008 at 09:58:58AM -0400, Jean-Francois Malouin wrote:
> * Dejan Muhamedagic <[EMAIL PROTECTED]> [20080905 06:58]:
> > On Thu, Sep 04, 2008 at 02:52:50PM -0400, Jean-Francois Malouin wrote:
[...]
> > > apache_id   (ocf::heartbeat:apache):  Started feeble-0 (unmanaged) FAILED
> > > 
> > > The whole log is online at:
> > > 
> > > http://www.bic.mni.mcgill.ca/~malin/heartbeat/messages-20080903.txt
> > > 
> > > as well as my entire in-house setup:
> > > 
> > > http://www.bic.mni.mcgill.ca/~malin/heartbeat/howto-heartbeat-drbd.txt
> > > 
> > > The ha.cf and cib.xml, along with the drbd config file and its
> > > status while verifying, are also there.
> > > 
> > > After that the group fs->NFS->IP->mysql->apache hangs on the node
> > > rather than failing over, as the resource apache is reported as 'unmanaged'.
> > 
> > That shouldn't be the reason. This looks like a bug. Perhaps you
> > can upgrade pacemaker to the latest stable and see how it
> > behaves. If the same happens, please file a bugzilla and attach a
> > hb_report tarball.
> 
> It's my intent to upgrade both heartbeat and pacemaker, but how
> do you do this on a live cluster? Put one node in standby,
> upgrade, put it back online and do the same for the other node?

If you want to keep your resources running without failover, try
this:

http://www.linux-ha.org/TransparentUpgrade

> What happens when there are nodes not exactly at the same revision 
> level on a live cluster?

The cluster should be able to handle that.
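
The node-by-node sequence asked about above (standby, upgrade, back online) can be sketched as an ordered command list. This is only an illustration under stated assumptions: the crm_standby flags (-U for the node name, -v on/off) are as I recall them for this generation of the tools and should be checked against the installed version, the second node name feeble-1 is hypothetical, and the upgrade command is a placeholder for whatever your package manager needs. It is not the TransparentUpgrade procedure from the link above.

```python
def rolling_upgrade_steps(nodes, upgrade_cmd):
    """Return the ordered commands for upgrading one node at a time.

    nodes:       node names as the cluster knows them
    upgrade_cmd: placeholder argv for the package upgrade, to be run
                 on the node that is currently in standby
    """
    steps = []
    for node in nodes:
        # Move resources away from the node (assumed flags: -U, -v).
        steps.append(["crm_standby", "-U", node, "-v", "on"])
        # Upgrade heartbeat/pacemaker on that node (placeholder).
        steps.append(list(upgrade_cmd))
        # Let the node host resources again.
        steps.append(["crm_standby", "-U", node, "-v", "off"])
    return steps


if __name__ == "__main__":
    # "feeble-1" and the upgrade command are hypothetical.
    plan = rolling_upgrade_steps(["feeble-0", "feeble-1"],
                                 ["your-package-upgrade-command"])
    for argv in plan:
        print(" ".join(argv))
```

The sketch only prints the plan; each step should be run by hand, checking crm_mon between steps before moving on to the next node.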

> > 
> > > Sometimes mysql will also stop but not always.
> > > 
> > > The only way out I found (suggested on this list) is to manually
> > > remove the resource from the LRM (a failover then occurs)
> > 
> > Using crm_resource -C?
> 
> yep.
> I now realize after some thought that it might be the apache RA
> that's not quite OCF-compliant... I will test it and report back.

Hmm, if it's the ocf/heartbeat/apache then it definitely should
be.
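
For reference, the cleanup mentioned above (crm_resource -C) could be wrapped roughly as follows. A minimal sketch: the resource and node names are taken from the status line quoted earlier, the -r and -H options are the resource and host selectors as I remember them for this version of crm_resource, and the helper name is made up for illustration.

```python
def cleanup_command(resource, node=None):
    """Build the argv that clears a resource's state from the LRM."""
    argv = ["crm_resource", "-C", "-r", resource]
    if node is not None:
        # Restrict the cleanup to one node (assumed flag: -H).
        argv += ["-H", node]
    return argv


if __name__ == "__main__":
    # Names from the crm_mon line quoted in this thread.
    print(" ".join(cleanup_command("apache_id", "feeble-0")))
```

To actually run it, pass the returned list to subprocess.run on a cluster node; the sketch only builds and prints the command.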

> > > but I'd like
> > > to know where my mistake is: measly hardware that can't cope with the
> > > load, an HA setup that is not quite robust enough, or should I increase
> > > the timeout for apache (60s)?
> > 
> > My guess is that the problem's somewhere in the drbd
> > configuration/disk system. For whatever reason, verifying the
> > disks (drbdadm) hogs your hosts.
> > 
> > Thanks,
> > 
> > Dejan
> 
> Thanks for the tips,
> jf

Welcome.

Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
