Hi,

On Fri, Dec 10, 2010 at 09:27:15AM +0100, Andrew Beekhof wrote:
> On Thu, Dec 9, 2010 at 10:53 PM, Bart Coninckx <[email protected]> wrote:
> > On Thursday 09 December 2010 22:21:57 Pavlos Parissis wrote:
> >> On 9 December 2010 17:09, Igor Chudov <[email protected]> wrote:
> >> > On Thu, Dec 9, 2010 at 9:31 AM, Dimitri Maziuk <[email protected]> wrote:
> >> >> See "LRM operation WebSite_start_0 unknown error" from November, that's
> >> >> where your pdf led me. By the time I hit "unknown error" starting drbd
> >> >> resource -- set up exactly as you describe, I've spent close to a week
> >> >> trying to replicate the setup that takes < an hour.
> >> >
> >> > Sadly, I had a similar experience.
> >>
> >> Well, I didn't have that experience.
> >> I managed to set up a 3-node cluster with 2 DRBD resources and 2
> >> resource groups, each with several resources, by following the docs
> >> that are available on the pacemaker and drbd.org sites.
> >> Yes, I faced a few configuration issues at the beginning, but the
> >> pacemaker/linux-ha/drbd lists gave me enough support to continue.
> >>
> >> I come from SUN Clusters (3.1 back in 2003) and Redhat Cluster, and I
> >> have to say that pacemaker is far better and has much better
> >> functionality.
> >> There are things I don't like either: log messages that are at times
> >> too difficult to parse, and a few other things.
> >>
> >> Last but not least, Cluster systems are not easy by definition and you
> >> can't expect to follow a wizard and hit next next and get the cluster
> >> up and running without understanding how it works.
> >>
> >> My 2 cents,
> >> Pavlos
> >> _______________________________________________
> >> Linux-HA mailing list
> >> [email protected]
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> See also: http://linux-ha.org/ReportingProblems
> >
> > I mostly concur on this.
> > It all has a steep learning curve, and log files are far from transparent
> 
> Hard to disagree.
> There are primarily two problems with the logs:
> 
> 1) They're too verbose, which means it's easy for the relevant
> information to be lost in the noise.
>     On the flip-side, it means bugs can be fixed the first time they
> occur and don't need to be reproducible.
>     This was a big advantage early on, but now that the system is
> generally mature we've slowly been trying to cut back on the amount we
> log.

It would be great to reduce the severity of some messages to debug,
in particular in the PE, since we can reproduce the transition from
the PE input files anyway. Obviously, picking the right messages
would take time.

> 2) Many of the "errors" originate in the RAs and they don't always do
> a good job of logging them

As far as I can tell, they usually do. The problem is that the RA
messages are so few compared to what the subsystems log that they
get lost.

>     All Pacemaker gets is a return code, so "unknown error" is often
> the only information it has.

Right. And the answer is always in the logs. It would certainly
be helpful to see immediately the RA message in the upper layers,
but for that we need better infrastructure. Right now the only
information we get from the RA is the exit code. I'm not sure how
difficult it would be to propagate a message along with it.
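
To illustrate what carrying a message along with the exit code could
look like, here is a minimal Python sketch. It is not how the LRM works
today: run_action and the stand-in command are hypothetical, and taking
the last stderr line as "the" message is an assumption. The table lists
the OCF return codes with the short descriptions Pacemaker prints for
them, which is where "unknown error" (code 1) comes from.

```python
import subprocess
import sys

# OCF return codes and the short text Pacemaker shows for each of them.
OCF_RC_TEXT = {
    0: "ok",
    1: "unknown error",
    2: "invalid parameter",
    3: "unimplemented feature",
    4: "insufficient privileges",
    5: "not installed",
    6: "not configured",
    7: "not running",
    8: "master",
    9: "master (failed)",
}


def run_action(argv):
    """Run a command and return (exit code, description, last stderr line).

    Hypothetical helper: a real agent would be invoked by the LRM with the
    OCF environment set up; this only shows the idea of keeping the
    agent's own message next to its exit code.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    lines = [line.strip() for line in proc.stderr.splitlines() if line.strip()]
    message = lines[-1] if lines else ""
    description = OCF_RC_TEXT.get(proc.returncode, "unrecognized exit code")
    return proc.returncode, description, message


if __name__ == "__main__":
    # Stand-in for a resource agent: logs a reason to stderr and exits 1.
    fake_ra = [
        sys.executable,
        "-c",
        "import sys; print('ERROR: config file missing', file=sys.stderr); sys.exit(1)",
    ]
    rc, description, message = run_action(fake_ra)
    print(f"rc={rc} ({description}): {message}")
```

With only the first two fields the upper layers see "unknown error";
the third field is the part that currently stays in the logs.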

Thanks,

Dejan