Hi,

On Wed, Aug 11, 2010 at 05:22:56PM -0700, David Lang wrote:
> On Thu, 12 Aug 2010, Dejan Muhamedagic wrote:
> 
> > On Wed, Aug 11, 2010 at 03:59:34PM -0700, David Lang wrote:
> >> On Thu, 12 Aug 2010, Dejan Muhamedagic wrote:
> >>
> >>> On Wed, Aug 11, 2010 at 02:44:36PM -0700, David Lang wrote:
> >> I currently manage over a hundred
> >> clusters of machines. with v1 style configs this is easy to integrate into
> >> the other server management tools; if changes had to be done strictly via
> >> the crm shell, this would be much more complicated.
> >
> > Why would that be? If you do
> >
> > crm configure edit
> >
> > it will take you straight to the editor and the saved changes are
> > going to be applied to the cluster. There's also (in v1.1)
> >
> > crm configure filter
> >
> > which can be used with, say, sed. There's also a way to load/save
> > configurations from/to regular files.
> 
> if I can load/save configs from regular files, and the format of those files
> is documented so that I can edit them (either manually or programmatically),
> that is roughly equivalent to having the configs in plain text (and at that
> point I start to question why not just use the plain text version :-)

Well, because everybody thinks that XML is the holy grail for
configurations. True, one probably won't run into problems
extending it because it is possible to express almost anything in
it. It's just that working with it is a bit of a problem for us
humans.

> > As for the management, how do you make a node standby now?
> > hb_standby, right? How's that different from "crm node standby"?
> 
> that's not the type of thing that's a problem.
> 
> the type of thing that's a problem is changing the config.
> 
> sometimes I do this with vi, sometimes I do this with scripts to build the
> haresources line from scratch, sometimes I use sed on haresources, etc.
> having to make the changes by interacting with a menu/GUI is a major step
> backwards.

Yes, GUIs don't scale well.

> I can very much understand how a good menu/GUI tool could make it easier for
> a beginner to get started, or to explore the possibilities, but for
> widespread production use the ability to have plain text files to manipulate
> is critical.

Right and we do have the shell to enable this kind of
management.
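As an illustration of that kind of scripted management, here is a minimal Python sketch (a hypothetical helper, not part of crmsh) that changes one parameter of one primitive in a configuration saved with `crm configure save FILE`; the edited file could then be applied with `crm configure load update FILE`. It assumes the saved file uses the crm shell syntax printed by `crm configure show`, where a line ending in a backslash continues onto the next one. The function name and the resource and parameter names are made up for the example.

```python
import re


def set_primitive_param(config: str, rsc_id: str, name: str, value: str) -> str:
    """Return config with name="..." replaced only inside primitive rsc_id.

    Hypothetical helper, not a crmsh API. config is assumed to be crm shell
    syntax (as written by `crm configure save`), where a line ending in a
    backslash continues the current element on the next line.
    """
    # \b keeps e.g. name="ip" from matching inside "nic_ip=".
    pattern = re.compile(r'\b%s="[^"]*"' % re.escape(name))
    out = []
    in_target = False
    for line in config.splitlines():
        words = line.split()
        if len(words) >= 2 and words[0] == "primitive":
            # A new primitive starts: is it the one we want to edit?
            in_target = words[1] == rsc_id
        if in_target:
            # Use a function so backslashes in value are not treated as escapes.
            line = pattern.sub(lambda m: '%s="%s"' % (name, value), line)
        out.append(line)
        if not line.rstrip().endswith("\\"):
            # No trailing backslash: the current element ends on this line.
            in_target = False
    return "\n".join(out) + "\n"
```

Restricting the substitution to one primitive's lines is the point of the sketch: a plain `sed` over the whole file would also rewrite an `ip=` parameter that belongs to some other resource.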

> >>>> This is really starting to sound like we need to fork heartbeat back to
> >>>> the 2.x or thereabouts when it could work for simple things easily.
> >>>
> >>> I can understand the way you feel. But I don't think that there
> >>> is a need to maintain the Heartbeat v1 bits separately. With
> >>> Heartbeat 3.x you need to install in addition just the
> >>> cluster-glue package (perhaps named differently in various
> >>> distributions).
> >>
> >> what would that do? would it let us use v1 style configs where they are
> >> sufficient?
> >
> > Yes. I doubt very much that v1 functionality got broken with the
> > split.
> 
> good, so to use the v1 functionality I need to install heartbeat +
> cluster-glue?

Yes.
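For comparison, with heartbeat plus cluster-glue the v1 resource configuration is still just haresources lines: the preferred node followed by the resources it should run, each written as a resource script name with `::`-separated arguments. A minimal Python sketch of generating such a line from a script, as described earlier in the thread, assuming that usual haresources syntax; the function name is hypothetical, and the node name, address and device paths in the test are made up.

```python
from typing import Sequence, Tuple

# (resource script name, its arguments), e.g. ("IPaddr", ["192.168.1.10/24/eth0"])
Resource = Tuple[str, Sequence[str]]


def haresources_line(node: str, resources: Sequence[Resource]) -> str:
    """Build one v1 haresources line (hypothetical helper).

    Format assumed: the preferred node, then each resource as NAME or
    NAME::ARG1::ARG2..., all separated by single spaces.
    """
    parts = [node]
    for name, args in resources:
        for item in (name, *args):
            # Tokens are space-separated and arguments are "::"-separated,
            # so neither may appear inside a single name or argument.
            if not item or any(c.isspace() for c in item) or "::" in item:
                raise ValueError("invalid haresources token: %r" % (item,))
        parts.append("::".join([name, *args]))
    return " ".join(parts)
```

Validating each token before joining keeps a stray space in a generated argument from silently turning into an extra, bogus resource on the line.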

> >>>> does anyone have a good handle on where we should start and what bugs
> >>>> have been fixed since then (as opposed to new features added, components
> >>>> split out, etc)?
> >>>
> >>> The mercurial repository is the ultimate source.
> >>
> >> yes, that is the ultimate source, but it's far more painful to have to
> >> start from scratch than if someone who is familiar with the codebase can
> >> provide a map.
> >
> > The heartbeat codebase as well as the libraries (clplumbing),
> > i.e. the parts which are relevant to v1, haven't changed much in
> > the last few years.
> 
> that's what I figured, and the reason I was asking the question.
> 
> >>>> I've been watching things get more and more complicated over time, and I
> >>>> recognise that to solve complex problems you sometimes need that
> >>>> complexity, but there are a LOT of problems that aren't that complex.
> >>>> Heartbeat has been making it harder and harder to do simple things, and
> >>>> with the difficulty Igor is experiencing in figuring out what version
> >>>> 3.0.2 is doing, and the inability to take a simple config and convert it
> >>>> to the new format, it is sounding like it may be time to fork.
> >>>
> >>> I completely agree that increased complexity is a problem,
> >>> particularly in HA solutions. And it is possible to create very
> >>> complex configurations with Pacemaker, and at the same time make
> >>> it hard (or impossible) for humans to understand what the
> >>> cluster does.
> >>
> >> and sometimes such complexity is needed, but sometimes it's not.
> >
> > I'd say that running something one can't understand is at least
> > unmaintainable.
> 
> but if all I'm doing is the simple stuff, I don't need to understand all the 
> complex stuff, I just need to learn the part that I'm using.

Well, you said it. I'm not sure what exactly "complex stuff"
refers to.

> >>> However, if you want to run a configuration comparable to v1,
> >>> i.e. a simple active-passive or active-active setup, a Pacemaker
> >>> cluster is quite manageable.  Right now it has all the tools to
> >>> make it much easier to manage than a haresources based cluster.
> >>> Once you give it a try, you probably won't look back.
> >>
> >> the problem is that the learning curve has been made so steep that even
> >> people who are familiar with clusters (and earlier versions of heartbeat)
> >> have problems setting up these simple clusters.
> >
> > I hope that the situation got a bit better recently. One still
> > needs to devote quite a bit of time to learning it, but simple
> > clusters should really not be a problem anymore.
> >
> >> the fact that we are on day 2 or 3 of Igor's problem and can't even figure
> >> out what's happening because the logs aren't showing anything is a very
> >> bad sign.
> >
> > Those logs have always been the same.
> 
> Could you please take a look at what Igor has been posting and see if you can
> figure out why the logs stop within a minute or so of heartbeat starting
> (before it starts/stops any resources) and why heartbeat doesn't log
> _anything_ for a long time (at least 40 min)?
> 
> the logs are not showing stuff that I (and others who have responded) are
> used to seeing in the 2.x versions that we have deployed, so I assumed that
> this was due to logging changes (I have never used logd, so I didn't know
> what changes it had for example)

Unfortunately, I've forgotten almost everything about v1 and can't
provide any useful input. I don't know what kind of logging is
missing.

> >> I really don't want to have heartbeat fork, but as the project has grown
> >> new features and then split off the resource management stuff, the
> >> difficulty in getting the simple things working has been growing.
> >>
> >> most of us who didn't need that complexity just ignored it as long as the
> >> haresources configs continued to work.
> >
> > And, for the time being, they should work. I don't know what the
> > future will bring; I haven't noticed much interest in supporting that.
> > Perhaps somebody from Linbit can comment too.
> >
> >> at this point it seems like either the haresources configs need to be
> >> un-deprecated and supported, or something else. but the current situation
> >> is getting unreasonable.
> >
> > If there are enough shops interested in running v1, then somebody
> > will probably support it too.
> 
> I think there are, and that's why I started this thread. The ideal result
> would be to not fork, and have the v1 style configs supported in the latest
> version.
> 
> it's not that people want to run the v1 code, but the v1 configs are very
> minimalist, and pretty easy to understand. If that satisfies your needs
> (which it does for a lot of people), going to all the added complication of
> the newer stuff is a lot of cost for very little return.
> 
> Yes, there are times when you need to run clusters of more than 2 machines,
> need to load balance, need to shift processes around to keep a lot of
> different applications running on one cluster without overloading any one
> box, etc.
> 
> but most people start out with things running on a single machine, and then
> need to make that thing HA. going to a 2-machine cluster with simple failover
> is all they (initially) need. It's only after people have run such clusters
> for a while that they start looking at larger clusters and more complex tasks.

I think that we need to clarify which part is actually the
complex one. Administration using cibadmin, crm_resource, and
other tools was somewhat, well, unfriendly. Things got much
better in the meantime. But there's still room for improvement.

Thanks,

Dejan

> David Lang
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems