Re: [Linux-ha-dev] Re: Fence agents converge

Dejan Muhamedagic Tue, 07 Oct 2008 08:29:20 -0700

On Mon, Oct 06, 2008 at 11:12:20PM +0200, Lars Marowsky-Bree wrote:
> On 2008-10-06T21:30:41, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> 
> > > - Primarily uses C plugins, which are compiled to .so. Everything is
> > >   based on PILS (A Generalized Plugin and Interface Loading System).
> > >   * The primary reason for using plugins is low memory (the .so is
> > >     loaded when HB starts and stays in memory).
> > >   * Of course, every external plugin can fail
> > I don't understand the last item. Can you elaborate, please?
> 
> He is referring to failure in low-memory situations. fork() might not be
> possible then, and memory needs to be pre-allocated - this is one of the
> primary reasons why stonithd loads C plugins and instantiates them prior
> to actually using them.
> 
> But it turns out that C code seems to be mostly too hard for people to
> write.


This is true, but I'm not sure how relevant it is. The interface to
external stonith plugins has been available for 3-4 years, and so
far three plugins have been contributed (two in /bin/sh, one in
python).
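The trade-off discussed above, a fork()/exec() per operation for external plugins versus agents that are loaded and instantiated in advance, can be illustrated roughly as follows. This is a minimal sketch, not stonithd's actual mechanism: `fence_via_external()`, `PreloadedAgents`, `DummyAgent` and the `off <host>` command-line convention are all hypothetical.

```python
import subprocess


def fence_via_external(script, host):
    """One-shot external plugin: every call fork()s and exec()s a process.

    If fork() fails (for example under memory pressure) or the script is
    missing, subprocess raises OSError and the fencing attempt fails.
    """
    try:
        result = subprocess.run([script, "off", host], check=False)
    except OSError:
        return False
    return result.returncode == 0


class DummyAgent:
    """Hypothetical in-process agent; a real one would talk to a device."""

    def __init__(self, config):
        self.config = config
        self.powered_off = set()

    def off(self, host):
        self.powered_off.add(host)
        return True


class PreloadedAgents:
    """Agents are instantiated once at daemon start-up, so a later fencing
    request only calls a method on an existing object: no fork() needed."""

    def __init__(self, agent_classes, configs):
        self._agents = {
            name: agent_classes[name](cfg) for name, cfg in configs.items()
        }

    def fence(self, name, host):
        return self._agents[name].off(host)
```

The preloaded variant only avoids the fork(); whatever the agent allocates while talking to the device is still allocated at fencing time.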

> What I think I'd like to see is the fencing agents instead being
> python classes, with a python fencing daemon too. Sure, we'd be
> mlock()ing a few more megabytes, but who cares, given the gained
> simplicity and the fact that more of our agents would actually benefit
> from it? Right now, a significant portion of them fork, which really is
> beside the point.
> 
> And people write realtime Java/Mono code.

Good luck with that.

> Compared to that, a SCHED_FIFO
> + mlock'ed python is pure sanity ;-)
> 
> These python classes would have on, off, device_status, outlet_status,
> start, metadata, ... methods and be passed a dictionary of configuration
> values.
> 
> RHCS right now only uses one-shot operation, of course, but then the
> fence_tool command would simply create the object and exit again
> immediately; no harm done.
> 
> Yes, we'd be enforcing a single scripting language. (And not even one I
> personally like much.) But I think it'd be worth it.

This is somewhat contradictory to the argument that people can't
contribute because they find C intimidating or too hard. I'm sure
there are some who find python awkward, so I don't see how
imposing python could help.
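A minimal sketch of the class-based interface Lars proposes, assuming Python 3. The method names come from his list; the signatures, the meaning assumed for `start()` (opening or initialising the device connection) and the `DummyPowerSwitch` example are hypothetical. The `one_shot()` helper illustrates the RHCS-style one-shot use, where the tool creates the object, runs one action and exits; its `key=value` input format and the `action`/`port` keys are likewise assumptions.

```python
from abc import ABC, abstractmethod


class FenceAgent(ABC):
    """Hypothetical base class; each agent receives a configuration dict."""

    def __init__(self, config):
        self.config = dict(config)

    @abstractmethod
    def start(self):
        """Assumed meaning: connect to / initialise the device."""

    @abstractmethod
    def on(self, outlet): ...

    @abstractmethod
    def off(self, outlet): ...

    @abstractmethod
    def device_status(self): ...

    @abstractmethod
    def outlet_status(self, outlet): ...

    @abstractmethod
    def metadata(self): ...


class DummyPowerSwitch(FenceAgent):
    """In-memory power switch used only to exercise the interface."""

    def start(self):
        names = self.config.get("outlets", "1").split(",")
        self.outlets = {name: "on" for name in names}
        return True

    def on(self, outlet):
        self.outlets[outlet] = "on"
        return True

    def off(self, outlet):
        self.outlets[outlet] = "off"
        return True

    def device_status(self):
        return True

    def outlet_status(self, outlet):
        return self.outlets[outlet]

    def metadata(self):
        return "<resource-agent name='dummy'/>"


def one_shot(lines, agent_class=DummyPowerSwitch):
    """Parse key=value lines, create the agent, run one action, return an exit code."""
    config = {}
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep:
            config[key.strip()] = value.strip()
    action = config.pop("action", "off")
    outlet = config.pop("port", "1")
    if action not in ("on", "off"):
        return 2  # unknown or unsupported action
    agent = agent_class(config)  # create the object...
    if not agent.start():
        return 1
    return 0 if getattr(agent, action)(outlet) else 1  # ...run once, then exit
```

A long-running daemon would keep such objects around, while a one-shot tool would call `one_shot(sys.stdin)` and exit with the returned code, which is why the one-shot usage does no harm.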

> For the regression testing which honzaf wanted to write, these classes
> would only be allowed to interact with the outside world through a
> telnet/ssh/snmp/... input/output abstraction, which would allow us to
> easily record and replay the exchanges during unit tests.
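The record/replay idea can be sketched as follows, assuming agents talk to the device only through an object with `send()` and `recv()` methods. The transport classes, `ReplayMismatch`, the `off <outlet>`/`OK` protocol of `FakeSwitch` and `power_off()` are all hypothetical; a real transport would wrap telnet, ssh or SNMP.

```python
class ReplayMismatch(Exception):
    """Raised when an agent deviates from the recorded conversation."""


class RecordingTransport:
    """Wraps a real transport and logs every exchange in order."""

    def __init__(self, inner):
        self.inner = inner
        self.log = []

    def send(self, data):
        self.log.append(("send", data))
        self.inner.send(data)

    def recv(self):
        data = self.inner.recv()
        self.log.append(("recv", data))
        return data


class ReplayTransport:
    """Plays back a recorded log instead of talking to a device."""

    def __init__(self, log):
        self._log = list(log)

    def _next(self, kind):
        if not self._log:
            raise ReplayMismatch(f"no more recorded exchanges, wanted {kind}")
        recorded_kind, data = self._log.pop(0)
        if recorded_kind != kind:
            raise ReplayMismatch(f"expected {recorded_kind}, got {kind}")
        return data

    def send(self, data):
        expected = self._next("send")
        if data != expected:
            raise ReplayMismatch(f"sent {data!r}, recording has {expected!r}")

    def recv(self):
        return self._next("recv")


class FakeSwitch:
    """Stand-in device: answers OK to 'off N', ERR to anything else."""

    def __init__(self):
        self._replies = []

    def send(self, data):
        self._replies.append("OK" if data.startswith("off ") else "ERR")

    def recv(self):
        return self._replies.pop(0)


def power_off(transport, outlet):
    """Agent logic under test: uses only the transport abstraction."""
    transport.send(f"off {outlet}")
    return transport.recv() == "OK"
```

A session is recorded once against real hardware (here `FakeSwitch`), and unit tests then replay the log; any change in what the agent sends shows up as a `ReplayMismatch`.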

About fencing and mlock: I've often wondered how relevant this is
in today's computing. I can't recall any incident of that kind,
i.e. the host fencing another one being so short on memory that
the fencing operation failed. Typically, such a host has to take
over some heavy resources right after fencing (an RDBMS, a web
server), which would surely make hundred-fold bigger memory
demands.

It is also debatable which demands more memory: a python process
(think garbage collection) instantiating objects, or a process
doing a fork. If I were to place a bet, I'd go with the former.
Also, which of the two would you consider more predictable?

Looks like you've already discussed this matter in Prague.
I think I need some time to process it ;-)

> > > This is used for GUI????
> > The metadata? Don't know about the GUI, but in general it's
> > underused.
> 
> The metadata is only used by the GUI right now. And it could do with
> being in a format more closely resembling the OCF RA metadata, too. ;-)

It most probably needs to be revised.
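For comparison, metadata in a format resembling OCF resource agent metadata could be generated along these lines. This is a sketch only: the element names follow the usual OCF `resource-agent` layout (`longdesc`, `shortdesc`, `parameters`, `actions`), but `ocf_style_metadata()`, the agent name, parameters and actions used with it are made up, and the DOCTYPE line is omitted. The `longdesc` element would also be a natural home for the device documentation discussed below (supported devices, known problems).

```python
import xml.etree.ElementTree as ET


def ocf_style_metadata(name, shortdesc, longdesc, params, actions):
    """Build OCF-like metadata XML.

    params:  {param_name: (description, required)}
    actions: {action_name: timeout_in_seconds}
    """
    ra = ET.Element("resource-agent", name=name)
    ET.SubElement(ra, "version").text = "1.0"
    ET.SubElement(ra, "longdesc", lang="en").text = longdesc
    ET.SubElement(ra, "shortdesc", lang="en").text = shortdesc

    parameters = ET.SubElement(ra, "parameters")
    for pname, (desc, required) in params.items():
        p = ET.SubElement(parameters, "parameter", name=pname,
                          required="1" if required else "0")
        ET.SubElement(p, "longdesc", lang="en").text = desc
        ET.SubElement(p, "shortdesc", lang="en").text = desc
        ET.SubElement(p, "content", type="string")

    actions_el = ET.SubElement(ra, "actions")
    for aname, timeout in actions.items():
        ET.SubElement(actions_el, "action", name=aname, timeout=f"{timeout}s")

    return '<?xml version="1.0"?>\n' + ET.tostring(ra, encoding="unicode")
```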

> > > * add getinfo-*???????? Is this really needed?
> > I think that this would be a good place to keep the documentation
> > about the device (in particular which devices are supported) and
> > the plugin (possible problems, requirements). That way it is
> > easily accessible to both the users (important!) and the
> > developers.
> 
> Sure, the meta-data needs to be kept.
> 
> > > - HB
> > >   * add status of lights-out
> > This could be nice to have for informational purposes.
> 
> We actually need it for the ability to find out whether nodes are expected
> to be up or down;

How does this depend on the node having power (apart from the
obvious)?

> this is important for nodes which we turn on and off
> as needed for power management reasons, a feature I'd like to see in the
> future.

Yes, this is probably going to be important.

> > The device status is important, because that way we can alert the
> > user in case of problems before fencing is needed. How was RHCS
> > dealing with that? Perhaps indirectly by requesting the host
> > power state?
> 
> It wasn't used in the past. RHCS didn't monitor the fencing devices.
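Alerting the user about device problems before fencing is needed amounts to periodically calling each agent's device status check. A minimal single-pass sketch, assuming agents expose a `device_status()` method returning a boolean (as in the class interface proposed earlier in the thread); `check_fencing_devices()`, the two example agent classes and the `alert` callback are hypothetical. A daemon would run this from a timer.

```python
def check_fencing_devices(agents, alert):
    """Run device_status() on every agent once; report unhealthy devices.

    agents: {device_name: agent_object}
    alert:  callable taking a message string (e.g. log it or notify the user)
    Returns the sorted list of device names that failed the check.
    """
    failed = []
    for name, agent in agents.items():
        try:
            healthy = bool(agent.device_status())
            reason = "reports a problem"
        except Exception as exc:  # an unreachable device typically raises
            healthy = False
            reason = f"status check raised {exc!r}"
        if not healthy:
            failed.append(name)
            alert(f"fencing device {name}: {reason}")
    return sorted(failed)


class HealthyAgent:
    """Hypothetical agent whose device answers normally."""

    def device_status(self):
        return True


class UnreachableAgent:
    """Hypothetical agent whose device cannot be reached."""

    def device_status(self):
        raise ConnectionError("no route to host")
```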
> 
> > The power status may perhaps be of use to the cluster/resource
> > manager. CRM doesn't know anything about it. Don't know about
> > RHCS.
> 
> Yes, I'd like to eventually see it used in Pacemaker.
> 
> > Anything on resource-based fencing? There's a Brocade fence agent
> > in RHCS. Is it also used as a node-level device?
> 
> Yes, it's only used for node-level fencing. Resource-based fencing needs
> to happen as part of the resource hierarchy, via SCSI-3 reservations,
> sfex and similar agents; it doesn't need a special daemon like
> fenced/stonith to manage it.

Brocade may be considered resource-level fencing. I was just wondering
how RHCS users worked with it.

Thanks,

Dejan

> 
> Regards,
>     Lars
> 
> -- 
> Teamlead Kernel, SuSE Labs, Research and Development
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
> _______________________________________________________
> Linux-HA-Dev: [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
