[Linux-ha-dev] Re: Fence agents converge

Lars Marowsky-Bree Mon, 06 Oct 2008 14:12:48 -0700

On 2008-10-06T21:30:41, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:

> > - Primary uses C plugins, which are compiled to .so. All is based on PILS
> >   (A Generalized Plugin and Interface Loading System).
> >   * Primary idea, why to use plugins is low memory (.so is loaded when HB
> >     starting and is still in memory).
> >   * Of course every external plugin can fall
> Don't understand the last item. Can you elaborate please.


He is referring to failure in low-memory situations. fork() might not be
possible then, and memory needs to be pre-allocated - this is one of the
primary reasons why stonithd loads C plugins and instantiates them prior
to actually using them.

But it turns out that C code seems to be mostly too hard for people to
write. What I think I'd like to see is the fencing agents instead being
python classes, with a python fencing daemon too. Sure, we'd be
mlock()ing a few more megabytes, but who cares, given the gained
simplicity and that more of our agents would actually benefit from
being preloaded. Right now, a significant portion of them causes forks,
which really is beside the point.

And people write realtime Java/mono code. Compared to that, a SCHED_FIFO
+ mlock'ed python is pure sanity ;-)

Said python classes would have the on, off, device_status,
outlet_status, start, metadata, ... functions and be passed a dictionary
of configuration values.
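
A minimal sketch of what such a class might look like. Everything here
is hypothetical: FenceAgent, InMemoryAgent, the "outlets" configuration
key and the return values are made up for illustration and are not an
existing stonith or RHCS API.

```python
from abc import ABC, abstractmethod


class FenceAgent(ABC):
    """Hypothetical base class for a fencing agent (illustrative names only)."""

    def __init__(self, config):
        # The dictionary of configuration values handed in by the daemon.
        self.config = dict(config)

    def start(self):
        """Pre-allocate whatever the agent needs before it is ever used."""

    @abstractmethod
    def on(self, outlet):
        """Power the given outlet on."""

    @abstractmethod
    def off(self, outlet):
        """Power the given outlet off."""

    @abstractmethod
    def device_status(self):
        """Return True if the fencing device itself is reachable."""

    @abstractmethod
    def outlet_status(self, outlet):
        """Return "on" or "off" for the given outlet."""

    def metadata(self):
        """Describe the agent and the configuration keys it was given."""
        return {"name": type(self).__name__, "parameters": sorted(self.config)}


class InMemoryAgent(FenceAgent):
    """Toy agent that tracks outlet state in a dict instead of real hardware."""

    def start(self):
        # Assumption: every configured outlet starts out powered on.
        self.outlets = {name: "on" for name in self.config["outlets"]}

    def on(self, outlet):
        self.outlets[outlet] = "on"

    def off(self, outlet):
        self.outlets[outlet] = "off"

    def device_status(self):
        return True

    def outlet_status(self, outlet):
        return self.outlets[outlet]


agent = InMemoryAgent({"outlets": ["node1", "node2"]})
agent.start()
agent.off("node1")
print(agent.outlet_status("node1"), agent.outlet_status("node2"))
```

A daemon would call start() once at load time and keep the object
around, so nothing has to be allocated or forked when fencing is
actually needed.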

RHCS right now only uses one-shot operation, of course, so there the
fence_tool command would create the object and immediately exit again -
no harm done.
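
The one-shot case could be sketched roughly like this. EchoAgent,
one_shot() and the "port" key are assumptions made up for the example;
this says nothing about how fence_tool is actually implemented or which
options it takes.

```python
import sys


class EchoAgent:
    """Hypothetical stand-in agent; a real one would talk to a power switch."""

    def __init__(self, config):
        self.config = config

    def on(self):
        return "on %s" % self.config["port"]

    def off(self):
        return "off %s" % self.config["port"]


def one_shot(agent_cls, config, action):
    """Create the agent, run exactly one action, and return an exit code."""
    agent = agent_cls(config)
    handler = getattr(agent, action, None)
    if action.startswith("_") or not callable(handler):
        print("unsupported action: %s" % action, file=sys.stderr)
        return 1
    print(handler())
    return 0


# The object lives only for this single call; a long-running daemon
# would instead keep it instantiated between requests.
status = one_shot(EchoAgent, {"port": "3"}, "off")
```

The same class serves both models: only the lifetime of the object
differs.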

Yes, we'd be enforcing a single scripting language. (And not even one I
personally like much.) But I think it'd be worth it.

For the regression testing which honzaf wanted to write, said classes
would only be allowed to interact with the external world through a
telnet/ssh/snmp/... input/output abstraction, which would allow us to
easily record and replay sessions during unit tests.
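
A rough sketch of the record/replay idea, under the assumption that the
abstraction boils down to a single exchange(command) call. FakeSwitch
stands in for a real telnet/ssh/snmp session, and all class and function
names are hypothetical.

```python
class FakeSwitch:
    """Hypothetical stand-in for a real telnet/ssh/snmp session."""

    def __init__(self):
        self.state = "on"

    def exchange(self, command):
        if command in ("on", "off"):
            self.state = command
            return "ok"
        if command == "status":
            return self.state
        return "error"


class RecordingTransport:
    """Forwards every exchange to the wrapped transport and logs it."""

    def __init__(self, inner):
        self.inner = inner
        self.log = []

    def exchange(self, command):
        reply = self.inner.exchange(command)
        self.log.append((command, reply))
        return reply


class ReplayTransport:
    """Answers from a recorded log; fails if the agent deviates from it."""

    def __init__(self, log):
        self.pending = list(log)

    def exchange(self, command):
        if not self.pending:
            raise AssertionError("unexpected extra command: %r" % command)
        expected, reply = self.pending.pop(0)
        if command != expected:
            raise AssertionError("expected %r, got %r" % (expected, command))
        return reply


def power_off(transport):
    """Agent logic under test: it only ever sees the transport."""
    transport.exchange("off")
    return transport.exchange("status") == "off"


# Record once against the "device", then replay without it.
recorder = RecordingTransport(FakeSwitch())
recorded_ok = power_off(recorder)
replayed_ok = power_off(ReplayTransport(recorder.log))
```

Because the agent never opens a socket itself, the recorded log is all a
unit test needs; no hardware has to be present when it runs.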

> > This is used for GUI????
> The metadata? Don't know about the GUI, but in general it's
> underused.

The metadata is only used by the GUI right now. And it could do with
being in a format more closely resembling the OCF RA metadata, too. ;-)
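
As an illustration only, metadata in that style could be generated from
a plain parameter description. The element names below are loosely
modelled on OCF RA metadata and should be checked against the OCF
resource agent DTD; render_metadata(), "example_pdu" and "ipaddr" are
invented for the example.

```python
import xml.etree.ElementTree as ET


def render_metadata(name, shortdesc, parameters):
    """Render agent metadata as XML loosely resembling OCF RA metadata.

    parameters maps a parameter name to (type, required, description).
    """
    root = ET.Element("resource-agent", {"name": name})
    ET.SubElement(root, "shortdesc", {"lang": "en"}).text = shortdesc
    params = ET.SubElement(root, "parameters")
    for pname, (ptype, required, desc) in sorted(parameters.items()):
        attrs = {"name": pname, "required": "1" if required else "0"}
        param = ET.SubElement(params, "parameter", attrs)
        ET.SubElement(param, "shortdesc", {"lang": "en"}).text = desc
        ET.SubElement(param, "content", {"type": ptype})
    return ET.tostring(root, encoding="unicode")


xml_text = render_metadata(
    "example_pdu",
    "Example power switch",
    {"ipaddr": ("string", True, "Address of the device")},
)
print(xml_text)
```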

> > * add getinfo-*???????? Is this really needed?
> I think that this would be a good place to keep the documentation
> about the device (in particular which devices are supported) and
> the plugin (possible problems, requirements). That way it is
> easily accessible to both the users (important!) and the
> developers.

Sure, the meta-data needs to be kept.

> > - HB 
> > * add status of light-out
> This could be nice to have for informational purposes.

We actually need it for the ability to find out whether nodes are
expected to be up or down; this is important for nodes which we turn on
and off as needed for power management reasons, a feature I'd like to
see in the future.

> The device status is important, because that way we can alert the
> user in case of problems before fencing is needed. How was RHCS
> dealing with that? Perhaps indirectly by requesting the host
> power state?

It wasn't used in the past. RHCS didn't monitor the fencing devices.

> The power status may perhaps be of use to the cluster/resource
> manager. CRM doesn't know anything about it. Don't know about
> RHCS.

Yes, I'd like to eventually see it used in Pacemaker.

> Anything on resource-based fencing? There's a brocade fence agent
> in RHCS. Is it used also as node-level device?

Yes, it's only used for node-level fencing. Resource-based fencing needs
to happen as part of the resource hierarchy - via scsi3 reservations,
sfex and similar agents - and doesn't need a special daemon like
fenced/stonith to manage it.


Regards,
    Lars

-- 
Teamlead Kernel, SuSE Labs, Research and Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
