On 8/10/06, David Lee <[EMAIL PROTECTED]> wrote:
This is intended merely as a set of random thoughts. No criticism is
intended! Honestly.
1. Recent emails to the "-dev" list have identified various problems in
the code, and Lars M-B has suggested on a few occasions that janitorial
work on the code would be welcome.
2. My own minor dabblings at various places across the code (mainly to try
to pin down Solaris-driven problems and to try to rectify them in an
OS-independent manner) have identified various places where the code makes
assumptions (unintentional, I'm sure) about the environment (e.g. sh/bash)
and is inconsistent in naming relocatable directories ("HA_VAR*" etc.).
3. Bugs lurk on non-Linux implementations. We still don't pass BSC on
non-Linux systems (Solaris, BSD). Why? What features of the coding cause
these failures?
BSD *used* to be OK until sunjd made stonithd require farside_pid.
There is a Bugzilla entry for this that is currently being ignored.
4. Debugging for an outsider (such as me) is a headache: for one thing the
internal documentation and commenting is lacking in too many places.
5. A colleague here (Andrew Stribblehill) and I have long had a gut
feeling (instinct, etc.) that the overall code base just feels too big.
Well, there are two resource managers, for one thing.
But in terms of the CRM... you'd be hard pressed to delete anything
that's not already surrounded by backwards-compatibility #ifdefs.
(Put another way: if the supplier were Microsoft, the word "bloat" would
be surfacing!) There's a vast amount of code: but what does it all
actually, really, ultimately do?
Since 1.x, the project has roughly doubled in size (about 80-90k new LOC,
of which about 60k make up the CRM) to 240k-ish LOC.
But we're also no longer restricted to 2 nodes, our resource model can
better model environments and we have a bunch of other capabilities.
Easy to put in a list, but they require significantly more code to
implement.
And does it do it in a "best practice"
manner or only in a "get the job done" manner?
I think it's fair to say that we made mistakes in the past with the
2-node resource manager. Obviously we don't want to get into that
situation again.
This time we have gone out of our way to decouple the various pieces
and choose algorithms that don't create artificial limitations. So
from a design point-of-view, I think we're rock solid - if only
because we can swap out anything that demonstrably sucks.
So the new code you're seeing is not just a few new features, but
something that we can carry forward into 2.2 and hopefully 3.x.
There are a couple of implementations that could probably do with some
work (I think my views on this are well known, so I won't repeat them)
and might be candidates for a rewrite. The TEngine and CIB are
already at v2 and the PEngine has one scheduled that incorporates the
lessons learnt so far.
As a side note, I have considered using an OO language for the PEngine,
but that would require the pieces that use it to be in that language
too (and there are a few).
6. Et cetera.
Heartbeat started small, with various ideas. It has grown; the ideas and
visions have changed. Time has moved on and the typical OS environment
has widened.
So (deep breath! ...) once 2.0.7 is shipped, might it be time to pause and
take stock? We would continue the background bug-fixing as always, but
for a while would put a hold on new developments, and instead do some sort
of audit and rationalising of the code and its documentation. (Yes, I did
present that as a (mostly) "either/or" thing because realistically I think
that is what we would need to do.)
The sad reality is that for all the people using Heartbeat, the set of
active developers (and thus people who are in a position to do
either/or) has dwindled to the point where we can't keep up.
My view is that both sides of the either/or require more resources
than we have available.
Just a few examples that spring to mind (at various different levels):
1. IPaddr and IPaddr2: We shouldn't really be offering the end-user two
options. We ought to be able to offer one product which "does the right
thing".
agreed
2. "HA_VAR*" inconsistencies: Sweep through and tidy these up. Includes
removal of "-DHA_*=*" from all "Makefile.am", all of which can now be
removed (with a few essential renames in various ".h" and ".c" files).
no objection here
3. (Another deep breath...) Is "python" now an essential pre-requisite for
heartbeat? If so, should various things (e.g. "lrmd" and friends,
"stonithd" and friends) be rewritten into python?
It depends on your definition of "friends" :-)
Though really, I wouldn't see either of those as a good fit for python.
They would almost
certainly become significantly smaller and thus cleaner and easier to
understand, to maintain and to document. The very act of such a rewrite
(from one language type to a very different type) would itself result in
tidying by concentrating the mind on intended functionality.
4. By doing such exercises, it might make future developments actually
easier (short-term pain; long-term gain). For instance, writing a module
or daemon in "python" (or into a python framework) might be easier than
doing it in C, especially given the large amount of string handling that
often has to be done.
I'm not sure I understand the interest in python.
5. Insert your own pet "I wish {X|Y|Z} were {tidier|maintainable}" here.
extracting the "old" resource manager from the core heartbeat code
would be on my list
6. Et cetera.
Might this suggest a transition from "2.0.x" to (say) "2.2.x"? The 2.2.0
branch would be functionally equivalent to 2.0.7, but tidied as above.
My personal belief is that we're still a fair way from a 2.2. Maybe a year.
2.0.x is only now available in SLES and I would expect a bump in its
userbase. Embarking on another stable release without the resulting
feedback does not seem wise to me.
Doubtless such work feels tedious and uninviting for development-minded
folk. But I still think we ought at least to entertain the thought, even
if only to the extent of generating defensible reasons not to do it.
as above
As I say, simply "blue sky wish-list", and probably with rose-tinted
spectacles. (Might "blue sky" through "rose tint" get a bit dark??)
Let me repeat that this is not intended as criticism. Please, please
don't take it as such.
OK. Give me a couple of minutes to don a full fire-retardant suit...
at least you care enough to propose such madness in the first place ;-)