I'm trying to get lrmd and lrmadmin running under Solaris. Some parts work, but others do not: for some lrmadmin requests, the result (probably a failure) never gets back to lrmadmin, which then waits forever even though lrmd has internally produced a result.
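To make the symptom concrete, here is a minimal sketch in C of what I think is happening. It is not the real lrmd code: the structs and function names are hypothetical stand-ins of mine, and only the flag name "need_notify" is borrowed from on_op_done() (see question 2 below). It only models the logic, with no real IPC.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a finished operation; NOT lrmd's real struct. */
struct op {
    int  rc;           /* result the server produced internally */
    bool need_notify;  /* whether the server believes a reply is wanted */
};

/* Hypothetical stand-in for the reply channel back to the client. */
struct channel {
    bool has_reply;    /* true once the server has sent something */
    int  reply_rc;     /* the result that was sent */
};

/* Conditional variant: the reply is sent only when the flag is set. */
static void op_done_conditional(const struct op *op, struct channel *ch)
{
    if (op->need_notify) {
        ch->has_reply = true;
        ch->reply_rc  = op->rc;
    }
    /* Otherwise the result exists in the server but is never sent. */
}

/* Call-and-response variant: every request gets a reply, even a failure. */
static void op_done_always(const struct op *op, struct channel *ch)
{
    ch->has_reply = true;
    ch->reply_rc  = op->rc;
}

/* A client doing a blocking read would hang whenever this returns false. */
static bool client_would_unblock(const struct channel *ch)
{
    return ch->has_reply;
}
```

With a failed operation whose flag is false, the conditional variant leaves the channel empty, so a blocking client would wait forever; the always-reply variant hands the failure code back. Whether lrmd is meant to behave like the second variant is exactly what I am asking.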
I don't know whether this is a Solaris issue or a more general one to do with the type of request I'm making, but it doesn't feel right. (I guess part of it is Solaris-related, because I'm getting hangs in "STONITHDBasicSanityCheck" and "LRMBasicSanityCheck", and I assume those are OK on Linux.)

1. Is the lrmd-lrmadmin communication channel documented somewhere (to tell us what ought to happen)?

2. In "on_op_done()", something feels strange about "need_notify". Indeed, Alan's comments at version 1.125 (April 2005) reinforce my uneasiness. If this is (at heart) a "call and response" thing, then shouldn't notification (the response from the server) be attempted every time? (In the case I'm trying, I've inserted some logging statements, and it follows a path which keeps "need_notify" false, even though "lrmadmin" is sitting there awaiting an answer.)

Overall, lrmd has the feel of having grown through accretion, and of now needing a bit of a spring-clean (as Alan's comments suggest). It is some 4,500 lines long, which feels rather excessive. How would one set about applying some maintenance to it? (Back to the documentation question!)

--
David Lee
Senior Systems Programmer
I.T. Service, Computer Centre
Durham University
South Road, Durham DH1 3LE, U.K.
http://www.dur.ac.uk/t.d.lee/
Phone: +44 191 334 2752
