On Wed, 28 Jun 2006, Sun Jiang Dong wrote:

> David Lee wrote:
> > I'm trying to get lrmd and lrmadmin running under Solaris.  Some parts are
> > OK.  But some parts are not working: from some lrmadmin requests, the
> > result (probably failure) is not getting back to lrmadmin, which then
> > waits forever even though lrmd has internally produced a result.
> Can you give more details? For example, which parts are OK and which are not?
>
> And which (compiling) environment did you use? Forte or GCC? Solaris x86 or
> sparc?

gcc (3.4.5); Solaris-9/sparc

>
> > [...]
> > 1. Is the lrmd-lrmadmin communication channel documented somewhere (to
> > tell us what ought to happen)?
> I'm not sure what information you need. Actually they use unix-socket and
> corresponding files are /var/run/heartbeat/lrm_sock and
> /var/run/heartbeat/lrm_callback_sock.

I was wondering whether the protocol is described somewhere (in a similar
manner to the RFCs describing SMTP, TCP, etc.).  Alternatively, is the
expected (designed) internal operation of lrmd documented anywhere?
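
In the absence of a protocol description, a first sanity check is whether the
two socket files mentioned above accept connections at all. The following is a
minimal sketch, not part of heartbeat: the helper name `probe_unix_socket`, the
2-second timeout, and the assumption that these are stream-type Unix sockets
are all mine. It says nothing about the messages lrmd and lrmadmin exchange
once connected.

```python
import socket


def probe_unix_socket(path, timeout=2.0):
    """Try to connect to a Unix-domain socket; return "ok" or a short reason.

    Assumption: the socket is a stream socket (SOCK_STREAM).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # never block forever on connect
    try:
        sock.connect(path)
        return "ok"
    except FileNotFoundError:
        return "missing"   # no socket file at this path
    except ConnectionRefusedError:
        return "refused"   # file exists, but nothing is listening
    except socket.timeout:
        return "timeout"   # listener did not accept the connection in time
    except OSError as exc:
        return "error: %s" % exc
    finally:
        sock.close()


if __name__ == "__main__":
    # Paths as given earlier in this thread.
    for path in ("/var/run/heartbeat/lrm_sock",
                 "/var/run/heartbeat/lrm_callback_sock"):
        print(path, "->", probe_unix_socket(path))
```

A result of "ok" on both paths would suggest the transport is fine and the
problem lies in what is (or is not) sent over the callback channel.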

>
> > 2. In "on_op_done()", something feels strange about "need_notify".
> > Indeed Alan's comments at version 1.125 (April 2005) reinforce my
> > uneasiness.  If this is (at heart) a "call and response" thing, then
> > shouldn't notification (response from the server) be attempted every time?
> > (In the case I'm trying, I've inserted some logging statements, and it is
> > following a path which keeps "need_notify" false, even though "lrmadmin" is
> > sitting there, awaiting an answer.)
> Likely you are tracking the repeating operations, so need_notify stays
> false. I don't think it is related to the issue on Solaris.

The hanging "lrmadmin" was from "STONITHDBasicSanityCheck", at the point where
it runs "lrmadmin -E s1 start 0 0 0".
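
While chasing this, it may help to run the hanging command under a time limit,
so that a missing reply shows up as a distinct result instead of a stuck test.
The sketch below is illustrative only: `run_with_timeout` is a hypothetical
helper, not something heartbeat provides, and the time limit is whatever the
caller considers reasonable.

```python
import subprocess


def run_with_timeout(cmd, timeout):
    """Run cmd (a list of arguments) and return its exit status.

    Returns None if the command has not finished after `timeout` seconds;
    subprocess.run() kills the child in that case before raising.
    """
    try:
        return subprocess.run(cmd, timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return None
```

For example, `run_with_timeout(["lrmadmin", "-E", "s1", "start", "0", "0", "0"], 30)`
would return None if lrmadmin is still waiting for an answer after 30 seconds
(the 30 is an arbitrary choice), and the exit status of lrmadmin otherwise.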

Note that on a different subthread of this topic, Alan has suggested that
I open a bugzilla report.  (And Matt Soffen, another heartbeat developer
using non-Linux, has also just reported a problem.)

So a little documentation about how it is _designed_ to work would help
folk like Matt and me chase problems.

Best wishes.

-- 

:  David Lee                                I.T. Service          :
:  Senior Systems Programmer                Computer Centre       :
:                                           Durham University     :
:  http://www.dur.ac.uk/t.d.lee/            South Road            :
:                                           Durham DH1 3LE        :
:  Phone: +44 191 334 2752                  U.K.                  :
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
