Hi All,

A while ago (April, from memory), there was an ABI change in
clplumbing in cluster-glue.  Presumably this went mostly unnoticed
in general usage; however, I have twice seen systems where the cluster
could not run because of a missing (or incorrect) libglue2 package.
One was my development system, with a dodgy build; the other was
mentioned on #linux-ha yesterday, and was the result of ignoring a
conflict error when installing the pacemaker RPM on openSUSE.  So,
to be clear, this is not something anyone should normally need to
worry about.  But I thought I'd mention it here, because the error
messages you get are, IMO, not very obvious.

Symptoms of a mismatched pacemaker/libglue build are errors like:

  lrmd: [3004]: ERROR:
    main: can not create wait connection for command.
  lrmd: [3004]: ERROR:
    Startup aborted (can't create comm channel).  Shutting down.
  ...
  pengine: [4011]: ERROR:
    init_client_ipc_comms_nodispatch: Could not access channel on:
    /var/run/crm/pengine
  corosync[4000]: [pcmk  ] ERROR:
    pcmk_wait_dispatch: Child process pengine exited (pid=4011, rc=1)
  corosync[4000]: [pcmk  ] notice:
    pcmk_wait_dispatch: Respawning failed child process: pengine

If your cluster won't start and you see these errors in
/var/log/messages, make sure libglue2 is installed and up to date.
And now that I've mentioned this
here and it's made it to the mailing list archive, Google will know,
and nobody else will ever have this problem again.
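
For anyone who wants to check a log quickly, the following is a
minimal Python sketch, not a definitive tool.  It assumes each
message sits on a single line in the real log, rather than wrapped
as in the excerpt above.  The names SYMPTOMS, find_symptoms and
scan_log are made up for illustration; they are not part of
pacemaker or cluster-glue.

```python
# Message fragments copied from the log excerpt above.  This list and
# the function names are illustrative only (an assumption of this
# sketch), not part of pacemaker or cluster-glue.
SYMPTOMS = [
    "can not create wait connection for command",
    "Startup aborted (can't create comm channel)",
    "init_client_ipc_comms_nodispatch: Could not access channel on",
]


def find_symptoms(lines):
    """Return (line_number, fragment) pairs for lines containing a symptom."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for fragment in SYMPTOMS:
            if fragment in line:
                hits.append((number, fragment))
    return hits


def scan_log(path="/var/log/messages"):
    """Scan a log file (default path as mentioned above) for the symptoms."""
    # errors="replace" keeps the scan going if the log has odd bytes in it.
    with open(path, errors="replace") as log:
        return find_symptoms(log)
```

Calling scan_log() with no arguments reads /var/log/messages, which
may need elevated privileges on your system.  A hit only means the
symptom text is present; you still need to check the libglue2
package itself.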

This has been a public service announcement.  Thank you for reading.

Tim


-- 
Tim Serong <tser...@novell.com>
Senior Clustering Engineer, OPS Engineering, Novell Inc.




_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
