On 8/18/2010 at 09:03 AM, Simon Horman <[email protected]> wrote:
> On Tue, Aug 17, 2010 at 03:06:45PM +0200, Dejan Muhamedagic wrote:
> > Hi,
> > 
> > On Tue, Aug 17, 2010 at 04:50:27PM +0900, Simon Horman wrote:
> > > On Wed, Jul 21, 2010 at 01:41:09AM -0600, Tim Serong wrote:
> > > > Hi All,
> > > > 
> > > > A while ago (April, from memory), there was an ABI change in
> > > > clplumbing in cluster-glue. Presumably this went mostly unnoticed
> > > > in general usage, however I have twice seen systems where the cluster
> > > > could not run because of a missing (or incorrect) libglue2 package.
> > > > One was my development system, with a dodgy build, the other was
> > > > mentioned on #linux-ha yesterday, and was the result of ignoring a
> > > > conflict error when installing the pacemaker RPM on openSUSE. So,
> > > > let me be clear, this is not something anyone should need to worry
> > > > about... But I thought I'd mention it here, because the error
> > > > messages you get are, IMO, not very obvious.
> > > > 
> > > > Symptoms of a mismatched pacemaker/libglue build are errors like:
> > > > 
> > > >   lrmd: [3004]: ERROR:
> > > >     main: can not create wait connection for command.
> > > >   lrmd: [3004]: ERROR:
> > > >     Startup aborted (can't create comm channel). Shutting down.
> > > >   ...
> > > >   pengine: [4011]: ERROR:
> > > >     init_client_ipc_comms_nodispatch: Could not access channel on:
> > > >     /var/run/crm/pengine
> > > >   corosync[4000]: [pcmk ] ERROR:
> > > >     pcmk_wait_dispatch: Child process pengine exited (pid=4011, rc=1)
> > > >   corosync[4000]: [pcmk ] notice:
> > > >     pcmk_wait_dispatch: Respawning failed child process: pengine
> > > > 
> > > > If your cluster won't start and you see this in /var/log/messages,
> > > > make sure libglue2 is up to date. And now that I've mentioned this
> > > > here and it's made it to the mailing list archive, Google will know,
> > > > and nobody else will ever have this problem again.
> > > > 
> > > > This has been a public service announcement. Thank you for reading.
> > > 
> > > Could we get the .so bumped accordingly in the next release of
> > > cluster glue? That would at least help in managing the problem
> > > once the new release has been made.
> > 
> > I don't think that that is necessary. The ABI change in the
> > _released_ cluster-glue packages was done in such a way as not to
> > disturb the existing pacemaker installations, i.e. by adding
> > fields to the end of the struct. Further, the library version has
> > been bumped to 3:0:1 (with libtool's -version-info) at the time.
> > For whatever reason that translates to so.2.1.0. Users of the new
> > ABI are also using domain sockets of the new type if they want
> > the new functionality.
> > 
> > I guess that what Tim was seeing was Pacemaker built against the
> > unreleased glue versions which did have different ABI, i.e. the
> > fields were inserted somewhere in the middle of the struct.
> 
> Ok, so no ABI incompatibility was introduced in 1.0.6. Great!
> I will go ahead and close the related Debian bugs,
> #593319, #593321, #593322 and #593323.
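Both of Dejan's technical points can be checked with a small sketch. The first function assumes libtool's usual Linux (ELF) naming rule: for -version-info CURRENT:REVISION:AGE, the soname major is CURRENT - AGE and the installed file gets the suffix MAJOR.AGE.REVISION, which is how 3:0:1 ends up as so.2.1.0. The second half uses Python's ctypes with hypothetical struct layouts (OldChannel, AppendedChannel, InsertedChannel are made up for illustration and are NOT the real clplumbing structs) to show why appending fields leaves the offsets an already-compiled caller uses unchanged, while inserting a field in the middle moves them.

```python
import ctypes


def libtool_linux_suffixes(version_info: str) -> tuple[str, str]:
    """Map libtool's -version-info CURRENT:REVISION:AGE to the soname and
    file suffixes libtool uses on Linux: so.MAJOR and so.MAJOR.AGE.REVISION,
    where MAJOR = CURRENT - AGE."""
    current, revision, age = (int(part) for part in version_info.split(":"))
    if age > current:
        raise ValueError("AGE may not exceed CURRENT")
    major = current - age
    return f"so.{major}", f"so.{major}.{age}.{revision}"


# Hypothetical layouts for illustration only; NOT the real clplumbing
# structs. All fields are c_int so offsets do not depend on pointer
# size or alignment padding.
class OldChannel(ctypes.Structure):
    _fields_ = [("fd", ctypes.c_int), ("state", ctypes.c_int),
                ("refcount", ctypes.c_int)]


class AppendedChannel(ctypes.Structure):
    # New field added at the end: existing field offsets stay the same.
    _fields_ = OldChannel._fields_ + [("new_flag", ctypes.c_int)]


class InsertedChannel(ctypes.Structure):
    # New field inserted in the middle: every later field moves.
    _fields_ = [("fd", ctypes.c_int), ("new_flag", ctypes.c_int),
                ("state", ctypes.c_int), ("refcount", ctypes.c_int)]


def field_offsets(struct_cls, names):
    """Byte offset of each named field, as a compiled caller would hard-code it."""
    return {name: getattr(struct_cls, name).offset for name in names}


if __name__ == "__main__":
    print(libtool_linux_suffixes("3:0:1"))
    old_names = [name for name, _ in OldChannel._fields_]
    print(field_offsets(OldChannel, old_names))
    print(field_offsets(AppendedChannel, old_names))
    print(field_offsets(InsertedChannel, old_names))
```

With a 4-byte int, the appended layout keeps fd/state/refcount at 0/4/8, while the inserted layout pushes state and refcount to 8 and 12, so a binary built against one layout reads the wrong fields when run against the other.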
I was seeing Pacemaker *built* against new glue, installed on a system
that had *old* glue installed, because both libglue2 (new glue) and
libheartbeat2 < 3.0 (old glue) provide what looks like the same DSO; so
when Pacemaker was upgraded on this system, libheartbeat2 was not
automatically upgraded to libglue2.

For reference, there's an openSUSE 11.3 bug for this:

  https://bugzilla.novell.com/show_bug.cgi?id=628243

I believe this may only be a problem on openSUSE 11.3, where heartbeat
2.99.3 still exists, providing old libheartbeat2. It shouldn't be a
problem the other way around (i.e. old Pacemaker is meant to work with
new glue, as Dejan said).

Regards,

Tim

-- 
Tim Serong <[email protected]>
Senior Clustering Engineer, OPS Engineering, Novell Inc.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
