On Tue, Aug 17, 2010 at 07:21:40PM -0600, Tim Serong wrote:
> On 8/18/2010 at 10:25 AM, Simon Horman <[email protected]> wrote:
> > On Tue, Aug 17, 2010 at 06:12:04PM -0600, Tim Serong wrote:
> > > On 8/18/2010 at 09:03 AM, Simon Horman <[email protected]> wrote:
> > > > On Tue, Aug 17, 2010 at 03:06:45PM +0200, Dejan Muhamedagic wrote:
> > > > > Hi,
> > > > >
> > > > > On Tue, Aug 17, 2010 at 04:50:27PM +0900, Simon Horman wrote:
> > > > > > On Wed, Jul 21, 2010 at 01:41:09AM -0600, Tim Serong wrote:
> > > > > > > Hi All,
> > > > > > >
> > > > > > > A while ago (April, from memory), there was an ABI change in
> > > > > > > clplumbing in cluster-glue. Presumably this went mostly unnoticed
> > > > > > > in general usage, however I have twice seen systems where the
> > > > > > > cluster could not run because of a missing (or incorrect) libglue2
> > > > > > > package. One was my development system, with a dodgy build, the
> > > > > > > other was mentioned on #linux-ha yesterday, and was the result of
> > > > > > > ignoring a conflict error when installing the pacemaker RPM on
> > > > > > > openSUSE. So, let me be clear, this is not something anyone should
> > > > > > > need to worry about... But I thought I'd mention it here, because
> > > > > > > the error messages you get are, IMO, not very obvious.
> > > > > > >
> > > > > > > Symptoms of a mismatched pacemaker/libglue build are errors like:
> > > > > > >
> > > > > > > lrmd: [3004]: ERROR: main: can not create wait connection for command.
> > > > > > > lrmd: [3004]: ERROR: Startup aborted (can't create comm channel). Shutting down.
> > > > > > > ...
> > > > > > > pengine: [4011]: ERROR: init_client_ipc_comms_nodispatch: Could not access channel on: /var/run/crm/pengine
> > > > > > > corosync[4000]: [pcmk ] ERROR: pcmk_wait_dispatch: Child process pengine exited (pid=4011, rc=1)
> > > > > > > corosync[4000]: [pcmk ] notice: pcmk_wait_dispatch: Respawning failed child process: pengine
> > > > > > >
> > > > > > > If your cluster won't start and you see this in /var/log/messages,
> > > > > > > make sure libglue2 is up to date. And now that I've mentioned this
> > > > > > > here and it's made it to the mailing list archive, Google will
> > > > > > > know, and nobody else will ever have this problem again.
> > > > > > >
> > > > > > > This has been a public service announcement. Thank you for
> > > > > > > reading.
> > > > > >
> > > > > > Could we get the .so bumped accordingly in the next release of
> > > > > > cluster glue? That would at least help in managing the problem
> > > > > > once the new release has been made.
> > > > >
> > > > > I don't think that that is necessary. The ABI change in the
> > > > > _released_ cluster-glue packages was done in such a way as not to
> > > > > disturb the existing pacemaker installations, i.e. by adding
> > > > > fields to the end of the struct. Further, the library version has
> > > > > been bumped to 3:0:1 (with libtool's -version-info) at the time.
> > > > > For whatever reason that translates to so.2.1.0. Users of the new
> > > > > ABI are also using domain sockets of the new type if they want
> > > > > the new functionality.
> > > > >
> > > > > I guess that what Tim was seeing was Pacemaker built against the
> > > > > unreleased glue versions which did have different ABI, i.e. the
> > > > > fields were inserted somewhere in the middle of the struct.
> > > >
> > > > Ok, so no ABI incompatibility was introduced in 1.0.6. Great!
> > > > I will go ahead and close the related Debian bugs,
> > > > #593319, #593321, #593322 and #593323.
> > >
> > > I was seeing Pacemaker *built* against new glue, installed on a system
> > > that had *old* glue installed, because both libglue2 (new glue) and
> > > libheartbeat2 < 3.0 (old glue) provide what looks like the same DSO;
> > > so when Pacemaker was upgraded on this system, libheartbeat2 was not
> > > automatically upgraded to libglue2. For reference, there's an
> > > openSUSE 11.3 bug for this:
> > >
> > > https://bugzilla.novell.com/show_bug.cgi?id=628243
> > >
> > > I believe this may only be a problem on openSUSE 11.3, where heartbeat
> > > 2.99.3 still exists, providing old libheartbeat2.
> > >
> > > It shouldn't be a problem the other way around (i.e. old Pacemaker is
> > > meant to work with new glue, as Dejan said).
> >
> > Understood.
> >
> > Was the new glue that you used for building a released version
> > or an hg snapshot?
>
> The first time I saw it was on with an odd build around about the time
> of glue 1.0.4 or 1.0.5 (with which there was definitely a problem,
> see http://www.gossamer-threads.com/lists/linuxha/dev/63396).
>
> The issue on openSUSE 11.3 is with Pacemaker built against glue slightly
> newer than 1.0.5 (changeset 1448deafdf79), but installed with libheartbeat2
> 2.99.x instead of libglue2.
>
> I have not tried Pacemaker built against glue 1.0.5, but installed with
> an earlier glue (e.g. 1.0.4 or earlier). I expect this would break in the
> same way I mentioned originally.
>
> I had a quick look at the Debian bugs you mentioned. If it's possible at
> all on Debian to have glue < 1.0.5 installed with Pacemaker built against
> glue >= 1.0.5, I expect there will be trouble. However, a quick search
> on packages.debian.org shows no glue earlier than 1.0.5, so hopefully
> this means you're good.

Hi Tim,

If we disregard unstable, which I think is reasonable, and look at
testing, then the only versions of cluster-glue that have ever existed
in Debian are 1.0.5-2 and 1.0.6-1 [1]. So it sounds like we should be ok.

For the record, there has never been a release of Debian stable that
included cluster-glue - it will appear in Squeeze for the first time.

[1] http://packages.qa.debian.org/c/cluster-glue.html

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
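
A note on the "3:0:1 translates to so.2.1.0" remark quoted in this thread: libtool's -version-info takes current:revision:age, and on Linux the installed file suffix is (current - age).age.revision. The sketch below only illustrates that arithmetic; the function name is made up for this example and is not part of libtool or cluster-glue.

```python
def libtool_linux_so_suffix(version_info: str) -> str:
    """Map libtool's current:revision:age to the Linux .so version suffix.

    Hypothetical helper for illustration only; libtool does this
    internally when it links and installs a shared library.
    """
    current, revision, age = (int(part) for part in version_info.split(":"))
    if age > current:
        raise ValueError("age must not be greater than current")
    # The major number is the oldest interface still supported.
    major = current - age
    return f"{major}.{age}.{revision}"


if __name__ == "__main__":
    # The value mentioned in the thread.
    print("so." + libtool_linux_so_suffix("3:0:1"))  # so.2.1.0
```

Because the major number is current minus age, bumping both current and age (as one does for a backward-compatible interface addition) leaves the major number, and so the soname, unchanged.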

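
The difference between appending fields to a struct and inserting them in the middle, as discussed in the quoted text, can be sketched with ctypes. The struct and field names below are hypothetical and are not the real clplumbing definitions; the point is only that appended fields leave the offsets of existing fields alone, while an inserted field shifts everything after it.

```python
import ctypes


class OldStruct(ctypes.Structure):
    # Hypothetical layout that an older client was compiled against.
    _fields_ = [("status", ctypes.c_int), ("refcount", ctypes.c_int)]


class AppendedStruct(ctypes.Structure):
    # New field added at the end: existing offsets are unchanged.
    _fields_ = [
        ("status", ctypes.c_int),
        ("refcount", ctypes.c_int),
        ("new_field", ctypes.c_int),
    ]


class InsertedStruct(ctypes.Structure):
    # New field inserted in the middle: "refcount" moves.
    _fields_ = [
        ("status", ctypes.c_int),
        ("new_field", ctypes.c_int),
        ("refcount", ctypes.c_int),
    ]


def shifted_fields(old, new):
    """Return names of fields in `old` whose byte offset differs in `new`."""
    return [
        name
        for name, _ctype in old._fields_
        if getattr(old, name).offset != getattr(new, name).offset
    ]


if __name__ == "__main__":
    print(shifted_fields(OldStruct, AppendedStruct))  # []
    print(shifted_fields(OldStruct, InsertedStruct))  # ['refcount']
```

Note that appending still changes the size of the struct, so this is generally only safe when the library, not the caller, allocates it.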