On Wed, Aug 18, 2010 at 09:39:59PM +0900, Simon Horman wrote:
> On Wed, Aug 18, 2010 at 10:18:09AM +0200, Dejan Muhamedagic wrote:
> > Hi,
> > 
> > On Wed, Aug 18, 2010 at 10:46:41AM +0900, Simon Horman wrote:
> > > On Tue, Aug 17, 2010 at 07:21:40PM -0600, Tim Serong wrote:
> > > > On 8/18/2010 at 10:25 AM, Simon Horman <[email protected]> wrote: 
> > > > > On Tue, Aug 17, 2010 at 06:12:04PM -0600, Tim Serong wrote: 
> > > > > > On 8/18/2010 at 09:03 AM, Simon Horman <[email protected]> wrote:  
> > > > > > > On Tue, Aug 17, 2010 at 03:06:45PM +0200, Dejan Muhamedagic wrote:
> > > > > > > > Hi,  
> > > > > > > >   
> > > > > > > > On Tue, Aug 17, 2010 at 04:50:27PM +0900, Simon Horman wrote:  
> > > > > > > > > On Wed, Jul 21, 2010 at 01:41:09AM -0600, Tim Serong wrote:  
> > > > > > > > > > Hi All,  
> > > > > > > > > >   
> > > > > > > > > > A while ago (April, from memory), there was an ABI change in
> > > > > > > > > > clplumbing in cluster-glue.  Presumably this went mostly unnoticed
> > > > > > > > > > in general usage, however I have twice seen systems where the cluster
> > > > > > > > > > could not run because of a missing (or incorrect) libglue2 package.
> > > > > > > > > > One was my development system, with a dodgy build, the other was
> > > > > > > > > > mentioned on #linux-ha yesterday, and was the result of ignoring a
> > > > > > > > > > conflict error when installing the pacemaker RPM on openSUSE.  So,
> > > > > > > > > > let me be clear, this is not something anyone should need to worry
> > > > > > > > > > about...  But I thought I'd mention it here, because the error
> > > > > > > > > > messages you get are, IMO, not very obvious.
> > > > > > > > > >   
> > > > > > > > > > Symptoms of a mismatched pacemaker/libglue build are errors like:
> > > > > > > > > >   
> > > > > > > > > >   lrmd: [3004]: ERROR:
> > > > > > > > > >     main: can not create wait connection for command.
> > > > > > > > > >   lrmd: [3004]: ERROR:
> > > > > > > > > >     Startup aborted (can't create comm channel).  Shutting down.
> > > > > > > > > >   ...
> > > > > > > > > >   pengine: [4011]: ERROR:
> > > > > > > > > >     init_client_ipc_comms_nodispatch: Could not access channel on:
> > > > > > > > > >     /var/run/crm/pengine
> > > > > > > > > >   corosync[4000]: [pcmk  ] ERROR:
> > > > > > > > > >     pcmk_wait_dispatch: Child process pengine exited (pid=4011, rc=1)
> > > > > > > > > >   corosync[4000]: [pcmk  ] notice:
> > > > > > > > > >     pcmk_wait_dispatch: Respawning failed child process: pengine
> > > > > > > > > >   
> > > > > > > > > > If your cluster won't start and you see this in /var/log/messages,
> > > > > > > > > > make sure libglue2 is up to date.  And now that I've mentioned this
> > > > > > > > > > here and it's made it to the mailing list archive, Google will know,
> > > > > > > > > > and nobody else will ever have this problem again.
> > > > > > > > > >   
> > > > > > > > > > This has been a public service announcement.  Thank you for reading.
> > > > > > > > >   
> > > > > > > > > Could we get the .so bumped accordingly in the next release of
> > > > > > > > > cluster glue? That would at least help in managing the problem
> > > > > > > > > once the new release has been made.
> > > > > > > >   
> > > > > > > > I don't think that that is necessary. The ABI change in the
> > > > > > > > _released_ cluster-glue packages was done in such a way as not to
> > > > > > > > disturb the existing pacemaker installations, i.e. by adding
> > > > > > > > fields to the end of the struct. Further, the library version has
> > > > > > > > been bumped to 3:0:1 (with libtool's -version-info) at the time.
> > > > > > > > For whatever reason that translates to so.2.1.0. Users of the new
> > > > > > > > ABI are also using domain sockets of the new type if they want
> > > > > > > > the new functionality.
> > > > > > > >   
> > > > > > > > I guess that what Tim was seeing was Pacemaker built against the
> > > > > > > > unreleased glue versions which did have a different ABI, i.e. the
> > > > > > > > fields were inserted somewhere in the middle of the struct.
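
To illustrate the difference between appending and inserting fields, here is a minimal sketch. The struct and field names (chan_old, peer_uid and so on) are made up for the example and are not the real clplumbing structures. It only shows that an existing field keeps its offset when a new one is appended, and moves when a new one is inserted ahead of it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout that existing binaries were compiled against. */
struct chan_old      { int fd; int state; };

/* New field appended: fd and state stay where old binaries expect them. */
struct chan_appended { int fd; int state; int peer_uid; };

/* New field inserted in the middle: state moves to a different offset. */
struct chan_inserted { int fd; int peer_uid; int state; };

/* Returns 1 if `state` is at the same offset as in the old layout. */
static int state_offset_matches(size_t new_offset)
{
    return new_offset == offsetof(struct chan_old, state);
}

static int appended_is_compatible(void)
{
    return state_offset_matches(offsetof(struct chan_appended, state));
}

static int inserted_is_compatible(void)
{
    return state_offset_matches(offsetof(struct chan_inserted, state));
}
```

A binary built against one layout and run against the other would read or write the wrong member whenever the offsets differ, which is why only the appended variant is safe for already-installed callers.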
> > > > > > >   
> > > > > > > Ok, so no ABI incompatibility was introduced in 1.0.6. Great!  
> > > > > > > I will go ahead and close the related Debian bugs,  
> > > > > > > #593319, #593321, #593322 and #593323.  
> > > > > >  
> > > > > > I was seeing Pacemaker *built* against new glue, installed on a system
> > > > > > that had *old* glue installed, because both libglue2 (new glue) and
> > > > > > libheartbeat2 < 3.0 (old glue) provide what looks like the same DSO;
> > > > > > so when Pacemaker was upgraded on this system, libheartbeat2 was not
> > > > > > automatically upgraded to libglue2.  For reference, there's an
> > > > > > openSUSE 11.3 bug for this:
> > > > > >  
> > > > > >   https://bugzilla.novell.com/show_bug.cgi?id=628243 
> > > > > >  
> > > > > > I believe this may only be a problem on openSUSE 11.3, where heartbeat
> > > > > > 2.99.3 still exists, providing old libheartbeat2.
> > > > > >  
> > > > > > It shouldn't be a problem the other way around (i.e. old Pacemaker is
> > > > > > meant to work with new glue, as Dejan said).
> > > > >  
> > > > > Understood. 
> > > > >  
> > > > > Was the new glue that you used for building a released version 
> > > > > or an hg snapshot? 
> > > > 
> > > > The first time I saw it was with an odd build at around the time
> > > > of glue 1.0.4 or 1.0.5 (with which there was definitely a problem,
> > > > see http://www.gossamer-threads.com/lists/linuxha/dev/63396). 
> > > > 
> > > > The issue on openSUSE 11.3 is with Pacemaker built against glue slightly
> > > > newer than 1.0.5 (changeset 1448deafdf79), but installed with libheartbeat2
> > > > 2.99.x instead of libglue2.
> > > > 
> > > > I have not tried Pacemaker built against glue 1.0.5, but installed with
> > > > an earlier glue (e.g. 1.0.4 or earlier).  I expect this would break in the
> > > > same way I mentioned originally.
> > > > 
> > > > I had a quick look at the Debian bugs you mentioned.  If it's possible at
> > > > all on Debian to have glue < 1.0.5 installed with Pacemaker built against
> > > > glue >= 1.0.5, I expect there will be trouble.  However, a quick search
> > > > on packages.debian.org shows no glue earlier than 1.0.5, so hopefully
> > > > this means you're good.
> > > 
> > > Hi Tim,
> > > 
> > > If we disregard unstable, which I think is reasonable, and look at testing,
> > > then the only versions of cluster-glue that have ever existed in Debian are
> > > 1.0.5-2 and 1.0.6-1 [1]. So it sounds like we should be ok.
> > 
> > Looking again at the whole matter, it is possible to run into
> > problems if one installs a pacemaker built against a new glue
> > release (>=1.0.5), but tries to run it with some older glue
> > release (<1.0.5). The reason is here (from
> > include/clplumbing/ipc.h):
> > 
> > /* Unix domain socket with farside uid + gid credentials.
> >  * Available since libplumb.so.2.1.0 */
> > #define IPC_UDS_CRED        "uds_c"
> > 
> > #ifdef IPC_UDS_CRED
> > #   define  IPC_ANYTYPE     IPC_UDS_CRED
> > #else
> > #   error "No IPC types defined(!)"
> > #endif
> > 
> > uds_c didn't exist before. Before, IPC_ANYTYPE was defined to be
> > IPC_DOMAIN_SOCKET ("uds"). Must say that I don't know why that
> > changed. Users who needed "uds_c" should've asked for it
> > explicitly.
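
A rough sketch of why that bites: IPC_ANYTYPE is a compile-time string, so a client built with the new header always asks for "uds_c", whichever library it ends up loading. The lookup tables and the *_lib_supports() helpers below are hypothetical stand-ins for whatever the library does internally with the type name; only the three macro values come from the header and the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define IPC_DOMAIN_SOCKET "uds"
#define IPC_UDS_CRED      "uds_c"
#define IPC_ANYTYPE       IPC_UDS_CRED  /* what the new header bakes into clients */

/* Hypothetical lists of channel types each library generation knows about. */
static const char *const old_lib_types[] = { IPC_DOMAIN_SOCKET };
static const char *const new_lib_types[] = { IPC_DOMAIN_SOCKET, IPC_UDS_CRED };

/* Returns 1 if `type` appears in the `known` list, 0 otherwise. */
static int type_is_known(const char *const *known, size_t n, const char *type)
{
    assert(type != NULL);
    for (size_t i = 0; i < n; i++) {
        if (strcmp(known[i], type) == 0)
            return 1;
    }
    return 0;
}

static int old_lib_supports(const char *type)
{
    return type_is_known(old_lib_types,
                         sizeof old_lib_types / sizeof old_lib_types[0], type);
}

static int new_lib_supports(const char *type)
{
    return type_is_known(new_lib_types,
                         sizeof new_lib_types / sizeof new_lib_types[0], type);
}
```

With these tables, old_lib_supports(IPC_ANYTYPE) is 0 while old_lib_supports(IPC_DOMAIN_SOCKET) is 1, which would be consistent with the "can't create comm channel" errors when a new client meets an old library.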
> 
> In that case could I re-request that the so be bumped
> so we get libplumb.so.3.0.0 ?

It has already been bumped to 2.1.0. Can't recall which version
it was before, perhaps 2.0.0. At any rate, if you deal only with
packages >= 1.0.5 you should be ok.
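
For the record, the 3:0:1 to 2.1.0 mapping is just how libtool numbers shared libraries on Linux: the file suffix is (current - age).age.revision. A small sketch of that arithmetic follows; the struct and function names are mine, not libtool's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* The three numbers passed to libtool's -version-info current:revision:age. */
struct lt_version { unsigned current, revision, age; };

/*
 * Writes the Linux-style suffix "(current - age).age.revision" into buf.
 * Returns 0 on success, -1 if age > current or the buffer is too small.
 */
static int lt_linux_suffix(struct lt_version v, char *buf, size_t len)
{
    assert(buf != NULL);
    if (v.age > v.current)
        return -1;
    int n = snprintf(buf, len, "%u.%u.%u",
                     v.current - v.age, v.age, v.revision);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Feeding in {3, 0, 1} gives "2.1.0", so the soname major stays at 2. Getting a 3.0.0 suffix would take something like 3:0:0, i.e. declaring that the old interface is no longer supported.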

Cheers,

Dejan


> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
