Found the problem. The include file lib/llist.h does not match what the
newest libe package (from 2004) uses. It looks like ganglia has changed
the _llist_entry structure (actually only the order of its members). Code
built against the two headers then disagrees about where each member of a
list entry lives, which means that gexec cannot work together with
ganglia-3.0.3.
Since the structure was changed on the ganglia side, maybe somebody
remembers why this happened. If there was a good reason for it, we should
try to change libe to match. Otherwise we should probably revert the change
in the ganglia tree.
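
A minimal sketch of why reordering members alone is enough to break things.
The struct names (entry_a, entry_b), the helper functions, and the member
order shown here are assumptions for illustration; they are not copied from
lib/llist.h or from libe. Only the existence of the val and next members is
known from the gexec code quoted below.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout A: what the code that builds the list was compiled with. */
typedef struct entry_a {
    void           *val;
    struct entry_a *prev;
    struct entry_a *next;
} entry_a;

/* Hypothetical layout B: same members, different order, as seen by the
 * code that walks the list. */
typedef struct entry_b {
    struct entry_b *prev;
    struct entry_b *next;
    void           *val;
} entry_b;

/* Returns 1 if both layouts place val and next at the same offsets, else 0. */
static int layouts_agree(void)
{
    return offsetof(entry_a, val)  == offsetof(entry_b, val) &&
           offsetof(entry_a, next) == offsetof(entry_b, next);
}

/* Reads the bytes of a layout-A node as if it were a layout-B node, which is
 * effectively what happens when two objects compiled against different
 * headers share one list. memcpy keeps the demo free of aliasing issues. */
static void *val_seen_through_b(const entry_a *node)
{
    entry_b view;
    assert(sizeof view == sizeof *node);
    memcpy(&view, node, sizeof view);
    return view.val;
}
```

With these assumed layouts, a node whose val points at real data and whose
next is NULL is read through layout B as having a NULL val, the same kind of
symptom as the NULL host in the gdb session below. A check along the lines
of layouts_agree() in a test program would catch such a mismatch before
runtime.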
Regards,
Erich
On Thursday 23 November 2006 10:59, Erich Focht wrote:
> On Thursday 23 November 2006 03:24, michael chang wrote:
> > What OS and compiler were used?
>
> CentOS4.2 i386, gcc 3.4.4, glibc 2.3.4.
>
> Regards,
> Erich
>
> > On 11/22/06, Erich Focht <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > I'm trying to run gexec with ganglia-3.0.3. Built ganglia with
> > > --enable-gexec, built and installed gexec. gexec runs fine if executed
> > > standalone, but when I try it together with ganglia, gexec segfaults.
> > >
> > > Does anybody have gexec (version 0.3.6) running with ganglia-3.0.3? Did
> > > anything change with ganglia-3.x which could lead to trouble with gexec?
> > >
> > > gdb shows the problem is in gexec.c:219
> > >
> > > 214 lli = cluster.gexec_hosts;
> > > 215 for (i = 0; i < *nhosts; i++) {
> > > 216 e_assert(lli != NULL);
> > > 217 (*ips)[i] = (char *)xmalloc(IP_STRLEN);
> > > 218 host = (gexec_host_t *)lli->val;
> > > 219 e_assert(strlen(host->ip) < IP_STRLEN);
> > > 220 strcpy((*ips)[i], host->ip);
> > > 221 lli = lli->next;
> > > 222 }
> > >
> > > The host variable is NULL.
> > >
> > > Any ideas?
> > >
> > > Erich