On Sat, Oct 27, 2007 at 09:22:31PM -0400, Andrew Rowland wrote:
> Non-Working Node: x86 CLFS-1.0.0 with Linux 2.6.19.1 kernel.  Ganglia
> built as:
> 
>       ./configure --prefix=/opt/ganglia --disable-gexec && make && make install

So you are using gcc 4.1.1 and glibc 2.4, and nothing of interest was reported
when compiling the included expat?

> > # gstat -a
> 
> 40 files(1.1M bytes) - /usr/src/sys-cluster/ganglia/ganglia-3.0.5
> [EMAIL PROTECTED] for 6D17h30m $ gstat -a
> CLUSTER INFORMATION
>        Name: clusterfsck
>       Hosts: 1

So the gmond on your x86 is dead, even though it started.

> weibullone.weibullnet.net
>     2 (    0/  201) [  2.59,  2.41,  2.10] [   0.0,  62.1,   1.4,  36.5,   0.0] OFF

Why does the working box have a real FQDN defined while the broken one has a
fake, non-standard one? Can you define a proper hostname in the weibullnet.net
domain for the x86 and see if that helps?
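A quick way to compare the two boxes is to look at what the resolver returns
for each host's own name (the broken node shows up as legolas.clusterfsck in
the lsof output below). This is only a minimal Python sketch, not a Ganglia
tool: in_domain() and report_hostname() are hypothetical helpers, and the
expected domain (weibullnet.net) is taken from the working node's name.

```python
import socket
import sys


def in_domain(fqdn: str, domain: str) -> bool:
    """True if `fqdn` names a host inside `domain` (hypothetical helper).

    "weibullone.weibullnet.net" is inside "weibullnet.net";
    "legolas.clusterfsck" is not.
    """
    fqdn = fqdn.rstrip(".").lower()
    domain = domain.rstrip(".").lower()
    return fqdn.endswith("." + domain)


def report_hostname(domain: str) -> None:
    """Print the local host name as the C library resolver sees it."""
    short = socket.gethostname()   # the kernel's host name
    fqdn = socket.getfqdn()        # resolved through glibc (nsswitch.conf)
    print(f"gethostname(): {short}")
    print(f"getfqdn():     {fqdn}")
    verdict = "inside" if in_domain(fqdn, domain) else "NOT inside"
    print(f"{fqdn} is {verdict} {domain}")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed script name): python3 check_fqdn.py weibullnet.net
    report_hostname(sys.argv[1])
```

Run it on both nodes; if the x86 reports a name outside weibullnet.net, fix
the hostname (and /etc/hosts) before digging further.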

> > > When I gstat -i 172.16.1.101 from the head node, I get the following and
> > > the gmond daemon is killed on 172.16.1.101.
> > > 
> > >   gexec_cluster() XML_ParseBuffer() error at line 51:
> > >   no element found
> > 
> > this means that gmond crashed because of a broken xml while using expat, can
> > you paste the output of 
> > 
> > # lsof -p `pidof gmond`
> 
> On the non-working node:
> 
> 55 files() - /home/users/weibullguy/lsof_4.78/lsof_4.78_src
> [EMAIL PROTECTED] for 0h16m $ ./lsof -p `pidof gmond`
> COMMAND   PID USER   FD   TYPE DEVICE    SIZE    NODE NAME
> gmond   16485 root  cwd    DIR    3,3    4096       2 /
> gmond   16485 root  rtd    DIR    3,3    4096       2 /
> gmond   16485 root  txt    REG    3,3  634148  182171 /usr/sbin/gmond
> gmond   16485 root  mem    REG    3,3  152573 1172777 /lib/libnss_files-2.4.so
> gmond   16485 root  mem    REG    3,3 6608752 1172766 /lib/libc-2.4.so
> gmond   16485 root  mem    REG    3,3  572389 1172773 /lib/libpthread-2.4.so
> gmond   16485 root  mem    REG    3,3  404817 1172783 /lib/libnsl-2.4.so
> gmond   16485 root  mem    REG    3,3  224864 1172774 /lib/libresolv-2.4.so
> gmond   16485 root  mem    REG    3,3   99796 1172770 /lib/libdl-2.4.so
> gmond   16485 root  mem    REG    3,3   54376 1172772 /lib/libcrypt-2.4.so
> gmond   16485 root  mem    REG    3,3  565268 1172769 /lib/libm-2.4.so
> gmond   16485 root  mem    REG    3,3  165063 1172778 /lib/librt-2.4.so
> gmond   16485 root  mem    REG    3,3  549116 1172788 /lib/ld-2.4.so
> gmond   16485 root    0r   CHR    1,3            1023 /dev/null
> gmond   16485 root    1w   CHR    1,3            1023 /dev/null
> gmond   16485 root    2w   CHR    1,3            1023 /dev/null
> gmond   16485 root    3u  IPv4 241510             UDP 239.2.11.71:8649 
> gmond   16485 root    4u  IPv4 241511             TCP *:8649 (LISTEN)
> gmond   16485 root    5u  IPv4 241512             UDP legolas.clusterfsck:1027->239.2.11.71:8649

Another thing of interest is that you are not using DNS for hostname
resolution (only libnss_files is mapped into gmond; I presume the same holds
on the working box). See if adding "dns" to the hosts: line in
/etc/nsswitch.conf helps.
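To check what a node actually consults for host lookups, a small sketch can
read the hosts: line of nsswitch.conf. It assumes the usual glibc syntax
(sources separated by whitespace, optional [STATUS=ACTION] items, # comments);
hosts_sources() is a hypothetical helper, not part of glibc or Ganglia.

```python
import re
import sys
from pathlib import Path


def hosts_sources(nsswitch_text: str) -> list[str]:
    """Return the lookup sources listed on the 'hosts:' line, in order.

    Comments (#...) and action items such as [NOTFOUND=return] are dropped.
    """
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0]                  # strip comments
        key, sep, rest = line.partition(":")
        if sep and key.strip() == "hosts":
            rest = re.sub(r"\[[^\]]*\]", " ", rest)  # drop [STATUS=action] items
            return rest.split()
    return []


if __name__ == "__main__":
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "/etc/nsswitch.conf")
    if path.exists():
        sources = hosts_sources(path.read_text())
        print(f"hosts: {' '.join(sources) or '(none)'}")
        if "dns" not in sources:
            print("dns is not listed; host lookups never consult DNS")
```

A line such as "hosts: files dns" makes glibc try /etc/hosts first and then
DNS, which would also explain why libnss_dns is absent from the lsof output
when it is missing.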

I can't reproduce the problem here (using names similar to yours and a
similar configuration, but with glibc 2.6 on Gentoo 2007.0 x86). Capturing the
XML that gmond generates until it crashes (with a telnet to port 8649), or a
core dump of gmond, would probably help further.
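Instead of copying from a telnet session, a short script can save everything
gmond sends on TCP port 8649 and report where the XML stops being well formed.
Expat's "no element found" (the gexec error above) typically means the input
ended before the document was complete. This is a minimal sketch:
fetch_gmond_xml(), check_xml() and the output file name are hypothetical, and
it assumes gmond writes its XML dump as soon as a client connects, as gstat
relies on.

```python
import socket
import sys
import xml.parsers.expat


def fetch_gmond_xml(host: str, port: int = 8649, timeout: float = 10.0) -> bytes:
    """Read everything gmond sends on its TCP port until the connection ends."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            try:
                data = sock.recv(4096)
            except ConnectionResetError:  # gmond died mid-stream
                break
            if not data:                  # gmond closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def check_xml(data: bytes):
    """Return None if `data` is well-formed XML, else (line, column, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the whole document
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed script name): python3 dump_gmond.py 172.16.1.101
    xml_bytes = fetch_gmond_xml(sys.argv[1])
    with open("gmond-dump.xml", "wb") as out:  # keep a copy to post to the list
        out.write(xml_bytes)
    problem = check_xml(xml_bytes)
    if problem is None:
        print(f"{len(xml_bytes)} bytes of well-formed XML")
    else:
        line, col, msg = problem
        print(f"XML breaks at line {line}, column {col}: {msg}")
```

The last few lines of gmond-dump.xml before the reported position show which
metric or host element gmond was writing when it died.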

Carlo

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
