On Sat, Oct 27, 2007 at 09:22:31PM -0400, Andrew Rowland wrote:
> Non-Working Node: x86 CLFS-1.0.0 with Linux 2.6.19.1 kernel. Ganglia
> built as:
>
> ./configure --prefix=/opt/ganglia --disable-gexec && make && make install

So you are using gcc 4.1.1 and glibc 2.4, and nothing of interest was reported when compiling the included expat?

> > # gstat -a
>
> 40 files(1.1M bytes) - /usr/src/sys-cluster/ganglia/ganglia-3.0.5
> [EMAIL PROTECTED] for 6D17h30m $ gstat -a
> CLUSTER INFORMATION
> Name: clusterfsck
> Hosts: 1

Only one host is reported, so the gmond on your x86 box is dead, even if it started.

> weibullone.weibullnet.net
> 2 ( 0/ 201) [ 2.59, 2.41, 2.10] [ 0.0, 62.1, 1.4, 36.5, 0.0] OFF

Why does the working box have a real FQDN defined while the broken one has a fake, non-standard one? Can you define a proper hostname in the weibullnet.net domain for the x86 box and see if that helps?

> > > When I gstat -i 172.16.1.101 from the head node, I get the following and
> > > the gmond daemon is killed on 172.16.1.101.
> > >
> > > gexec_cluster() XML_ParseBuffer() error at line 51:
> > > no element found
> >
> > this means that gmond crashed because of a broken xml while using expat, can
> > you paste the output of
> >
> > # lsof -p `pidof gmond`
>
> On the non-working node:
>
> 55 files() - /home/users/weibullguy/lsof_4.78/lsof_4.78_src
> [EMAIL PROTECTED] for 0h16m $ ./lsof -p `pidof gmond`
> COMMAND PID   USER FD  TYPE DEVICE SIZE    NODE    NAME
> gmond   16485 root cwd DIR  3,3    4096    2       /
> gmond   16485 root rtd DIR  3,3    4096    2       /
> gmond   16485 root txt REG  3,3    634148  182171  /usr/sbin/gmond
> gmond   16485 root mem REG  3,3    152573  1172777 /lib/libnss_files-2.4.so
> gmond   16485 root mem REG  3,3    6608752 1172766 /lib/libc-2.4.so
> gmond   16485 root mem REG  3,3    572389  1172773 /lib/libpthread-2.4.so
> gmond   16485 root mem REG  3,3    404817  1172783 /lib/libnsl-2.4.so
> gmond   16485 root mem REG  3,3    224864  1172774 /lib/libresolv-2.4.so
> gmond   16485 root mem REG  3,3    99796   1172770 /lib/libdl-2.4.so
> gmond   16485 root mem REG  3,3    54376   1172772 /lib/libcrypt-2.4.so
> gmond   16485 root mem REG  3,3    565268  1172769 /lib/libm-2.4.so
> gmond   16485 root mem REG  3,3    165063  1172778 /lib/librt-2.4.so
> gmond   16485 root mem REG  3,3    549116  1172788 /lib/ld-2.4.so
> gmond   16485 root 0r  CHR  1,3            1023    /dev/null
> gmond   16485 root 1w  CHR  1,3            1023    /dev/null
> gmond   16485 root 2w  CHR  1,3            1023    /dev/null
> gmond   16485 root 3u  IPv4 241510         UDP     239.2.11.71:8649
> gmond   16485 root 4u  IPv4 241511         TCP     *:8649 (LISTEN)
> gmond   16485 root 5u  IPv4 241512         UDP     legolas.clusterfsck:1027->239.2.11.71:8649

Another thing of interest is that you are not using DNS for hostname resolution, judging from the lsof output: only libnss_files-2.4.so is mapped and there is no libnss_dns (I presume the same is true on the working box). See if adding "dns" to the hosts line of /etc/nsswitch.conf (for example "hosts: files dns") helps.

I can't reproduce the problem here (using names similar to yours and a similar configuration, but with glibc 2.6 on Gentoo 2007.0 x86). Capturing the XML that gmond generates up to the point where it crashes (with a telnet to port 8649), or a core dump of gmond, would probably help further.

Carlo

_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general
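Here is a minimal Python 3 sketch of the capture suggested above. It assumes gmond listens on its default TCP port 8649 and that it writes its XML dump and then closes the connection, which is what a telnet session to that port shows. fetch_gmond_xml, check_xml and capture are hypothetical helper names and are not part of Ganglia. The check uses Python's bundled expat, which reports a document cut off inside an open element as "no element found", the same message quoted in the report above.

```python
import socket
import xml.parsers.expat


def fetch_gmond_xml(host, port=8649, timeout=10.0):
    """Read everything gmond sends after a connect, like 'telnet host 8649'.

    Assumes gmond writes its XML dump and then closes the connection, so
    reading until EOF captures the whole document, or whatever arrived
    before the daemon died.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            try:
                chunk = sock.recv(65536)
            except (ConnectionResetError, socket.timeout):
                # Connection reset or stalled mid-dump: keep the partial data.
                break
            if not chunk:  # b"" means the peer closed the connection
                break
            chunks.append(chunk)
    return b"".join(chunks)


def check_xml(data):
    """Return None if expat accepts the bytes as a complete document,
    otherwise expat's error message, including line and column."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


def capture(host, path, port=8649):
    """Hypothetical helper: save gmond's dump to path, then check it."""
    data = fetch_gmond_xml(host, port)
    with open(path, "wb") as fh:
        fh.write(data)
    return check_xml(data)
```

To use it on the head node, call capture("172.16.1.101", "legolas.xml"). This saves whatever gmond sent to legolas.xml and returns None if expat accepts the dump, or the expat error string if it does not. The code reads until EOF rather than a fixed number of bytes on purpose: if a crash cuts the dump short, the partial output is still kept for inspection.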

