Thanks for the quick reply...

On Sat, 2007-10-27 at 19:18 -0500, Carlo Marcelo Arenas Belon wrote:
> On Sat, Oct 27, 2007 at 04:42:00PM -0400, Andrew Rowland wrote:
> > I have just installed Ganglia-3.0.5. Configured without gexec on both
> > machines and with gmetad on one, but not the other. I am able to start
> > gmond and gmetad with no errors. But I am having problems on one of my
> > machines with gmond.
>
> Which OS/arch/release? If Linux, which distribution, and if not using the
> distribution-provided packages, what options were used to build it?
"Head" Node (Yes, the one with gmetad): x86_64 with multilib CLFS-1.0.0
with Linux 2.6.19.1 kernel. Ganglia built as:
CC="gcc ${BUILD64}" PKG_CONFIG_PATH="${PKG_CONFIG_PATH64}" ./configure
--prefix=/opt/ganglia --libdir=/opt/ganglia/lib64 --with-gmetad
--disable-gexec && make && make install
Non-Working Node: x86 CLFS-1.0.0 with Linux 2.6.19.1 kernel. Ganglia
built as:
./configure --prefix=/opt/ganglia --disable-gexec && make && make
install
> > Issuing gstat on the head node gives the following:
>
> You mean the head node is the one that has gmetad? Both are technically head
> nodes based on your configuration, as you are using multicast and enabling TCP
> (so that gmetad can poll them).
>
> > CLUSTER INFORMATION
> > Name: clusterfsck
> > Hosts: 1
> > Gexec Hosts: 0
> > Dead Hosts: 0
> > Localtime: Sat Oct 27 16:34:15 2007
> >
> > There are no hosts running gexec at this time
>
> What do you get if running the command below? I suspect you will only see 1
> host, which is the one you are polling.
>
> # gstat -a
$ gstat -a
CLUSTER INFORMATION
Name: clusterfsck
Hosts: 1
Gexec Hosts: 0
Dead Hosts: 0
Localtime: Sat Oct 27 20:49:50 2007
CLUSTER HOSTS
Hostname                   LOAD                        CPU                        Gexec
 CPUs (Procs/Total) [     1,     5,   15min] [  User,  Nice, System,  Idle,   Wio]

weibullone.weibullnet.net
    2 (    0/  201) [  2.59,  2.41,  2.10] [   0.0,  62.1,   1.4,  36.5,   0.0] OFF
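For what it's worth, gstat builds this report from the XML document gmond serves on its TCP channel (port 8649 by default). A minimal sketch of what the client side does, assuming the default port and using the hypothetical helper names below, is just "read until the peer closes, then hand the buffer to expat":

```python
import socket
import xml.parsers.expat

def fetch_cluster_xml(host, port=8649):
    """Read the full XML document gmond serves on its TCP channel.

    This mirrors what gstat/gmetad do: connect, read until the remote
    end closes the connection, and return the accumulated buffer.
    Host and the default port 8649 are assumptions from gmond.conf.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)

def is_well_formed(buf):
    """Return True if buf parses as one complete XML document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(buf, True)  # isfinal=True: buf must be the whole doc
        return True
    except xml.parsers.expat.ExpatError:
        return False
```

If `is_well_formed(fetch_cluster_xml("172.16.1.101"))` comes back False (or the fetch returns nothing at all), the stream was cut off before the closing tag, which is consistent with the parse failure below.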
> > When I gstat -i 172.16.1.101 from the head node, I get the following and
> > the gmond daemon is killed on 172.16.1.101.
> >
> > gexec_cluster() XML_ParseBuffer() error at line 51:
> > no element found
>
> This means that gmond crashed because of a broken XML stream while using
> expat. Can you paste the output of
>
> # lsof -p `pidof gmond`
On the non-working node:
$ ./lsof -p `pidof gmond`
COMMAND   PID  USER  FD   TYPE DEVICE    SIZE    NODE NAME
gmond   16485  root  cwd   DIR    3,3    4096       2 /
gmond   16485  root  rtd   DIR    3,3    4096       2 /
gmond   16485  root  txt   REG    3,3  634148  182171 /usr/sbin/gmond
gmond   16485  root  mem   REG    3,3  152573 1172777 /lib/libnss_files-2.4.so
gmond   16485  root  mem   REG    3,3 6608752 1172766 /lib/libc-2.4.so
gmond   16485  root  mem   REG    3,3  572389 1172773 /lib/libpthread-2.4.so
gmond   16485  root  mem   REG    3,3  404817 1172783 /lib/libnsl-2.4.so
gmond   16485  root  mem   REG    3,3  224864 1172774 /lib/libresolv-2.4.so
gmond   16485  root  mem   REG    3,3   99796 1172770 /lib/libdl-2.4.so
gmond   16485  root  mem   REG    3,3   54376 1172772 /lib/libcrypt-2.4.so
gmond   16485  root  mem   REG    3,3  565268 1172769 /lib/libm-2.4.so
gmond   16485  root  mem   REG    3,3  165063 1172778 /lib/librt-2.4.so
gmond   16485  root  mem   REG    3,3  549116 1172788 /lib/ld-2.4.so
gmond   16485  root    0r  CHR    1,3            1023 /dev/null
gmond   16485  root    1w  CHR    1,3            1023 /dev/null
gmond   16485  root    2w  CHR    1,3            1023 /dev/null
gmond   16485  root    3u IPv4 241510              UDP 239.2.11.71:8649
gmond   16485  root    4u IPv4 241511              TCP *:8649 (LISTEN)
gmond   16485  root    5u IPv4 241512              UDP legolas.clusterfsck:1027->239.2.11.71:8649
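Incidentally, the "no element found" message gstat printed is exactly what expat reports when the connection closes before gmond has written anything, i.e. the parser is handed an empty document. A quick sketch that reproduces the same error with Python's expat bindings:

```python
import xml.parsers.expat

# Feeding an empty buffer with isfinal=True mimics gmond dying (or
# closing the TCP connection) before emitting any XML; expat fails
# with the same "no element found" error gexec_cluster() reported.
parser = xml.parsers.expat.ParserCreate()
try:
    parser.Parse(b"", True)
except xml.parsers.expat.ExpatError as err:
    print(err)  # prints e.g. "no element found: line 1, column 0"
```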
> Carlo
--
Andrew "Weibullguy" Rowland
Reliability & Safety Engineer
[EMAIL PROTECTED]
http://webpages.charter.net/weibullguy
http://reliafree.sourceforge.net
_______________________________________________
Ganglia-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/ganglia-general

