Quoting Martin Knoblauch <[EMAIL PROTECTED]>:
Hi Ashutosh,
Sorry for the wrong port. I meant, of course, 8651.
You could try to run "gmetad" with a high debug level. This could help
to track down the problem.
[EMAIL PROTECTED] ~]# /usr/sbin/gmetad -d10
Going to run as user nobody
Sources are ...
Source: [Blaze, step 15] has 1 sources
192.168.1.1
xml listening on port 8651
interactive xml listening on port 8652
cleanup thread has been started
It shows absolutely nothing after that. From another terminal, if I run
telnet localhost 8651, the connection also hangs.
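The hang on port 8651 can be made easier to report with a client that gives up after a fixed time instead of blocking like telnet. The following is only a minimal sketch, assuming Python 3 is available on the gmetad host; the function name fetch_xml and the 5-second timeout are illustrative choices, not part of Ganglia.

```python
import socket


def fetch_xml(host="localhost", port=8651, timeout=5.0):
    """Read everything the daemon sends until it closes the connection.

    The timeout applies to the connect and to every recv call, so a
    daemon that accepts the connection but never writes raises
    socket.timeout instead of hanging forever.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # peer closed the connection: dump is complete
                break
            chunks.append(data)
    return b"".join(chunks)


if __name__ == "__main__":
    try:
        xml = fetch_xml()
        print(f"received {len(xml)} bytes")
    except socket.timeout:
        print("connected, but no data arrived before the timeout")
    except OSError as exc:
        print(f"connection failed: {exc}")
```

A timeout here separates "gmetad accepted the connection but sent nothing" from "nothing is listening on the port", which telnet alone does not make obvious.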
Here's gmetad.conf after pruning comments:
data_source "Blaze" 15 192.168.1.1
xml_port 8651
interactive_port 8652
rrd_rootdir "/scratch/rrdtool-1.2.15/lib/rrds"
Here's a pruned output of telnet localhost 8649:
<?xml version= ....
...
<GANGLIA_XML VERSION="3.0.4" SOURCE="gmond">
<CLUSTER NAME="Lehigh University Beowulf Cluster"
LOCALTIME="1169059422" OWNER="Lehigh University, LTS" LATLONG="N45W45"
URL="http://www.lehigh.edu/computing/hpc">
</CLUSTER>
</GANGLIA_XML>
This is surprising since it does not show any expected metrics besides
the name and owner.
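To put a number on what is missing, the gmond dump can be summarized per cluster. This is a rough sketch, assuming the usual Ganglia XML layout in which HOST elements sit inside CLUSTER and METRIC elements sit inside HOST; the function name summarize is made up for illustration, and the sample string is the pruned output quoted above.

```python
import xml.etree.ElementTree as ET


def summarize(xml_text):
    """Return {cluster name: (host count, metric count)} for a gmond dump."""
    root = ET.fromstring(xml_text)
    summary = {}
    for cluster in root.iter("CLUSTER"):
        hosts = cluster.findall("HOST")
        metrics = sum(len(host.findall("METRIC")) for host in hosts)
        summary[cluster.get("NAME")] = (len(hosts), metrics)
    return summary


# The pruned output from this message: a CLUSTER element with no children.
sample = """<GANGLIA_XML VERSION="3.0.4" SOURCE="gmond">
<CLUSTER NAME="Lehigh University Beowulf Cluster"
 LOCALTIME="1169059422" OWNER="Lehigh University, LTS" LATLONG="N45W45"
 URL="http://www.lehigh.edu/computing/hpc">
</CLUSTER>
</GANGLIA_XML>"""

print(summarize(sample))
```

For the output above this reports zero hosts and zero metrics, meaning gmond does not even list the local host. That points at gmond not receiving its own multicast packets rather than at individual metrics failing.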
gmond.conf:
globals {
daemonize = yes
setuid = yes
user = nobody
debug_level = 0
max_udp_msg_len = 1472
mute = no
deaf = no
host_dmax = 0 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
}
cluster {
name = "Lehigh University Beowulf Cluster"
owner = "Lehigh University, LTS"
latlong = "N45W45"
url = "http://www.lehigh.edu/computing/hpc"
}
host {
location = "unspecified"
}
udp_send_channel {
mcast_join = 239.2.11.71
mcast_if = eth0
port = 8649
ttl = 1
}
udp_recv_channel {
mcast_join = 239.2.11.71
mcast_if = eth0
port = 8649
bind = 239.2.11.71
}
tcp_accept_channel {
port = 8649
}
collection_group {
collect_once = yes
time_threshold = 20
metric {
name = "heartbeat"
}
}
collection_group {
collect_once = yes
time_threshold = 1200
metric {
name = "cpu_num"
}
metric {
name = "cpu_speed"
}
metric {
name = "mem_total"
}
metric {
name = "swap_total"
}
metric {
name = "boottime"
}
metric {
name = "machine_type"
}
metric {
name = "os_name"
}
metric {
name = "os_release"
}
metric {
name = "location"
}
}
collection_group {
collect_once = yes
time_threshold = 300
metric {
name = "gexec"
}
}
collection_group {
collect_every = 20
time_threshold = 90
metric {
name = "cpu_user"
value_threshold = "1.0"
}
metric {
name = "cpu_system"
value_threshold = "1.0"
}
metric {
name = "cpu_idle"
value_threshold = "5.0"
}
metric {
name = "cpu_nice"
value_threshold = "1.0"
}
metric {
name = "cpu_aidle"
value_threshold = "5.0"
}
metric {
name = "cpu_wio"
value_threshold = "1.0"
}
metric {
name = "cpu_intr"
value_threshold = "1.0"
}
metric {
name = "cpu_sintr"
value_threshold = "1.0"
}
}
....
[more collection_groups]
ifconfig:
eth0 Link encap:Ethernet HWaddr 00:50:45:5C:21:4A
inet addr:192.168.1.1 Bcast:192.168.3.255 Mask:255.255.252.0
inet6 addr: fe80::250:45ff:fe5c:214a/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:123003000 errors:0 dropped:0 overruns:0 frame:0
TX packets:122365917 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:42765111269 (39.8 GiB) TX bytes:65546395304 (61.0 GiB)
Interrupt:201
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:8559705 errors:0 dropped:0 overruns:0 frame:0
TX packets:8559705 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:6976648110 (6.4 GiB) TX bytes:6976648110 (6.4 GiB)
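As a sanity check on the addresses involved, the values from ifconfig and gmond.conf can be cross-checked with Python's ipaddress module. This is only a sketch; the helper name describe is hypothetical, and the addresses are copied from the output above.

```python
import ipaddress

# eth0 address and netmask from the ifconfig output
iface = ipaddress.ip_interface("192.168.1.1/255.255.252.0")
# mcast_join value from gmond.conf
mcast = ipaddress.ip_address("239.2.11.71")


def describe(iface, mcast):
    """Collect the derived network facts in one place."""
    return {
        "network": str(iface.network),
        "broadcast": str(iface.network.broadcast_address),
        "mcast_is_multicast": mcast.is_multicast,
    }


print(describe(iface, mcast))
```

The derived broadcast address matches the Bcast value ifconfig reports, and 239.2.11.71 is a valid multicast address, so the addressing itself looks consistent. Whether multicast traffic is actually routed out of and back into eth0 is a separate question this check does not answer.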
Thanks
Ashutosh