Dear All,
This should "just work" but doesn't, possibly because of OS or software version differences.
PS have turned off firewall & tcpwrappers; no difference.
Someone wrote:
remove you old databases
rm /var/log/ganglia/rrds/*
So if one wants to clean up completely, is that confirmed safe to do?
I can't get one gmond node to collect data from another & send it on to the
gmetad node. All nodes are 32-bit.
gmetad node: vlad, SL5.3, with these packages installed from EPEL:
Mar 24 15:35:58 Installed: ganglia-3.0.7-1.el5.i386
Mar 24 15:35:58 Installed: ganglia-gmetad-3.0.7-1.el5.i386
Mar 24 15:36:04 Installed: ganglia-gmond-3.0.7-1.el5.i386
Mar 24 15:36:05 Installed: ganglia-web-3.0.7-1.el5.i386
gmetad.conf contains:
data_source "Vlad" localhost
data_source "laptops" beaker.phy.bris.ac.uk:8648 # also orac & scorpio
data_source "servers" bfb.phy.bris.ac.uk:8650 # also b3
No other changes. On vlad, gmond.conf:
cluster {
name = "Vlad"
/* other attributes left as default */
}
udp_send_channel {
mcast_join = vlad.phy.bris.ac.uk
port = 8649
ttl = 1
}
No other changes. Does that look correct?
bfb (SL4) has these installed from somewhere:
ganglia-gmond-3.0.7-1.el4.i386.rpm & ganglia-3.0.7-1.el4.i386.rpm
(for some reason, ganglia is a dependency of ganglia-gmond for whoever
packaged this)
(Another is just ganglia-gmond-3.0.7-1.i386.rpm from rpmforge ganglia
site; doesn't seem to make any difference to this problem, same result)
gmond.conf on bfb:
cluster {
name = "servers"
/* other attributes left as default */
}
udp_send_channel {
mcast_join = bfb.phy.bris.ac.uk
port = 8650
ttl = 1
}
Does that look correct? The other node, b3, has an identical gmond.conf, so it
should send to the gmond on bfb on port 8650. Is that understanding correct?
But vlad:/var/lib/ganglia/rrds contains only
drwxr-xr-x 2 ganglia root 4096 Apr 1 12:32 __SummaryInfo__/
drwxr-xr-x 4 ganglia root 4096 Apr 1 12:32 Vlad/
Missing the other 2 clusters.
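To make that comparison less manual, here is a minimal Python sketch that reads the data_source lines from gmetad.conf and lists the clusters that have no directory under the rrds root. The function names are my own, and it assumes the usual data_source layout (name, optional polling interval, then host[:port] entries, with gmetad falling back to port 8649 when no port is given).

```python
import os
import shlex

DEFAULT_GMOND_PORT = 8649  # port gmetad polls when a data_source entry gives none


def parse_data_sources(conf_text):
    """Return {cluster_name: [(host, port), ...]} from gmetad.conf text."""
    sources = {}
    for line in conf_text.splitlines():
        # comments=True drops anything after a '#', e.g. "# also b3"
        tokens = shlex.split(line, comments=True)
        if len(tokens) < 3 or tokens[0] != "data_source":
            continue
        name, rest = tokens[1], tokens[2:]
        if rest[0].isdigit():  # optional polling interval in seconds
            rest = rest[1:]
        hosts = []
        for item in rest:
            host, _, port = item.partition(":")
            hosts.append((host, int(port) if port else DEFAULT_GMOND_PORT))
        sources[name] = hosts
    return sources


def missing_clusters(conf_text, rrd_root):
    """List data_source names that have no directory under rrd_root."""
    present = set(os.listdir(rrd_root))
    return [name for name in parse_data_sources(conf_text) if name not in present]
```

Given the three data_source lines above and the directory listing shown, missing_clusters() reports laptops and servers.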
r...@vlad> gmetad --debug=8
Going to run as user ganglia
Sources are ...
Source: [laptops, step 15] has 1 sources
137.222.58.75
Source: [Vlad, step 15] has 1 sources
127.0.0.1
Source: [servers, step 15] has 1 sources
137.222.74.98 (So that looks healthy)
xml listening on port 8651
interactive xml listening on port 8652
cleanup thread has been started
Data thread -1271993456 is monitoring [laptops] data source
137.222.58.75
Data thread -1282483312 is monitoring [Vlad] data source
127.0.0.1
[Vlad] is a 2.5 or later data stream
hash_create size = 1024
hash->size is 1031
hash_create size = 50
hash->size is 53
Data thread -1292973168 is monitoring [servers] data source
137.222.74.98
data_thread() got no answer from any [laptops] datasource
[servers] is a 2.5 or later data stream
hash_create size = 1024
hash->size is 1031
hash_create size = 50
hash->size is 53
So gmetad seems to see laptops & servers (although it got no answer from
laptops), but there is no data for them in /var/lib/ganglia/rrds.
telnet bfb 8650 shows xml, ditto b3; so that does work.
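The telnet check can be scripted so that it also shows which hosts each gmond actually knows about. This is a rough Python sketch with helper names of my own; it assumes gmond writes its whole XML dump and then closes the connection, and that the dump uses CLUSTER and HOST elements with NAME attributes, as gmond 3.0 output normally does.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host, port, timeout=5.0):
    """Read the full XML dump gmond writes on its TCP accept channel."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # an empty read means gmond closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def hosts_by_cluster(xml_bytes):
    """Map each CLUSTER NAME to the sorted list of HOST NAMEs it reports."""
    root = ET.fromstring(xml_bytes)
    return {
        cluster.get("NAME"): sorted(h.get("NAME") for h in cluster.iter("HOST"))
        for cluster in root.iter("CLUSTER")
    }
```

Calling hosts_by_cluster(fetch_gmond_xml("bfb.phy.bris.ac.uk", 8650)) would show whether bfb's gmond lists both bfb and b3 under "servers". If it does, the problem presumably lies between gmetad and gmond rather than between the gmond nodes.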
Is it safe to stop gmetad & delete /var/lib/ganglia/rrds/ to start
completely afresh? If not, what's the safe way? And if the expected data
sources don't reappear, how do I debug that?
On the web page (vlad runs httpd etc), no 'laptops' shows up, servers has 0
hosts, even vlad has 0 hosts.
On vlad, if I change
udp_send_channel {
mcast_join = vlad.phy.bris.ac.uk
back to
udp_send_channel {
mcast_join = 239.2.11.71
then on the web page vlad has 1 host.
So somehow setting mcast_join to the hostname in vlad's gmond.conf makes the
web page show 0 hosts.
I keep reading that it should just work, but it doesn't. Should I clean up
completely & start afresh?
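On the mcast_join change: as far as I understand the name, mcast_join expects a multicast group address, and 239.2.11.71 is one while a host's own address is not. I have not confirmed that this is the cause. A small Python check, with function names of my own, makes the difference visible:

```python
import ipaddress
import socket


def is_multicast(address):
    """True if the literal IPv4/IPv6 address lies in a multicast range."""
    return ipaddress.ip_address(address).is_multicast


def resolves_to_multicast(hostname):
    """Resolve a hostname (needs a working lookup) and test its IPv4 address."""
    return is_multicast(socket.gethostbyname(hostname))
```

is_multicast("239.2.11.71") is true, whereas an ordinary host address such as 137.222.74.98 is not multicast, so resolves_to_multicast() on a normal hostname would be false.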
Another question: b3's gmond sends to bfb. Where does bfb's gmond keep the
data for both of them: in memory, or in a file on disk?
Many thanks for enlightenment/guidance.
_______________________________________________
Ganglia-general mailing list
https://lists.sourceforge.net/lists/listinfo/ganglia-general