Hi all,
I recently upgraded my ESX host to 6.7. To do this, I migrated the Ganglia web
VM to another ESX host and then migrated it back to the original host. I made
no changes on my nodes, and no changes on the Ganglia web VM other than the
migration. Since the move, I have been receiving a heartbeat error from gmond.
The curious thing is that one of my grids, which is remote, displays
successfully, while the local grid does not display properly.
Here are my services on the Ganglia web server:
[root@ca6web ~]# rpm -qa|grep gmond
ganglia-gmond-3.7.2-2.el7.x86_64
[root@ca6web ~]# systemctl status gmond
● gmond.service - Ganglia Monitoring Daemon
Loaded: loaded (/usr/lib/systemd/system/gmond.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-08-03 14:33:05 PDT; 1 weeks 2 days ago
Process: 6447 ExecStart=/usr/sbin/gmond (code=exited, status=0/SUCCESS)
Main PID: 6448 (gmond)
CGroup: /system.slice/gmond.service
└─6448 /usr/sbin/gmond
Aug 03 14:33:05 ca6web.wai.com systemd[1]: Starting Ganglia Monitoring Daemon...
Aug 03 14:33:05 ca6web.wai.com systemd[1]: Started Ganglia Monitoring Daemon.
Hint: Some lines were ellipsized, use -l to show in full.
[root@ca6web ~]# systemctl status gmetad
● gmetad.service - Ganglia Meta Daemon
Loaded: loaded (/usr/lib/systemd/system/gmetad.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-08-03 14:15:16 PDT; 1 weeks 2 days ago
Main PID: 5408 (gmetad)
CGroup: /system.slice/gmetad.service
└─5408 /usr/sbin/gmetad -d 1
Aug 03 14:15:16 ca6web.wai.com systemd[1]: Starting Ganglia Meta Daemon...
Aug 03 14:15:16 ca6web.wai.com gmetad[5408]: Sources are ...
Aug 03 14:15:16 ca6web.wai.com gmetad[5408]: Source: [NM1, step 60] has 1 so...s
Aug 03 14:15:16 ca6web.wai.com gmetad[5408]: xxx.xxx.xxx.xxx
Aug 03 14:15:16 ca6web.wai.com gmetad[5408]: Source: [CA6, step 15] has 1 so...s
Aug 03 14:15:16 ca6web.wai.com gmetad[5408]: 127.0.0.1
Aug 03 14:15:16 ca6web.wai.com gmetad[5408]: Data thread 139791241848576 is ...e
Aug 03 14:15:16 ca6web.wai.com gmetad[5408]: xxx.xxx.xxx.xxx
Aug 03 14:15:16 ca6web.wai.com gmetad[5408]: Data thread 139791233455872 is ...e
Aug 03 14:15:16 ca6web.wai.com gmetad[5408]: 127.0.0.1
Hint: Some lines were ellipsized, use -l to show in full.
Here is the status result from one of my nodes:
[root@ca6node6 ~]# systemctl status gmond -l
● gmond.service - Ganglia Monitoring Daemon
Loaded: loaded (/usr/lib/systemd/system/gmond.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2018-08-03 13:30:09 PDT; 1 weeks 2 days ago
Process: 23914 ExecStart=/usr/sbin/gmond (code=exited, status=0/SUCCESS)
Main PID: 23915 (gmond)
CGroup: /system.slice/gmond.service
└─23915 /usr/sbin/gmond
Aug 13 13:03:33 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for heartbeat
Aug 13 13:03:43 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for cpu_num
Aug 13 13:03:53 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for heartbeat
Aug 13 13:04:13 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for heartbeat
Aug 13 13:04:23 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for proc_run
Aug 13 13:04:33 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for heartbeat
Aug 13 13:04:43 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for cpu_num
Aug 13 13:04:53 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for heartbeat
Aug 13 13:05:13 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for heartbeat
Aug 13 13:05:33 ca6node6.wai.com /usr/sbin/gmond[23915]: Error 1 sending the
modular data for heartbeat
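To summarize which metrics are failing, the errors can be tallied by metric name. The following is a minimal sketch, assuming Python 3 is available and the journal text has been captured as a string; `count_send_errors` is a hypothetical helper name, not part of Ganglia. The pattern tolerates the line wrapping introduced by the mail client.

```python
import re
from collections import Counter

# Matches gmond's send-failure message. \s+ allows the message to be
# split across lines, as it is in the wrapped excerpt above.
_SEND_ERROR = re.compile(r"Error \d+ sending the\s+modular data for\s+(\S+)")


def count_send_errors(log_text: str) -> Counter:
    """Return a Counter mapping metric name -> number of send errors."""
    return Counter(_SEND_ERROR.findall(log_text))


# Example use (log_text would hold the output of `systemctl status gmond -l`):
#   for metric, n in count_send_errors(log_text).most_common():
#       print(metric, n)
```

Applied to the ten messages quoted above, this counts heartbeat 7 times, cpu_num twice, and proc_run once.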
Here is the gmond.conf file from my nodes. I use unicast for the gmond daemon.
/* This configuration is as close to 2.5.x default behavior as possible
The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
daemonize = yes
setuid = yes
user = nobody
debug_level = 0
max_udp_msg_len = 1472
mute = no
deaf = yes
allow_extra_data = yes
host_dmax = 3600 /*secs. Expires (removes from web interface) hosts in 1 hour */
host_tmax = 20 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
# By default gmond will use reverse DNS resolution when displaying your
# hostname.
# Uncommenting the following value will override that value.
# override_hostname = "mywebserver.domain.com"
# If you are not using multicast this value should be set to something other
# than 0.
# Otherwise if you restart aggregator gmond you will get empty graphs. 60
# seconds is reasonable.
send_metadata_interval = 60 /*secs */
}
/*
* The cluster attributes specified will be used as part of the <CLUSTER>
* tag that will wrap all hosts collected by this instance.
*/
cluster {
name = "CA6"
owner = "My company"
latlong = "Lat and Long"
url = "http://ganglia.wai.com"
}
/* The host section describes attributes of the host, like the location */
host {
location = "server room"
}
/* Feel free to specify as many udp_send_channels as you like. Gmond used to
only support having a single channel */
udp_send_channel {
bind_hostname = yes # Highly recommended, soon to be default.
# This option tells gmond to use a source address
# that resolves to the machine's hostname. Without
# this, the metrics may appear to come from any
# interface and the DNS names associated with
# those IPs will be used to create the RRDs.
host = Web server IP
port = 8649
ttl = 1
}
Here is the gmond.conf from the Web server:
/* This configuration is as close to 2.5.x default behavior as possible
The values closely match ./gmond/metric.h definitions in 2.5.x */
globals {
daemonize = yes
setuid = yes
user = nobody
debug_level = 0
max_udp_msg_len = 1472
mute = yes
deaf = no
allow_extra_data = yes
host_dmax = 3600 /*secs. Expires (removes from web interface) hosts in 1 hour */
host_tmax = 20 /*secs */
cleanup_threshold = 300 /*secs */
gexec = no
# By default gmond will use reverse DNS resolution when displaying your
# hostname.
# Uncommenting the following value will override that value.
# override_hostname = "mywebserver.domain.com"
# If you are not using multicast this value should be set to something other
# than 0.
# Otherwise if you restart aggregator gmond you will get empty graphs. 60
# seconds is reasonable.
send_metadata_interval = 60 /*secs */
}
/*
* The cluster attributes specified will be used as part of the <CLUSTER>
* tag that will wrap all hosts collected by this instance.
*/
cluster {
name = "CA6"
owner = "My name"
latlong = "Lat and long"
url = "http://ganglia.wai.com"
}
/* The host section describes attributes of the host, like the location */
host {
location = "Server Room"
}
/* Feel free to specify as many udp_send_channels as you like. Gmond used to
only support having a single channel */
udp_send_channel {
bind_hostname = yes # Highly recommended, soon to be default.
# This option tells gmond to use a source address
# that resolves to the machine's hostname. Without
# this, the metrics may appear to come from any
# interface and the DNS names associated with
# those IPs will be used to create the RRDs.
host = xxx.xxx.xxx.xxx
port = 8649
ttl = 1
}
/* You can specify as many udp_recv_channels as you like as well. */
udp_recv_channel {
port = 8649
# Size of the UDP buffer. If you are handling lots of metrics you really
# should bump it up to e.g. 10MB or even higher.
# buffer = 10485760
}
Here is the gmetad.conf file from the Web server:
data_source "CA6" 127.0.0.1:8649
data_source "NM1" 60 xxx.xxx.xxx.xxx:8650
gridname "Thornton Tomasetti"
All others are defaults.
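Since gmetad polls each data_source over TCP and reads an XML dump from gmond, one way to see what the local gmond is actually holding is to read that dump directly. The following is a minimal sketch, assuming Python 3 on the web server and assuming the gmond there also has a tcp_accept_channel on port 8649 (not shown in the excerpt above); `fetch_gmond_xml` and `list_hosts` are hypothetical helper names, not part of Ganglia.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host: str = "127.0.0.1", port: int = 8649,
                    timeout: float = 5.0) -> bytes:
    """Read everything the peer writes on a TCP connection until it closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # empty read: the peer closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def list_hosts(xml_bytes: bytes) -> list:
    """Return (cluster name, host name, REPORTED value) per HOST element."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for cluster in root.iter("CLUSTER"):
        for host in cluster.iter("HOST"):
            rows.append((cluster.get("NAME"), host.get("NAME"),
                         host.get("REPORTED")))
    return rows


# Example use on the web server:
#   for row in list_hosts(fetch_gmond_xml("127.0.0.1", 8649)):
#       print(*row)
```

If the CA6 nodes are missing from the list, or their REPORTED values are stale, that would suggest the packets are not reaching the local gmond rather than a problem in gmetad or the web front end.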
I have uninstalled and reinstalled the rpm to no avail.
I can successfully connect to the web server:
[root@ca6node1 ~]# nc -uv ca6web 8649
Ncat: Version 6.40 ( http://nmap.org/ncat )
Ncat: Connected to xxx.xxx.xxx.xxx:8649.
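Because UDP is connectionless, the Ncat "Connected" line above mostly confirms name resolution and local socket setup rather than delivery. The following is a minimal sketch of a slightly stronger check, assuming Python 3 is available on the node: it sends a single datagram and reports any error the local OS raises while sending. `try_udp_send` is a hypothetical helper name, and the payload is not a valid Ganglia packet, so gmond should simply discard it.

```python
import socket


def try_udp_send(host: str, port: int = 8649, payload: bytes = b"probe") -> str:
    """Send one UDP datagram and report any error raised by the local OS.

    A result of "sent" only means the local kernel accepted the datagram;
    UDP gives no confirmation that the receiver got it.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.sendto(payload, (host, port))
        except OSError as exc:  # includes name-resolution failures
            return f"send failed: errno={exc.errno} ({exc.strerror})"
    return "sent"


# Example use on a node:
#   print(try_udp_send("ca6web", 8649))
```

A failure here would point at the sending side (routing, local firewall, name resolution), while "sent" leaves the network path and the receiver still to be checked, for example with a packet capture on the web server.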
I am at my wits' end. This configuration has been running successfully for at
least a year. The really strange thing is that the data is being collected
from my remote node successfully and being displayed properly.
Any help is appreciated.
Karl-Heinz Konrad
Consultant
Information Technology
Thornton Tomasetti
19200 Stevens Creek Blvd., Suite 100
Cupertino, CA 95014
T +1.650.230.0210 F +1.650.230.0209
D +1.650.230.0262 M +1.831.246.1687
kkon...@thorntontomasetti.com
www.ThorntonTomasetti.com
The information in this email and any attachments may contain confidential
information that is intended solely for the attention and use of the named
addressee(s). This message or any part thereof must not be disclosed, copied,
distributed or retained by any person without authorization from the addressee.
If you are not the intended addressee, please notify the sender immediately,
and delete this message.
_______________________________________________
Ganglia-general mailing list
Ganglia-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ganglia-general