Dmitry Lysnichenko created AMBARI-5228:
------------------------------------------
Summary: gmond processes for master components are not starting
after upgrade
Key: AMBARI-5228
URL: https://issues.apache.org/jira/browse/AMBARI-5228
Project: Ambari
Issue Type: Bug
Components: controller, test
Affects Versions: 1.5.0
Reporter: Dmitry Lysnichenko
Assignee: Dmitry Lysnichenko
Priority: Critical
Fix For: 1.5.0
This is an upgrade from 1.4.3 to 1.5.0, with HDP-1.3.2.
After the upgrade, the gmond processes for master components are not starting.
Starting one manually gives this error:
{noformat}
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseMaster/gmond.core.conf -d1
Unable to create tcp_accept_channel. Exiting.
{noformat}
It looks like the presence of the gmond.master.conf file is causing the
trouble, though I am not sure why. In /etc/ganglia/hdp/HDPHBaseMaster/conf.d:
{noformat}
-rw-r--r-- 1 root hadoop2 442 Mar 22 06:18 gmond.master.conf
-rw-r--r-- 1 root hadoop2 671 Mar 22 13:39 gmond.slave.conf
{noformat}
If the files are removed (I did this for JT and HBaseMaster), then:
{noformat}
ps aux | grep gmond
nobody 5868 1.6 0.0 59108 1860 ? Ssl 17:57 0:00
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPTaskTracker/gmond.core.conf
--pid-file=/var/run/ganglia/hdp/HDPTaskTracker/gmond.pid
nobody 5894 1.5 0.0 59108 1864 ? Ssl 17:57 0:00
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPJobTracker/gmond.core.conf
--pid-file=/var/run/ganglia/hdp/HDPJobTracker/gmond.pid
nobody 5913 1.6 0.1 125200 2200 ? Ssl 17:57 0:00
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPSlaves/gmond.core.conf
--pid-file=/var/run/ganglia/hdp/HDPSlaves/gmond.pid
nobody 5932 1.6 0.0 59108 1828 ? Ssl 17:57 0:00
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPDataNode/gmond.core.conf
--pid-file=/var/run/ganglia/hdp/HDPDataNode/gmond.pid
nobody 5952 1.6 0.0 59108 1840 ? Ssl 17:57 0:00
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHistoryServer/gmond.core.conf
--pid-file=/var/run/ganglia/hdp/HDPHistoryServer/gmond.pid
nobody 5971 1.6 0.0 59108 1824 ? Ssl 17:57 0:00
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseRegionServer/gmond.core.conf
--pid-file=/var/run/ganglia/hdp/HDPHBaseRegionServer/gmond.pid
{noformat}
Notice that gmond for NameNode is not up as the file was not removed.
h2. Details:
In 1.4.x versions, we generated the configs as follows:
- HDPNameNode/gmond.core.conf contains tcp_accept_channel that binds to
localhost (resolves to 127.0.0.1)
- HDPNameNode/conf.d/gmond.master.conf contains udp_recv_channel and
tcp_accept_channel that bind to machine hostname (resolves to real ip)
- HDPNameNode/conf.d/gmond.slave.conf contains udp_send_channel that binds to
machine hostname (resolves to real ip)
In 1.5.x versions, we changed that:
- HDPNameNode/gmond.core.conf contains tcp_accept_channel that binds to any
interface (0.0.0.0)
- HDPNameNode/conf.d/gmond.master.conf contains udp_recv_channel that binds
to machine hostname (resolves to real ip). Note that it no longer contains a
tcp_accept_channel definition, because that would conflict with the definition
in gmond.core.conf
- HDPNameNode/conf.d/gmond.slave.conf contains udp_send_channel that binds to
machine hostname (resolves to real ip). No changes here.
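The conflict between the two tcp_accept_channel definitions can be illustrated outside Ganglia. This is a minimal sketch, assuming both definitions use the same port (the report does not state port numbers). Once a listener holds the wildcard address, a second listener on a specific address and the same port is normally rejected on Linux. The function names are hypothetical, and 127.0.0.1 stands in for the host's real ip.

```python
import errno
import socket


def bind_listener(address, port):
    """Open a listening TCP socket on (address, port); port 0 picks a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((address, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def wildcard_blocks_specific():
    """Return True if a 0.0.0.0 listener prevents a second bind on a specific address."""
    # Stand-in for the 1.5.x gmond.core.conf channel (any interface).
    first = bind_listener("0.0.0.0", 0)
    port = first.getsockname()[1]
    try:
        # Stand-in for a leftover 1.4.x gmond.master.conf channel (specific address).
        second = bind_listener("127.0.0.1", port)
    except OSError as exc:
        return exc.errno == errno.EADDRINUSE
    else:
        second.close()
        return False
    finally:
        first.close()
```

If the assumption holds, this would be consistent with gmond exiting with "Unable to create tcp_accept_channel" when both definitions are present.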
When we upgrade from 1.4.1 to 1.5.x, these definitions get mixed.
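The reason is that gmond configure() is called during both install and start,
while gmetad configure() (which generates gmond.master.conf) is called only
during install. So after the upgrade gmond.core.conf is regenerated in the
1.5.x form, but gmond.master.conf stays in its 1.4.x form and still carries
its own tcp_accept_channel.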
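To check a host for leftover definitions, something like the following could be used. It is only a sketch: the function name is hypothetical, the directory layout is taken from the paths shown in this report, and the pattern assumes the usual gmond block syntax (section name followed by an opening brace).

```python
import re
from pathlib import Path

# Matches the start of a tcp_accept_channel block at the beginning of a line,
# so a definition commented out with '#' is not counted.
TCP_ACCEPT = re.compile(r"^\s*tcp_accept_channel\s*\{", re.MULTILINE)


def find_stale_master_confs(conf_root):
    """Return gmond.master.conf files under conf_root that define tcp_accept_channel."""
    stale = []
    # Expected layout: <conf_root>/<cluster name>/conf.d/gmond.master.conf
    for master_conf in sorted(Path(conf_root).glob("*/conf.d/gmond.master.conf")):
        if TCP_ACCEPT.search(master_conf.read_text()):
            stale.append(master_conf)
    return stale
```

Calling find_stale_master_confs("/etc/ganglia/hdp") on an upgraded host would list the files that still carry the old definition.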