Dmitry Lysnichenko created AMBARI-5228:
------------------------------------------

             Summary: gmond processes for master components are not starting 
after upgrade
                 Key: AMBARI-5228
                 URL: https://issues.apache.org/jira/browse/AMBARI-5228
             Project: Ambari
          Issue Type: Bug
          Components: controller, test
    Affects Versions: 1.5.0
            Reporter: Dmitry Lysnichenko
            Assignee: Dmitry Lysnichenko
            Priority: Critical
             Fix For: 1.5.0


This is an upgrade from 1.4.3 to 1.5.0 and HDP-1.3.2

After the upgrade, the gmond processes for master components do not start. 
Starting one manually fails with:
{noformat}
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseMaster/gmond.core.conf -d1
Unable to create tcp_accept_channel. Exiting.
{noformat}

For reasons not yet clear, the presence of the gmond.master.conf file appears 
to cause the problem. In /etc/ganglia/hdp/HDPHBaseMaster/conf.d:
{noformat}
-rw-r--r-- 1 root hadoop2  442 Mar 22 06:18 gmond.master.conf
-rw-r--r-- 1 root hadoop2  671 Mar 22 13:39 gmond.slave.conf
{noformat}

If the files are removed (I did this for JT and HBaseMaster), then:
{noformat}
ps aux | grep gmond
nobody    5868  1.6  0.0  59108  1860 ?        Ssl  17:57   0:00 
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPTaskTracker/gmond.core.conf 
--pid-file=/var/run/ganglia/hdp/HDPTaskTracker/gmond.pid
nobody    5894  1.5  0.0  59108  1864 ?        Ssl  17:57   0:00 
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPJobTracker/gmond.core.conf 
--pid-file=/var/run/ganglia/hdp/HDPJobTracker/gmond.pid
nobody    5913  1.6  0.1 125200  2200 ?        Ssl  17:57   0:00 
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPSlaves/gmond.core.conf 
--pid-file=/var/run/ganglia/hdp/HDPSlaves/gmond.pid
nobody    5932  1.6  0.0  59108  1828 ?        Ssl  17:57   0:00 
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPDataNode/gmond.core.conf 
--pid-file=/var/run/ganglia/hdp/HDPDataNode/gmond.pid
nobody    5952  1.6  0.0  59108  1840 ?        Ssl  17:57   0:00 
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHistoryServer/gmond.core.conf 
--pid-file=/var/run/ganglia/hdp/HDPHistoryServer/gmond.pid
nobody    5971  1.6  0.0  59108  1824 ?        Ssl  17:57   0:00 
/usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseRegionServer/gmond.core.conf 
--pid-file=/var/run/ganglia/hdp/HDPHBaseRegionServer/gmond.pid
{noformat}

Notice that gmond for NameNode is not up as the file was not removed.
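
To make this check repeatable, here is a minimal sketch that extracts the 
cluster names from ps output and reports which expected ones have no gmond. 
The function names and the list of expected clusters are hypothetical; only 
the --conf path pattern comes from the output above.

```python
import re

# Matches the --conf path seen in the ps output above; the capture group is
# the Ganglia cluster directory name, e.g. "HDPJobTracker".
CONF_RE = re.compile(r"--conf=/etc/ganglia/hdp/(\w+)/gmond\.core\.conf")


def running_gmond_clusters(ps_output):
    """Return the set of cluster names that have a gmond process in ps_output."""
    return set(CONF_RE.findall(ps_output))


def missing_gmond_clusters(ps_output, expected):
    """Return the expected cluster names with no running gmond, sorted."""
    return sorted(set(expected) - running_gmond_clusters(ps_output))
```

Feeding it the output of "ps aux" together with a list such as 
["HDPJobTracker", "HDPNameNode"] would return the clusters whose gmond is 
still down.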

h2. Details:

In 1.4.x versions, we generated the configs as follows:
- HDPNameNode/gmond.core.conf contains tcp_accept_channel that binds to 
localhost (resolves to 127.0.0.1)
- HDPNameNode/conf.d/gmond.master.conf contains udp_recv_channel and 
tcp_accept_channel that bind to machine hostname (resolves to real ip)
- HDPNameNode/conf.d/gmond.slave.conf contains udp_send_channel that binds to 
machine hostname (resolves to real ip)

In 1.5.x versions, we changed that:
- HDPNameNode/gmond.core.conf contains tcp_accept_channel that binds to any 
interface (0.0.0.0)
- HDPNameNode/conf.d/gmond.master.conf contains udp_recv_channel that binds 
to machine hostname (resolves to real ip). Note that it no longer contains a 
tcp_accept_channel definition, because that would conflict with the 
definition in gmond.core.conf
- HDPNameNode/conf.d/gmond.slave.conf contains udp_send_channel that binds to 
machine hostname (resolves to real ip). No changes here.
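
Given the 1.5.x layout above, a gmond.master.conf that still defines 
tcp_accept_channel is a leftover from 1.4.x. Here is a rough sketch for 
finding such files. The function name is hypothetical, and the simple line 
match assumes the section keyword starts a line and is not inside a /* */ 
block comment.

```python
import glob
import os
import re

# An uncommented line that opens a tcp_accept_channel section.
TCP_ACCEPT_RE = re.compile(r"^\s*tcp_accept_channel\b", re.MULTILINE)


def stale_master_confs(base_dir="/etc/ganglia/hdp"):
    """Return gmond.master.conf paths under base_dir that still define
    tcp_accept_channel (the 1.4.x layout)."""
    pattern = os.path.join(base_dir, "*", "conf.d", "gmond.master.conf")
    stale = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            if TCP_ACCEPT_RE.search(f.read()):
                stale.append(path)
    return stale
```

The script only reports the files; it does not modify or remove anything.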

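When we upgrade from 1.4.x to 1.5.x, these definitions get mixed. The reason 
is that gmond configure() is called during both install and start, while 
gmetad configure() (which generates gmond.master.conf) is called only during 
install. As a result, after the upgrade the 1.5.x gmond.core.conf ends up 
alongside the stale 1.4.x gmond.master.conf, and both define 
tcp_accept_channel.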
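
The likely mechanism behind "Unable to create tcp_accept_channel" can be 
reproduced outside gmond. Here is a minimal sketch, assuming both 
tcp_accept_channel definitions use the same port (the configs above do not 
show the port numbers) and using 127.0.0.1 as a stand-in for the real ip. 
gmond's own socket options may differ, and the function name is hypothetical.

```python
import errno
import socket


def second_bind_conflicts(first_addr="0.0.0.0", second_addr="127.0.0.1"):
    """Listen on first_addr, then try to bind second_addr on the same TCP port.

    Returns True if the second bind fails with EADDRINUSE.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((first_addr, 0))  # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((second_addr, port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()
```

Once a listener holds the wildcard address on a port, binding a specific 
address on that port fails. Two distinct specific addresses on the same port 
do not collide, which is consistent with the 1.4.x layout having worked.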



