[ https://issues.apache.org/jira/browse/AMBARI-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitry Lysnichenko resolved AMBARI-5228.
----------------------------------------

    Resolution: Fixed

Committed to trunk and to branch-1.5.0.

> gmond processes for master components are not starting after upgrade
> --------------------------------------------------------------------
>
>                 Key: AMBARI-5228
>                 URL: https://issues.apache.org/jira/browse/AMBARI-5228
>             Project: Ambari
>          Issue Type: Bug
>          Components: controller, test
>    Affects Versions: 1.5.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>            Priority: Critical
>             Fix For: 1.5.0
>
>
> This is an upgrade from Ambari 1.4.3 to 1.5.0 on an HDP-1.3.2 stack.
> After the upgrade, the gmond processes for master components are not starting.
> Starting one manually fails with this error:
> {noformat}
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseMaster/gmond.core.conf -d1
> Unable to create tcp_accept_channel. Exiting.
> {noformat}
> It is not clear why, but the presence of the gmond.master.conf file seems to be
> causing the trouble. Contents of /etc/ganglia/hdp/HDPHBaseMaster/conf.d:
> {noformat}
> -rw-r--r-- 1 root hadoop2  442 Mar 22 06:18 gmond.master.conf
> -rw-r--r-- 1 root hadoop2  671 Mar 22 13:39 gmond.slave.conf
> {noformat}
> After removing these files (done for the JobTracker and HBaseMaster), the gmond processes come up:
> {noformat}
> ps aux | grep gmond
> nobody    5868  1.6  0.0  59108  1860 ?        Ssl  17:57   0:00 
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPTaskTracker/gmond.core.conf 
> --pid-file=/var/run/ganglia/hdp/HDPTaskTracker/gmond.pid
> nobody    5894  1.5  0.0  59108  1864 ?        Ssl  17:57   0:00 
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPJobTracker/gmond.core.conf 
> --pid-file=/var/run/ganglia/hdp/HDPJobTracker/gmond.pid
> nobody    5913  1.6  0.1 125200  2200 ?        Ssl  17:57   0:00 
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPSlaves/gmond.core.conf 
> --pid-file=/var/run/ganglia/hdp/HDPSlaves/gmond.pid
> nobody    5932  1.6  0.0  59108  1828 ?        Ssl  17:57   0:00 
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPDataNode/gmond.core.conf 
> --pid-file=/var/run/ganglia/hdp/HDPDataNode/gmond.pid
> nobody    5952  1.6  0.0  59108  1840 ?        Ssl  17:57   0:00 
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHistoryServer/gmond.core.conf 
> --pid-file=/var/run/ganglia/hdp/HDPHistoryServer/gmond.pid
> nobody    5971  1.6  0.0  59108  1824 ?        Ssl  17:57   0:00 
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseRegionServer/gmond.core.conf 
> --pid-file=/var/run/ganglia/hdp/HDPHBaseRegionServer/gmond.pid
> {noformat}
> Note that gmond for the NameNode is not running, because its gmond.master.conf was not removed.
> h2. Details:
> In 1.4.x versions, we generated the configs as follows:
> - HDPNameNode/gmond.core.conf contains a tcp_accept_channel that binds to
> localhost (resolves to 127.0.0.1)
> - HDPNameNode/conf.d/gmond.master.conf contains a udp_recv_channel and a
> tcp_accept_channel that bind to the machine hostname (resolves to the real IP)
> - HDPNameNode/conf.d/gmond.slave.conf contains a udp_send_channel that binds to
> the machine hostname (resolves to the real IP)
> In 1.5.x versions, we changed that:
> - HDPNameNode/gmond.core.conf contains a tcp_accept_channel that binds to any
> interface (0.0.0.0)
> - HDPNameNode/conf.d/gmond.master.conf contains a udp_recv_channel that binds
> to the machine hostname (resolves to the real IP). Note that it no longer
> contains any tcp_accept_channel definition; otherwise it would conflict with
> the definition in gmond.core.conf
> - HDPNameNode/conf.d/gmond.slave.conf contains a udp_send_channel that binds to
> the machine hostname (resolves to the real IP). No changes here.
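> When we upgrade from 1.4.1 to 1.5.x, these definitions get mixed.
> gmond configure() is called during both install and start, while gmetad
> configure() (which generates gmond.master.conf) is called only during install.
> As a result, starting gmond after the upgrade regenerates gmond.core.conf in the
> 1.5.x format, but the 1.4.x gmond.master.conf, with its own tcp_accept_channel,
> stays in place and conflicts with it.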
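
The manual `ps aux | grep gmond` check from the description can be scripted. Below is a minimal Python sketch, not part of Ambari, that extracts the component directory from each gmond `--conf=/etc/ganglia/hdp/<Component>/gmond.core.conf` argument and reports which expected components have no gmond running. The function names are hypothetical, and the set of expected components is an assumption that depends on what is installed on the host.

```python
import re

# Captures <Component> from a gmond argument such as
# --conf=/etc/ganglia/hdp/HDPJobTracker/gmond.core.conf
CONF_RE = re.compile(r"--conf=/etc/ganglia/hdp/([^/\s]+)/gmond\.core\.conf")


def running_components(ps_output: str) -> set[str]:
    """Return the names of components that have a gmond process in ps output."""
    return set(CONF_RE.findall(ps_output))


def missing_components(ps_output: str, expected: set[str]) -> set[str]:
    """Return the expected components for which no gmond process is running."""
    return expected - running_components(ps_output)
```

Fed the output of `ps auxww` (the `ww` keeps long command lines from being truncated) and an expected set containing `HDPNameNode`, this would report `HDPNameNode` as missing for the listing shown in the description.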
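
The mixed 1.4.x/1.5.x state described under Details can also be detected on disk. Below is a minimal Python sketch, not part of Ambari, that counts `tcp_accept_channel` sections in a component's `gmond.core.conf` and `conf.d/*.conf`. It assumes that gmond loads every `conf.d/*.conf` file together with `gmond.core.conf` and that sections are written as `tcp_accept_channel {`; the function names are hypothetical. Under the 1.5.x layout only `gmond.core.conf` should define the channel, so more than one definition points to a stale 1.4.x `gmond.master.conf`.

```python
import re
from pathlib import Path

# Start of a tcp_accept_channel section, e.g. "tcp_accept_channel {".
# Assumes gmond.conf block syntax; commented-out lines ("# ...") do not match.
TCP_ACCEPT_RE = re.compile(r"^[ \t]*tcp_accept_channel\s*\{", re.MULTILINE)


def tcp_accept_definitions(component_dir: str) -> dict[str, int]:
    """Count tcp_accept_channel sections per config file of one gmond instance.

    Scans <component_dir>/gmond.core.conf and <component_dir>/conf.d/*.conf,
    e.g. component_dir="/etc/ganglia/hdp/HDPNameNode".
    """
    root = Path(component_dir)
    conf_d = root / "conf.d"
    files = [root / "gmond.core.conf"]
    if conf_d.is_dir():
        files += sorted(conf_d.glob("*.conf"))
    counts: dict[str, int] = {}
    for path in files:
        if path.is_file():
            found = len(TCP_ACCEPT_RE.findall(path.read_text()))
            if found:
                counts[path.relative_to(root).as_posix()] = found
    return counts


def has_conflicting_tcp_accept(component_dir: str) -> bool:
    """True if the instance defines more than one tcp_accept_channel."""
    return sum(tcp_accept_definitions(component_dir).values()) > 1
```

On a host in the state described here, `has_conflicting_tcp_accept("/etc/ganglia/hdp/HDPNameNode")` would be expected to return True until the stale gmond.master.conf is removed or regenerated.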
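
The install/start asymmetry is what mixes the definitions. Below is a toy Python model, not Ambari code, in which each config file is tagged with the version whose template last wrote it. That gmetad configure() writes gmond.master.conf, and which commands run each configure(), come from the description; attributing gmond.core.conf to gmond configure(), and assuming the upgrade is followed by a start without a reinstall, are assumptions of this sketch.

```python
# Which configure() generates which file. gmetad -> gmond.master.conf is from
# the issue; gmond -> gmond.core.conf is an assumption of this toy model.
WRITER = {
    "gmond.core.conf": "gmond",
    "conf.d/gmond.master.conf": "gmetad",
}

# Lifecycle commands during which each configure() runs (from the issue text).
RUNS_DURING = {
    "gmond": {"install", "start"},
    "gmetad": {"install"},
}


def run_command(files: dict[str, str], command: str, version: str) -> None:
    """Regenerate, from `version`'s templates, every file written during `command`."""
    for path, writer in WRITER.items():
        if command in RUNS_DURING[writer]:
            files[path] = version


def simulate_upgrade(old: str, new: str) -> dict[str, str]:
    """Install on `old`, then upgrade and only start on `new` (assumed: no reinstall)."""
    files: dict[str, str] = {}
    run_command(files, "install", old)
    run_command(files, "start", new)
    return files


print(simulate_upgrade("1.4.x", "1.5.x"))
# {'gmond.core.conf': '1.5.x', 'conf.d/gmond.master.conf': '1.4.x'}
```

The result pairs a 1.5.x gmond.core.conf, which defines tcp_accept_channel on 0.0.0.0, with a 1.4.x gmond.master.conf that still defines its own tcp_accept_channel, which is consistent with the "Unable to create tcp_accept_channel" error.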



--
This message was sent by Atlassian JIRA
(v6.2#6252)
