[
https://issues.apache.org/jira/browse/AMBARI-5228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dmitry Lysnichenko resolved AMBARI-5228.
----------------------------------------
Resolution: Fixed
Committed to trunk and to branch-1.5.0.
> gmond processes for master components are not starting after upgrade
> --------------------------------------------------------------------
>
> Key: AMBARI-5228
> URL: https://issues.apache.org/jira/browse/AMBARI-5228
> Project: Ambari
> Issue Type: Bug
> Components: controller, test
> Affects Versions: 1.5.0
> Reporter: Dmitry Lysnichenko
> Assignee: Dmitry Lysnichenko
> Priority: Critical
> Fix For: 1.5.0
>
>
> This is an upgrade from 1.4.3 to 1.5.0, with HDP-1.3.2.
> After the upgrade, the gmond processes for master components do not start.
> Starting one manually fails with this error:
> {noformat}
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseMaster/gmond.core.conf -d1
> Unable to create tcp_accept_channel. Exiting.
> {noformat}
> For reasons that are not yet clear, the presence of the gmond.master.conf
> file seems to cause the problem. In /etc/ganglia/hdp/HDPHBaseMaster/conf.d:
> {noformat}
> -rw-r--r-- 1 root hadoop2 442 Mar 22 06:18 gmond.master.conf
> -rw-r--r-- 1 root hadoop2 671 Mar 22 13:39 gmond.slave.conf
> {noformat}
> If the files are removed (I did this for JT and HBaseMaster), then:
> {noformat}
> ps aux | grep gmond
> nobody 5868 1.6 0.0 59108 1860 ? Ssl 17:57 0:00
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPTaskTracker/gmond.core.conf
> --pid-file=/var/run/ganglia/hdp/HDPTaskTracker/gmond.pid
> nobody 5894 1.5 0.0 59108 1864 ? Ssl 17:57 0:00
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPJobTracker/gmond.core.conf
> --pid-file=/var/run/ganglia/hdp/HDPJobTracker/gmond.pid
> nobody 5913 1.6 0.1 125200 2200 ? Ssl 17:57 0:00
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPSlaves/gmond.core.conf
> --pid-file=/var/run/ganglia/hdp/HDPSlaves/gmond.pid
> nobody 5932 1.6 0.0 59108 1828 ? Ssl 17:57 0:00
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPDataNode/gmond.core.conf
> --pid-file=/var/run/ganglia/hdp/HDPDataNode/gmond.pid
> nobody 5952 1.6 0.0 59108 1840 ? Ssl 17:57 0:00
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHistoryServer/gmond.core.conf
> --pid-file=/var/run/ganglia/hdp/HDPHistoryServer/gmond.pid
> nobody 5971 1.6 0.0 59108 1824 ? Ssl 17:57 0:00
> /usr/sbin/gmond --conf=/etc/ganglia/hdp/HDPHBaseRegionServer/gmond.core.conf
> --pid-file=/var/run/ganglia/hdp/HDPHBaseRegionServer/gmond.pid
> {noformat}
> Note that gmond for NameNode is not up, because its file was not removed.
> h2. Details:
> In 1.4.x versions, we generated configs as follows:
> - HDPNameNode/gmond.core.conf contains tcp_accept_channel that binds to
> localhost (resolves to 127.0.0.1)
> - HDPNameNode/conf.d/gmond.master.conf contains udp_recv_channel and
> tcp_accept_channel that bind to machine hostname (resolves to real ip)
> - HDPNameNode/conf.d/gmond.slave.conf contains udp_send_channel that binds to
> machine hostname (resolves to real ip)
> In 1.5.x versions, we changed that:
> - HDPNameNode/gmond.core.conf contains tcp_accept_channel that binds to any
> interface (0.0.0.0)
> - HDPNameNode/conf.d/gmond.master.conf contains udp_recv_channel that binds
> to machine hostname (resolves to real ip). Note that this file no longer
> contains a tcp_accept_channel definition, because it would conflict with the
> definition in gmond.core.conf
> - HDPNameNode/conf.d/gmond.slave.conf contains udp_send_channel that binds to
> machine hostname (resolves to real ip). No changes here.
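> When we upgrade from 1.4.x to 1.5.x, these definitions get mixed.
> The reason is that gmond configure() is called during install and start,
> while gmetad configure() (which generates gmond.master.conf) is called only
> during install. After an upgrade, the old gmond.master.conf is therefore
> left in place next to the newly generated gmond.core.conf.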
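
The mix-up described in the quoted details can be sketched with a small model. This is only an illustration and not Ambari code. The per-version channel layout is taken from the issue text. The names `LAYOUT`, `install`, `start` and `tcp_conflicts` are hypothetical. It is an assumption that gmond configure() writes gmond.core.conf and gmond.slave.conf, and that both tcp_accept_channel definitions use the same port, so that a bind to 0.0.0.0 overlaps a bind to a specific address.

```python
# Sketch of the config mix-up described in the issue. Version layouts follow
# the issue text; the file-to-configure() mapping for gmond is an assumption.

WILDCARD = "0.0.0.0"

# (channel type, bind address) per generated file, for each version line.
LAYOUT = {
    "1.4": {
        "gmond.core.conf": [("tcp_accept_channel", "localhost")],
        "conf.d/gmond.master.conf": [("udp_recv_channel", "hostname"),
                                     ("tcp_accept_channel", "hostname")],
        "conf.d/gmond.slave.conf": [("udp_send_channel", "hostname")],
    },
    "1.5": {
        "gmond.core.conf": [("tcp_accept_channel", WILDCARD)],
        "conf.d/gmond.master.conf": [("udp_recv_channel", "hostname")],
        "conf.d/gmond.slave.conf": [("udp_send_channel", "hostname")],
    },
}

# Assumed ownership: gmond configure() writes the first two files,
# gmetad configure() writes gmond.master.conf (the latter is from the issue).
GMOND_FILES = ("gmond.core.conf", "conf.d/gmond.slave.conf")
GMETAD_FILES = ("conf.d/gmond.master.conf",)


def install(version):
    """Install runs both configure() calls, so every file is written."""
    return {name: list(LAYOUT[version][name])
            for name in GMOND_FILES + GMETAD_FILES}


def start(files, version):
    """Start runs only gmond configure(); gmetad-owned files are untouched."""
    updated = dict(files)
    for name in GMOND_FILES:
        updated[name] = list(LAYOUT[version][name])
    return updated


def tcp_conflicts(files):
    """Return the files whose tcp_accept_channel overlaps a wildcard bind."""
    tcp = [(name, bind)
           for name, channels in files.items()
           for kind, bind in channels
           if kind == "tcp_accept_channel"]
    has_wildcard = any(bind == WILDCARD for _, bind in tcp)
    if has_wildcard and len(tcp) > 1:
        return sorted(name for name, _ in tcp)
    return []


# Upgrade path: files installed by 1.4.x, then only a start under 1.5.x.
upgraded = start(install("1.4"), "1.5")

print(tcp_conflicts(install("1.4")))  # [] : localhost and hostname coexist
print(tcp_conflicts(install("1.5")))  # [] : single definition on 0.0.0.0
print(tcp_conflicts(upgraded))
# ['conf.d/gmond.master.conf', 'gmond.core.conf']
```

In this model a clean install of either version has no overlap. Only the upgraded state does, where a 1.4.x gmond.master.conf sits next to a 1.5.x gmond.core.conf. That would be consistent with gmond starting again once the old file is removed.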