Alexander van der Meij created BIGTOP-1573:
----------------------------------------------

             Summary: rpm init scripts do not wait for network
                 Key: BIGTOP-1573
                 URL: https://issues.apache.org/jira/browse/BIGTOP-1573
             Project: Bigtop
          Issue Type: Bug
          Components: rpm
    Affects Versions: 0.8.0
         Environment: CentOS 7
            Reporter: Alexander van der Meij


I have used Bigtop to generate a set of RPMs for deploying multi-node Hadoop 
clusters. All the components work well, save for one network issue. 

It seems that the Hadoop daemons, when started at boot through their init 
scripts, do not wait for network initialisation to complete before they 
start. As a result, when I reboot, for example, one of my datanodes, the 
hadoop-hdfs-datanode process starts with "localhost.localdomain" as its 
hostname. It also advertises itself as such to the ResourceManager, leading 
to all sorts of connectivity problems in a multi-node environment.

I first noticed this problem when, after a reboot, I saw log files of the form 
/var/log/hadoop-hdfs-datanode-localhost.localdomain.log being created. When I 
restart the hdfs-datanode process using the same init scripts, the correct 
/var/log/hadoop-hdfs-datanode-{fqdn}.log files are created. 
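
One way to check a node for this symptom is to look for log files whose names 
carry the fallback hostname. The following is a minimal sketch only: the 
function name find_fallback_hostname_logs and the default directory /var/log 
are assumptions for illustration, not part of the Bigtop packages.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Bigtop): print Hadoop log files whose
# names contain the fallback hostname instead of the node's FQDN.
# $1 is the directory to search; /var/log is an assumed default.
find_fallback_hostname_logs() {
    log_dir="${1:-/var/log}"
    find "$log_dir" -type f -name 'hadoop-*localhost.localdomain*'
}
```

Any output after a reboot suggests that a daemon started before the hostname 
was set.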

I believe the problem is caused by the introduction of systemd in CentOS 7: 
init scripts are run in parallel, and the Hadoop init scripts contain no 
constraints that instruct systemd to wait until network initialisation is 
complete. 

The good news is that adding $network to the Required-Start and Required-Stop 
lists of all Hadoop daemons solves the issue for me:

/etc/init.d/hadoop-hdfs-datanode:
# Required-Start:    $syslog $remote_fs $network
# Required-Stop:     $syslog $remote_fs $network
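
To apply the same change to several init scripts, something along these lines 
could work. It is a sketch under stated assumptions: the function name 
add_network_dep is hypothetical, GNU sed is assumed (as found on CentOS 7) 
for the -i option, and each script is assumed to carry "# Required-Start:" 
and "# Required-Stop:" header lines in the form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bigtop): append $network to the
# Required-Start and Required-Stop headers of one init script.
# Header lines that already mention $network are left unchanged, so the
# function can safely be run more than once on the same file.
# Assumes GNU sed for in-place editing (-i).
add_network_dep() {
    script="$1"
    sed -i \
        -e '/^# Required-Start:/{/\$network/!s/[[:space:]]*$/ $network/;}' \
        -e '/^# Required-Stop:/{/\$network/!s/[[:space:]]*$/ $network/;}' \
        "$script"
}
```

It could then be called once per script, for example in a loop over 
/etc/init.d/hadoop-* (that glob is an assumption about how the scripts are 
named on a given installation). Back up the scripts before editing them in 
place.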



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
