Alexander van der Meij created BIGTOP-1573:
----------------------------------------------
Summary: rpm init scripts do not wait for network
Key: BIGTOP-1573
URL: https://issues.apache.org/jira/browse/BIGTOP-1573
Project: Bigtop
Issue Type: Bug
Components: rpm
Affects Versions: 0.8.0
Environment: CentOS 7
Reporter: Alexander van der Meij
I have used Bigtop to generate a set of RPMs for the purpose of deploying
multi-node Hadoop clusters. All the components work well, save for one network
issue.
It seems that the Hadoop daemons, when started at boot through their init
scripts, do not wait for network initialisation to complete before they are
started. As a result, when I reboot, for example, one of my datanodes, the
hadoop-hdfs-datanode process is started using "localhost.localdomain" as its
hostname. It also advertises itself as such to the NameNode, leading to all
sorts of connectivity problems in a multi-node environment.
I first noticed this problem when, after a reboot, I saw log files being
created of the form /var/log/hadoop-hdfs-datanode-localhost.localdomain.log.
When I restart the hdfs-datanode process using the same init scripts, the
correct /var/log/hadoop-hdfs-datanode-{fqdn}.log file is created.
I believe the problem is caused by the introduction of systemd in CentOS 7:
init scripts are run in parallel, and there are no constraints present in the
Hadoop init scripts that make them wait until network initialisation is
complete.
Now for the good news: adding $network to the Required-Start and Required-Stop
lists of all Hadoop daemons solves the issue for me:
/etc/init.d/hadoop-hdfs-datanode:
# Required-Start: $syslog $remote_fs $network
# Required-Stop: $syslog $remote_fs $network
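To apply the change above to several init scripts at once, something along
these lines might work. It is only a sketch: the helper name add_network_dep
is made up for illustration, it assumes GNU sed (for the -i option), and it
assumes the headers look like the two lines quoted above.

```shell
#!/bin/sh
# Hypothetical helper: append $network to the LSB Required-Start and
# Required-Stop header lines of one init script. Header lines that already
# list $network are left untouched, so running it twice is harmless.
# Assumes GNU sed (in-place editing with -i).
add_network_dep() {
    script="$1"
    sed -i \
        -e '/^# Required-Start:/{/\$network/!s/[[:space:]]*$/ $network/;}' \
        -e '/^# Required-Stop:/{/\$network/!s/[[:space:]]*$/ $network/;}' \
        "$script"
}

# Example use (as root), assuming the Hadoop scripts match this glob:
#   for f in /etc/init.d/hadoop-*; do add_network_dep "$f"; done
```

Working on a copy of one script first and diffing the result seems prudent
before touching everything under /etc/init.d.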