[jira] [Updated] (BIGTOP-1573) rpm init scripts do not wait for network

Alexander van der Meij (JIRA) Mon, 15 Dec 2014 08:05:46 -0800

     [ https://issues.apache.org/jira/browse/BIGTOP-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Alexander van der Meij updated BIGTOP-1573:
-------------------------------------------
    Description: 
I have used Bigtop to generate a set of RPMs for the purpose of deploying 
multi-node Hadoop clusters on CentOS 7. All the components work well, save for 
one network issue. 

It seems that the Hadoop daemons, when started at boot through their init 
scripts, do not wait for network initialisation to complete before they 
themselves are processed. As a result, when I reboot for example one of my 
datanodes, the hadoop-hdfs-datanode process is started using 
"localhost.localdomain" as its hostname - and it also advertises itself as such 
to the ResourceManager, leading to all sorts of connectivity problems in a 
multi-node environment.

I first noticed this problem when, after a reboot, I saw log files being 
created of the form /var/log/hadoop-hdfs-datanode-localhost.localdomain.log. 
When I restart the hdfs-datanode process using the same init scripts, the 
correct /var/log/hadoop-hdfs-datanode-(FQDN).log files are created. 

I believe the problem is caused by the introduction of systemd in CentOS 7; 
init scripts are run in parallel and there are no constraints present in the 
Hadoop init scripts that instruct it to wait until network initialisation is 
complete. 

Now for the good news: adding $network to the Required-Start/Stop list for all 
Hadoop daemons solves the issue for me:

/etc/init.d/hadoop-hdfs-datanode:
# Required-Start:    $syslog $remote_fs $network
# Required-Stop:     $syslog $remote_fs $network

  was:
I have used Bigtop to generate a set of RPMs for the purpose of deploying 
multi-node Hadoop clusters. All the components work well, save for one network 
issue. 

It seems that the Hadoop daemons, when started at boot through their init 
scripts, do not wait for network initialisation to complete before they 
themselves are processed. As a result, when I reboot for example one of my 
datanodes, the hadoop-hdfs-datanode process is started using 
"localhost.localdomain" as its hostname - and it also advertises itself as such 
to the ResourceManager, leading to all sorts of connectivity problems in a 
multi-node environment.

I first noticed this problem when, after a reboot, I saw log files being 
created of the form /var/log/hadoop-hdfs-datanode-localhost.localdomain.log. 
When I restart the hdfs-datanode process using the same init scripts, the 
correct /var/log/hadoop-hdfs-datanode-(FQDN).log files are created. 

I believe the problem is caused by the introduction of systemd in CentOS 7; 
init scripts are run in parallel and there are no constraints present in the 
Hadoop init scripts that instruct it to wait until network initialisation is 
complete. 

Now for the good news: adding $network to the Required-Start/Stop list for all 
Hadoop daemons solves the issue for me:

/etc/init.d/hadoop-hdfs-datanode:
# Required-Start:    $syslog $remote_fs $network
# Required-Stop:     $syslog $remote_fs $network


> rpm init scripts do not wait for network
> ----------------------------------------
>
>                 Key: BIGTOP-1573
>                 URL: https://issues.apache.org/jira/browse/BIGTOP-1573
>             Project: Bigtop
>          Issue Type: Bug
>          Components: rpm
>    Affects Versions: 0.8.0
>         Environment: CentOS 7
>            Reporter: Alexander van der Meij
>              Labels: build
>
> I have used Bigtop to generate a set of RPMs for the purpose of deploying 
> multi-node Hadoop clusters on CentOS 7. All the components work well, save 
> for one network issue. 
> It seems that the Hadoop daemons, when started at boot through their init 
> scripts, do not wait for network initialisation to complete before they 
> themselves are processed. As a result, when I reboot for example one of my 
> datanodes, the hadoop-hdfs-datanode process is started using 
> "localhost.localdomain" as its hostname - and it also advertises itself as 
> such to the ResourceManager, leading to all sorts of connectivity problems in 
> a multi-node environment.
> I first noticed this problem when, after a reboot, I saw log files being 
> created of the form /var/log/hadoop-hdfs-datanode-localhost.localdomain.log. 
> When I restart the hdfs-datanode process using the same init scripts, 
> the correct /var/log/hadoop-hdfs-datanode-(FQDN).log files are created. 
> I believe the problem is caused by the introduction of systemd in CentOS 7; 
> init scripts are run in parallel and there are no constraints present in the 
> Hadoop init scripts that instruct it to wait until network initialisation is 
> complete. 
> Now for the good news: adding $network to the Required-Start/Stop list for 
> all Hadoop daemons solves the issue for me:
> /etc/init.d/hadoop-hdfs-datanode:
> # Required-Start:    $syslog $remote_fs $network
> # Required-Stop:     $syslog $remote_fs $network
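
The workaround quoted above could be applied to several init scripts at once
with something like the following. This is only a sketch, assuming GNU sed
(for the -i option, as shipped with CentOS 7) and LSB headers laid out as in
the excerpt. The function name add_network_dep is hypothetical and not part of
Bigtop. It appends $network to the Required-Start and Required-Stop lines only
when it is not already listed, so running it twice is harmless.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bigtop): append $network to the LSB
# Required-Start / Required-Stop headers of one init script.
# Assumes GNU sed for in-place editing (-i).
add_network_dep() {
    # For each matching header line that does not already mention $network,
    # replace any trailing whitespace with " $network".
    sed -i \
        -e '/^# Required-Start:/{/\$network/!s/[[:space:]]*$/ $network/;}' \
        -e '/^# Required-Stop:/{/\$network/!s/[[:space:]]*$/ $network/;}' \
        "$1"
}

# Patch every init script passed on the command line (no arguments: no-op).
for script in "$@"; do
    add_network_dep "$script"
done
```

Saved under a hypothetical name such as add-network-dep.sh, it might be run as
"sh add-network-dep.sh /etc/init.d/hadoop-*". On a systemd host a
"systemctl daemon-reload" is probably needed afterwards so that the changed
headers are picked up.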



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
