[
https://issues.apache.org/jira/browse/HADOOP-18038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mariusz Okulanis updated HADOOP-18038:
--------------------------------------
Description:
Starting a daemon with {{*hdfs --daemon start ...*}} (and also {{*yarn
--daemon start ...*}}) might result in an invalid PID being written to the PID file.
Scenario: run {{*hdfs --daemon start namenode*}} (or any other Hadoop daemon).
Expected result: the PID of the running namenode Java process is written to the
PID file.
Actual result (non-deterministic): the PID of an already-exited bash process is
written to the PID file.
The root cause is that both daemon-launching bash functions -
{{*hadoop_start_daemon*}} and {{*hadoop_start_daemon_wrapper*}} - concurrently
write different PIDs to the same file, and only the PID written by
{{*hadoop_start_daemon_wrapper*}} is correct. The order of those writes is only
weakly synchronised (via a hardcoded 5-second timeout). Under specific
circumstances (such as heavy CPU load) that ordering may not hold, and the
invalid PID ends up in the PID file.
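For illustration, here is a minimal sketch of the racy pattern described above.
It is not the actual hadoop-functions.sh code; the function bodies, the
{{pidfile}} handling and the wait loop are assumptions based on this description.
{code:bash}
# Simplified sketch of the race -- not the real hadoop-functions.sh code;
# function bodies and variable names are illustrative only.

hadoop_start_daemon()
{
  local pidfile=$1; shift
  # Inner write: when this function is backgrounded by the wrapper below,
  # $$ still expands to the PID of the calling bash process, which exits
  # shortly afterwards -- the "invalid" PID described in this report.
  echo $$ > "${pidfile}"
  exec "$@"                      # replace the shell with the java daemon
}

hadoop_start_daemon_wrapper()
{
  local pidfile=$1; shift
  hadoop_start_daemon "${pidfile}" "$@" &

  # Weak synchronisation: wait (up to a hardcoded 5 seconds) for the inner
  # write to appear before overwriting it with the correct PID.
  local counter=0
  while [[ ! -f "${pidfile}" && ${counter} -le 5 ]]; do
    sleep 1
    ((counter++))
  done

  # Outer write: $! is the backgrounded daemon process, i.e. the correct PID.
  # Under heavy load the inner write may land *after* this one, leaving the
  # wrong PID in the file.
  echo $! > "${pidfile}"
}
{code}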
Possible solution: it seems unnecessary for {{*hadoop_start_daemon*}} to write
to the PID file at all when it is called from {{*hadoop_start_daemon_wrapper*}};
it should skip that step in this scenario.
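One hypothetical shape of that fix, again only a sketch: the wrapper could
signal (here via an invented {{HADOOP_DAEMON_WRAPPED}} flag, which is not an
existing Hadoop variable) that it will write the PID file itself, so the inner
function skips its own write and the race disappears.
{code:bash}
# Hypothetical guard, for illustration only.
hadoop_start_daemon()
{
  local pidfile=$1; shift
  if [[ "${HADOOP_DAEMON_WRAPPED}" != "true" ]]; then
    echo $$ > "${pidfile}"       # only written when launched directly
  fi
  exec "$@"
}

hadoop_start_daemon_wrapper()
{
  local pidfile=$1; shift
  HADOOP_DAEMON_WRAPPED=true hadoop_start_daemon "${pidfile}" "$@" &
  echo $! > "${pidfile}"         # now the only writer, so no ordering to lose
}
{code}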
was:
Starting a daemon with hdfs --daemon start ... (and also yarn --daemon start
...) might result in an invalid PID being written to the PID file.
The root cause is that both daemon-launching bash functions -
hadoop_start_daemon and hadoop_start_daemon_wrapper - concurrently write
different PIDs to the same file, and only the PID written by
hadoop_start_daemon_wrapper is correct. The order of those writes is only
weakly synchronised (via a hardcoded 5-second timeout). Under specific
circumstances (such as heavy CPU load) that ordering may not hold, and the
invalid PID ends up in the PID file.
Possible solution: it seems unnecessary for hadoop_start_daemon to write to
the PID file when it is called from hadoop_start_daemon_wrapper; it should
skip that step in this scenario.
> "hdfs --daemon start" command may write invalid PID to file
> -----------------------------------------------------------
>
> Key: HADOOP-18038
> URL: https://issues.apache.org/jira/browse/HADOOP-18038
> Project: Hadoop Common
> Issue Type: Bug
> Components: scripts
> Affects Versions: 3.2.2
> Reporter: Mariusz Okulanis
> Priority: Minor
>
> Starting a daemon with {{*hdfs --daemon start ...*}} (and also {{*yarn
> --daemon start ...*}}) might result in an invalid PID being written to the PID file.
> Scenario: run {{*hdfs --daemon start namenode*}} (or any other Hadoop daemon).
> Expected result: the PID of the running namenode Java process is written to
> the PID file.
> Actual result (non-deterministic): the PID of an already-exited bash process
> is written to the PID file.
>
> The root cause is that both daemon-launching bash functions -
> {{*hadoop_start_daemon*}} and {{*hadoop_start_daemon_wrapper*}} - concurrently
> write different PIDs to the same file, and only the PID written by
> {{*hadoop_start_daemon_wrapper*}} is correct. The order of those writes is
> only weakly synchronised (via a hardcoded 5-second timeout). Under specific
> circumstances (such as heavy CPU load) that ordering may not hold, and the
> invalid PID ends up in the PID file.
>
> Possible solution: it seems unnecessary for {{*hadoop_start_daemon*}} to
> write to the PID file at all when it is called from
> {{*hadoop_start_daemon_wrapper*}}; it should skip that step in this scenario.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)