[
https://issues.apache.org/jira/browse/AMBARI-11605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alejandro Fernandez updated AMBARI-11605:
-----------------------------------------
Description:
When restarting HistoryServer for the first time during the Core Masters
rolling upgrade, the restart fails with the following:
{noformat}
2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce']
{'security_enabled': False, 'hadoop_bin_dir':
'/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name':
[EMPTY], 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir':
'/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action':
['create_on_execute'], 'mode': 0555}
2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET
'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,862 - checked_call returned (0,
'{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT
'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,993 - checked_call returned (0,
'{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in safe
mode.\\nThe reported blocks 414 needs additional 77 blocks to reach the
threshold 0.9900 of total blocks 495.\\nThe number of live datanodes 4 has
reached the minimum number 0. Safe mode will be turned off automatically once
the thresholds have been reached."}}403')
{noformat}
Retrying after this error fixes the problem.
It turns out that, now that the HDFS commands run faster, the standby NameNode
may still be in safemode by the time the HistoryServer is restarted.
For this reason, we must wait for both NameNodes to come out of safemode before
proceeding to any other services or Service Checks.
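The fix amounts to polling each NameNode until it reports that safemode is off before the upgrade moves on. A minimal sketch of that wait loop, assuming the `hdfs dfsadmin -safemode get` CLI is on the PATH and the NameNode RPC port is 8020; the helper names here are hypothetical, not Ambari's actual code:

```python
import subprocess
import time


def safemode_is_off(dfsadmin_output):
    # `hdfs dfsadmin -safemode get` prints e.g. "Safe mode is OFF"
    # or "Safe mode is ON. The reported blocks ..." while still syncing.
    return "Safe mode is OFF" in dfsadmin_output


def wait_for_safemode_off(namenode_hosts, timeout=600, poll_interval=10):
    """Block until every NameNode reports safemode OFF, or raise on timeout.

    Hypothetical helper (assumed host list and port); Ambari's real fix
    lives in its upgrade orchestration.
    """
    deadline = time.time() + timeout
    for host in namenode_hosts:
        while True:
            result = subprocess.run(
                ["hdfs", "dfsadmin", "-fs", "hdfs://%s:8020" % host,
                 "-safemode", "get"],
                capture_output=True, text=True)
            if safemode_is_off(result.stdout):
                break
            if time.time() > deadline:
                raise RuntimeError("NameNode %s still in safemode" % host)
            time.sleep(poll_interval)
```

Polling both NameNodes (not just the active one) matters here because the failing WebHDFS call can be served by whichever NameNode is still replaying blocks after restart.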
was:
When restarting mapreduce HistoryServer for the first time during the Core
Masters rolling upgrade, the restart fails with the following:
{noformat}
2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce']
{'security_enabled': False, 'hadoop_bin_dir':
'/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name':
[EMPTY], 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir':
'/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action':
['create_on_execute'], 'mode': 0555}
2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET
'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,862 - checked_call returned (0,
'{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT
'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,993 - checked_call returned (0,
'{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in safe
mode.\\nThe reported blocks 414 needs additional 77 blocks to reach the
threshold 0.9900 of total blocks 495.\\nThe number of live datanodes 4 has
reached the minimum number 0. Safe mode will be turned off automatically once
the thresholds have been reached."}}403')
{noformat}
Retrying after this error fixes the problem.
It turns out that, now that the HDFS commands run faster, the standby NameNode
may still be in safemode by the time the HistoryServer is restarted.
For this reason, we must wait for both NameNodes to come out of safemode before
proceeding to any other services or Service Checks.
> Restarting HistoryServer fails during RU because NameNode is in safemode
> ------------------------------------------------------------------------
>
> Key: AMBARI-11605
> URL: https://issues.apache.org/jira/browse/AMBARI-11605
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.1.0
> Reporter: Alejandro Fernandez
> Assignee: Alejandro Fernandez
> Fix For: 2.1.0
>
> Attachments: AMBARI-11605.patch
>
>
> When restarting HistoryServer for the first time during the Core Masters
> rolling upgrade, the restart fails with the following:
> {noformat}
> 2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce']
> {'security_enabled': False, 'hadoop_bin_dir':
> '/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
> 'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit',
> 'principal_name': [EMPTY], 'user': 'hdfs', 'owner': 'hdfs',
> 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type':
> 'directory', 'action': ['create_on_execute'], 'mode': 0555}
> 2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET
> 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
> {'logoutput': None, 'user': 'hdfs', 'quiet': False}
> 2015-05-28 20:03:37,862 - checked_call returned (0,
> '{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
> does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
> 2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT
> 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
> {'logoutput': None, 'user': 'hdfs', 'quiet': False}
> 2015-05-28 20:03:37,993 - checked_call returned (0,
> '{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
> Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in
> safe mode.\\nThe reported blocks 414 needs additional 77 blocks to reach the
> threshold 0.9900 of total blocks 495.\\nThe number of live datanodes 4 has
> reached the minimum number 0. Safe mode will be turned off automatically once
> the thresholds have been reached."}}403')
> {noformat}
> Retrying after this error fixes the problem.
> It turns out that, now that the HDFS commands run faster, the standby
> NameNode may still be in safemode by the time the HistoryServer is restarted.
> For this reason, we must wait for both NameNodes to come out of safemode
> before proceeding to any other services or Service Checks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)