[
https://issues.apache.org/jira/browse/AMBARI-11605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alejandro Fernandez updated AMBARI-11605:
-----------------------------------------
Description:
When restarting HistoryServer for the first time during the Core Masters
rolling upgrade, the restart fails with the following:
{noformat}
2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce']
{'security_enabled': False, 'hadoop_bin_dir':
'/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name':
[EMPTY], 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir':
'/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action':
['create_on_execute'], 'mode': 0555}
2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET
'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,862 - checked_call returned (0,
'{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT
'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,993 - checked_call returned (0,
'{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in safe
mode.\\nThe reported blocks 414 needs additional 77 blocks to reach the
threshold 0.9900 of total blocks 495.\\nThe number of live datanodes 4 has
reached the minimum number 0. Safe mode will be turned off automatically once
the thresholds have been reached."}}403')
{noformat}
Retrying after this error fixes the problem.
It turns out that, now that the HDFS commands run faster, the standby NameNode
may still be in safemode by the time the HistoryServer is restarted.
For this reason, we must wait for both NameNodes to come out of safemode before
proceeding to any other services or Service Checks.
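The fix amounts to polling each NameNode until it reports that safemode is off before the upgrade moves on. A minimal sketch of that wait loop, assuming the `hdfs dfsadmin -safemode get` CLI is on the PATH and the NameNode RPC port is 8020; the helper names here are hypothetical, not Ambari's actual code:

```python
import subprocess
import time


def safemode_is_off(dfsadmin_output):
    # `hdfs dfsadmin -safemode get` prints e.g. "Safe mode is OFF"
    # or "Safe mode is ON. The reported blocks ..." while still syncing.
    return "Safe mode is OFF" in dfsadmin_output


def wait_for_safemode_off(namenode_hosts, timeout=600, poll_interval=10):
    """Block until every NameNode reports safemode OFF, or raise on timeout.

    Hypothetical helper (assumed host list and port); Ambari's real fix
    lives in its upgrade orchestration.
    """
    deadline = time.time() + timeout
    for host in namenode_hosts:
        while True:
            result = subprocess.run(
                ["hdfs", "dfsadmin", "-fs", "hdfs://%s:8020" % host,
                 "-safemode", "get"],
                capture_output=True, text=True)
            if safemode_is_off(result.stdout):
                break
            if time.time() > deadline:
                raise RuntimeError("NameNode %s still in safemode" % host)
            time.sleep(poll_interval)
```

Polling both NameNodes (not just the active one) matters here because the failing WebHDFS call can be served by whichever NameNode is still replaying blocks after restart.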
was:
When restarting mapreduce HistoryServer for the first time during the Core
Masters rolling upgrade, the restart fails with the following:
{noformat}
2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce']
{'security_enabled': False, 'hadoop_bin_dir':
'/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit', 'principal_name':
[EMPTY], 'user': 'hdfs', 'owner': 'hdfs', 'hadoop_conf_dir':
'/usr/hdp/current/hadoop-client/conf', 'type': 'directory', 'action':
['create_on_execute'], 'mode': 0555}
2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET
'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,862 - checked_call returned (0,
'{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT
'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
{'logoutput': None, 'user': 'hdfs', 'quiet': False}
2015-05-28 20:03:37,993 - checked_call returned (0,
'{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in safe
mode.\\nThe reported blocks 414 needs additional 77 blocks to reach the
threshold 0.9900 of total blocks 495.\\nThe number of live datanodes 4 has
reached the minimum number 0. Safe mode will be turned off automatically once
the thresholds have been reached."}}403')
{noformat}
Retrying after this error fixes the problem.
It turns out that, now that the HDFS commands run faster, the standby NameNode
may still be in safemode by the time the HistoryServer is restarted.
For this reason, we must wait for both NameNodes to come out of safemode before
proceeding to any other services or Service Checks.
> Restarting HistoryServer fails during RU because NameNode is in safemode
> ------------------------------------------------------------------------
>
> Key: AMBARI-11605
> URL: https://issues.apache.org/jira/browse/AMBARI-11605
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.1.0
> Reporter: Alejandro Fernandez
> Assignee: Alejandro Fernandez
> Fix For: 2.1.0
>
> Attachments: AMBARI-11605.patch
>
>
> When restarting HistoryServer for the first time during the Core Masters
> rolling upgrade, the restart fails with the following:
> {noformat}
> 2015-05-28 20:03:32,540 - HdfsResource['/hdp/apps/2.3.0.0-2112/mapreduce']
> {'security_enabled': False, 'hadoop_bin_dir':
> '/usr/hdp/2.3.0.0-2112/hadoop/bin', 'keytab': [EMPTY], 'default_fs':
> 'hdfs://c1ha', 'hdfs_site': ..., 'kinit_path_local': 'kinit',
> 'principal_name': [EMPTY], 'user': 'hdfs', 'owner': 'hdfs',
> 'hadoop_conf_dir': '/usr/hdp/current/hadoop-client/conf', 'type':
> 'directory', 'action': ['create_on_execute'], 'mode': 0555}
> 2015-05-28 20:03:32,600 - checked_call['curl -L -w '%{http_code}' -X GET
> 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=GETFILESTATUS&user.name=hdfs'']
> {'logoutput': None, 'user': 'hdfs', 'quiet': False}
> 2015-05-28 20:03:37,862 - checked_call returned (0,
> '{"RemoteException":{"exception":"FileNotFoundException","javaClassName":"java.io.FileNotFoundException","message":"File
> does not exist: /hdp/apps/2.3.0.0-2112/mapreduce"}}404')
> 2015-05-28 20:03:37,866 - checked_call['curl -L -w '%{http_code}' -X PUT
> 'http://jhurley-ru-2.c.pramod-thangali.internal:50070/webhdfs/v1/hdp/apps/2.3.0.0-2112/mapreduce?op=MKDIRS&user.name=hdfs'']
> {'logoutput': None, 'user': 'hdfs', 'quiet': False}
> 2015-05-28 20:03:37,993 - checked_call returned (0,
> '{"RemoteException":{"exception":"RetriableException","javaClassName":"org.apache.hadoop.ipc.RetriableException","message":"org.apache.hadoop.hdfs.server.namenode.SafeModeException:
> Cannot create directory /hdp/apps/2.3.0.0-2112/mapreduce. Name node is in
> safe mode.\\nThe reported blocks 414 needs additional 77 blocks to reach the
> threshold 0.9900 of total blocks 495.\\nThe number of live datanodes 4 has
> reached the minimum number 0. Safe mode will be turned off automatically once
> the thresholds have been reached."}}403')
> {noformat}
> Retrying after this error fixes the problem.
> It turns out that, now that the HDFS commands run faster, the standby
> NameNode may still be in safemode by the time the HistoryServer is restarted.
> For this reason, we must wait for both NameNodes to come out of safemode
> before proceeding to any other services or Service Checks.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)