[
https://issues.apache.org/jira/browse/AMBARI-21527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Di Li updated AMBARI-21527:
---------------------------
Attachment: AMBARI-21527-HA_and_NonHA.patch
This is a problem for both secured and non-secured clusters because the HDFS Python script looks for that property *first* and uses it if it exists. That lookup order, combined with the fact that the property was (seemingly unnecessarily) merged in during EU with "localhost" as its value, is what caused the issues for Tim and me.
Both Tim and I hit it: he hit it on a Kerberos cluster during a NameNode restart, and I hit it on a multi-node non-secured cluster during a remote DataNode restart.
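To illustrate the lookup order described above, here is a minimal Python sketch. The function and variable names are illustrative only, not the actual Ambari resource_management code; it just shows how an explicit RPC-address property, once merged in with "localhost", silently wins over a correct fs.defaultFS.

```python
# Illustrative sketch of the property precedence described above.
# Names are hypothetical; this is not the actual Ambari code.

def resolve_namenode_rpc_address(hdfs_site, core_site):
    """Return the NameNode address a client-side script would use.

    The explicit RPC-address property is consulted *first*; fs.defaultFS
    is only the fallback. A stale "localhost" value merged into
    hdfs-site during upgrade therefore overrides the correct host.
    """
    rpc = hdfs_site.get("dfs.namenode.rpc-address")
    if rpc:  # property exists -> used unconditionally
        return rpc
    # Fallback: derive the address from fs.defaultFS
    return core_site["fs.defaultFS"].replace("hdfs://", "")

# The situation from this issue: upgrade merged in a bad value.
hdfs_site = {"dfs.namenode.rpc-address": "localhost:8020"}
core_site = {"fs.defaultFS": "hdfs://c7301.ambari.apache.org:8020"}

print(resolve_namenode_rpc_address(hdfs_site, core_site))
# -> localhost:8020 (the bad merged value wins)
print(resolve_namenode_rpc_address({}, core_site))
# -> c7301.ambari.apache.org:8020 (correct fallback when absent)
```

Removing (or correcting) the merged property restores the fallback path, which is what the attached HA/non-HA patch addresses.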
> Restart of MR2 History Server failed due to wrong NameNode RPC address
> ----------------------------------------------------------------------
>
> Key: AMBARI-21527
> URL: https://issues.apache.org/jira/browse/AMBARI-21527
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.5.2
> Reporter: Siddharth Wagle
> Assignee: Doroszlai, Attila
> Priority: Critical
> Fix For: 2.5.2
>
> Attachments: AMBARI-21527-HA_and_NonHA.patch, AMBARI-21527.patch
>
>
> Steps:
> * Installed BI 4.2 cluster on Ambari 2.2 with Slider and services it required
> * Upgraded Ambari to 2.5.2.0-146
> * Registered HDP 2.6.1.0 repo, installed packages
> * Restarted services that needed restart
> * Ran service checks
> * Started upgrade
> Result: _Restarting History Server_ step failed with
> {noformat:title=errors-87.txt}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py", line 134, in <module>
>     HistoryServer().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 329, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 841, in restart
>     self.pre_upgrade_restart(env, upgrade_type=upgrade_type)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/historyserver.py", line 85, in pre_upgrade_restart
>     copy_to_hdfs("mapreduce", params.user_group, params.hdfs_user, skip=params.sysprep_skip_copy_tarballs_hdfs)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/copy_tarball.py", line 267, in copy_to_hdfs
>     replace_existing_files=replace_existing_files,
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 560, in action_create_on_execute
>     self.action_delayed("create")
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 557, in action_delayed
>     self.get_hdfs_resource_executor().action_delayed(action_name, self)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 292, in action_delayed
>     self._create_resource()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 308, in _create_resource
>     self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 423, in _create_file
>     self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 204, in run_command
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --data-binary @/usr/hdp/2.6.1.0-129/hadoop/mapreduce.tar.gz -H 'Content-Type: application/octet-stream' 'http://c7301.ambari.apache.org:50070/webhdfs/v1/hdp/apps/2.6.1.0-129/mapreduce/mapreduce.tar.gz?op=CREATE&user.name=hdfs&overwrite=True&permission=444'' returned status_code=403.
> {
>   "RemoteException": {
>     "exception": "ConnectException",
>     "javaClassName": "java.net.ConnectException",
>     "message": "Call From c7301.ambari.apache.org/192.168.73.101 to c7301.ambari.apache.org:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused"
>   }
> }
> {noformat}
> {noformat:title=NameNode log, pre-upgrade restart}
> 2017-07-18 07:48:05,435 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(397)) - fs.defaultFS is hdfs://c7301.ambari.apache.org:8020
> 2017-07-18 07:48:05,436 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(417)) - Clients are to use c7301.ambari.apache.org:8020 to access this namenode/service.
> 2017-07-18 07:48:07,343 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(342)) - RPC server is binding to c7301.ambari.apache.org:8020
> 2017-07-18 07:48:07,434 INFO namenode.NameNode (NameNode.java:startCommonServices(695)) - NameNode RPC up at: c7301.ambari.apache.org/192.168.73.101:8020
> {noformat}
> {noformat:title=NameNode log, in-upgrade restart}
> 2017-07-18 09:03:42,336 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(450)) - fs.defaultFS is hdfs://c7301.ambari.apache.org:8020
> 2017-07-18 09:03:42,337 INFO namenode.NameNode (NameNode.java:setClientNamenodeAddress(470)) - Clients are to use c7301.ambari.apache.org:8020 to access this namenode/service.
> 2017-07-18 09:03:44,686 INFO namenode.NameNode (NameNodeRpcServer.java:<init>(428)) - RPC server is binding to localhost:8020
> 2017-07-18 09:03:44,995 INFO namenode.NameNode (NameNode.java:startCommonServices(876)) - NameNode RPC up at: localhost/127.0.0.1:8020
> {noformat}
> It looks like something during the upgrade reconfigures the NameNode RPC server to bind only to localhost, so remote clients can no longer connect.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)