[ https://issues.apache.org/jira/browse/AMBARI-18786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15633284#comment-15633284 ]
Hudson commented on AMBARI-18786:
---------------------------------
FAILURE: Integrated in Jenkins build Ambari-branch-2.5 #258 (See [https://builds.apache.org/job/Ambari-branch-2.5/258/])
AMBARI-18786. HDP Upgrade fails when the cluster size is large (dlysnichenko: [http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=7d2b6bbc0a9fa3e7ee667be30b53b83d60f90a79])
* (edit) ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py
> HDP Upgrade fails when the cluster size is large
> ------------------------------------------------
>
> Key: AMBARI-18786
> URL: https://issues.apache.org/jira/browse/AMBARI-18786
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Reporter: Dmitry Lysnichenko
> Assignee: Dmitry Lysnichenko
> Fix For: 2.5.0
>
> Attachments: AMBARI-18786.patch
>
>
> Starting with Ambari 2.4, the HDP upgrade fails during the NameNode restart
> when the cluster is large.
> This is because the restart command waits for the NameNode to leave safemode,
> and on a large cluster the NameNode takes longer to do so. Ambari marks the
> action as failed because the NameNode did not leave safemode within the
> timeout configured in the Ambari scripts.
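The restart step described above amounts to polling a condition until it holds or a deadline passes. A minimal Python 3 sketch of such a wait loop follows. `wait_until` and its parameters are hypothetical names, not Ambari's API, and the predicate stands in for whatever safemode check the scripts actually perform.

```python
import time


def wait_until(predicate, timeout, interval, clock=time.monotonic, sleep=time.sleep):
    """Poll predicate() every `interval` seconds until it returns True.

    Hypothetical helper for illustration only. Returns True if the predicate
    succeeded before `timeout` seconds elapsed, False otherwise. `clock` and
    `sleep` are injectable so the loop can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            # Deadline reached while the condition (e.g. "safemode is off")
            # still does not hold: this is the failure case in the report.
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

If the condition only becomes true after `timeout` seconds, the function returns False even though waiting longer would have succeeded, which is the situation a large cluster runs into.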
> {code}
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/jmx.py", line 42, in get_value_from_jmx
>     return data_dict["beans"][0][property]
> IndexError: list index out of range
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 420, in <module>
>     NameNode().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 720, in restart
>     self.start(env, upgrade_type=upgrade_type)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 101, in start
>     upgrade_suspended=params.upgrade_suspended, env=env)
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 184, in namenode
>     if is_this_namenode_active() is False:
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 55, in wrapper
>     return function(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py", line 554, in is_this_namenode_active
>     raise Fail(format("The NameNode {namenode_id} is not listed as Active or Standby, waiting..."))
> resource_management.core.exceptions.Fail: The NameNode nn1 is not listed as Active or Standby, waiting...
> {code}
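The first traceback fails on `data_dict["beans"][0]`: the JMX response contained an empty `beans` list, most likely because the NameNode had not finished starting. A defensive lookup, sketched below under that assumption, returns `None` instead of raising, so the caller can treat the value as "not available yet" and retry. `get_value_from_jmx_safe` and the `"State"` property used in the examples are illustrative, not Ambari's code.

```python
import json


def get_value_from_jmx_safe(jmx_json, prop):
    """Return `prop` from the first JMX bean, or None if it is not available.

    Hypothetical helper. The jmx.py in the traceback indexes beans[0]
    directly, which raises IndexError when the "beans" list is empty.
    """
    data = json.loads(jmx_json)
    beans = data.get("beans") or []  # treat a missing or null list as empty
    if not beans:
        return None  # NameNode has not published the bean yet
    return beans[0].get(prop)  # None if the bean lacks this property


if __name__ == "__main__":
    print(get_value_from_jmx_safe('{"beans": []}', "State"))  # None
    print(get_value_from_jmx_safe('{"beans": [{"State": "active"}]}', "State"))  # active
```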
> To resolve this, we increased the timeout in the Ambari scripts:
> 1. In
> /var/lib/ambari-server/resources/common-services/HDFS/2.1.0.2.0/package/scripts/hdfs_namenode.py,
> change the retry decorator from:
> @retry(times=5, sleep_time=5, backoff_factor=2, err_class=Fail)
> to:
> @retry(times=25, sleep_time=25, backoff_factor=2, err_class=Fail)
> 2. Restart the Ambari server.
> After this, the upgrade went through fine.
> I think it's better to increase the timeout permanently so that we don't have
> to deal with this issue again.
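The workaround changes the attempt count and the initial sleep of `@retry`. To estimate how long that lets the script wait, the sketch below lists the sleeps between attempts under an assumed model: `times` attempts, a sleep after each failed attempt except the last, and the sleep multiplied by `backoff_factor` after each one, optionally capped at `max_sleep_time`. The model, `retry_sleeps`, and the 600 s cap in the example are assumptions; the actual semantics should be checked in resource_management's `decorator.py`.

```python
def retry_sleeps(times, sleep_time, backoff_factor, max_sleep_time=None):
    """List the sleeps between attempts under an ASSUMED retry model.

    Hypothetical helper, not copied from Ambari: `times` attempts in total,
    one sleep after every failed attempt except the last, and the sleep is
    multiplied by `backoff_factor` each time (capped if max_sleep_time is set).
    """
    sleeps = []
    current = sleep_time
    for _ in range(times - 1):
        sleeps.append(current)
        current *= backoff_factor
        if max_sleep_time is not None:
            current = min(current, max_sleep_time)
    return sleeps


if __name__ == "__main__":
    print(retry_sleeps(5, 5, 2))       # [5, 10, 20, 40]
    print(sum(retry_sleeps(5, 5, 2)))  # 75 (seconds, original setting)
    print(sum(retry_sleeps(25, 25, 2, max_sleep_time=600)))  # 12175
```

Without a cap, the same model gives 25 × (2^24 − 1) seconds for the new settings, so whether the real decorator caps the sleep determines how long the restart can actually wait.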
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)