Sandeep Nemuri created AMBARI-19899:
---------------------------------------
Summary: Balancer triggered from Ambari UI exits after 40mins
Key: AMBARI-19899
URL: https://issues.apache.org/jira/browse/AMBARI-19899
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.4.0
Environment: Ambari 2.4.1.0
Reporter: Sandeep Nemuri
*PROBLEM*: The Balancer triggered from the Ambari UI exits after 40 minutes with
the error below.
{code}
stderr: /var/lib/ambari-agent/data/errors-27542.txt
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 408, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 344, in rebalancehdfs
    logoutput = False,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su hdfs
-l -s /bin/bash -c 'export
PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/sbin:/usr/sbin:/bin:/usr/bin:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'"'"'
KRB5CCNAME=/tmp/hdfs_rebalance_cc_f89f1ef37ada08b15101ad05225cf12d ; hdfs
--config /usr/hdp/current/hadoop-client/conf balancer -threshold 10'' returned
252. 17/01/17 10:42:45 INFO balancer.Balancer: Using a threshold of 10.0
17/01/17 10:42:45 INFO balancer.Balancer: namenodes = [hdfs://sandytest]
17/01/17 10:42:45 INFO balancer.Balancer: parameters =
Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle
iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0,
#blockpools = 0, run during upgrade = false]
17/01/17 10:42:45 INFO balancer.Balancer: included nodes = []
17/01/17 10:42:45 INFO balancer.Balancer: excluded nodes = []
17/01/17 10:42:45 INFO balancer.Balancer: source nodes = []
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
17/01/17 10:42:46 INFO balancer.KeyManager: Block token params received from
NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
17/01/17 10:42:46 INFO block.BlockTokenSecretManager: Setting block keys
17/01/17 10:42:46 INFO balancer.KeyManager: Update block keys every 2hrs,
30mins, 0sec
java.io.IOException: Another Balancer is running.. Exiting ...
Jan 17, 2017 10:42:46 AM Balancing took 2.048 seconds
stdout: /var/lib/ambari-agent/data/output-27542.txt
{code}
Note that the balancer process itself keeps running in the background.
As per [~lpuskas]:
* The command timeout comes from the custom action definition in the service
metainfo descriptor.
* On the server side, the timeout is extended by a hard-coded 10 minutes
(so for the HDFS rebalance custom action the timeout is
1800s + 600s = 2400s = 40 min).
* This means that if the rebalance operation takes longer than 40 minutes, the
operation is canceled (timed out).
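
The timeout arithmetic described above can be sketched as follows. This is only an illustration of the figures quoted in this report (1800s from the action definition plus 600s added on the server side); the constant and function names are hypothetical and do not come from the Ambari source.

```python
# Illustrative sketch only: the names below are hypothetical, and the values
# are the figures quoted in this report, not read from Ambari's code.
REBALANCE_ACTION_TIMEOUT_SECONDS = 1800  # timeout from the custom action definition
SERVER_EXTRA_TIMEOUT_SECONDS = 600       # fixed amount added on the server side


def effective_timeout(action_timeout, server_extra=SERVER_EXTRA_TIMEOUT_SECONDS):
    """Return the total number of seconds before the command is timed out."""
    return action_timeout + server_extra


def will_time_out(expected_runtime_seconds,
                  action_timeout=REBALANCE_ACTION_TIMEOUT_SECONDS):
    """Return True if a run of the given length would exceed the timeout."""
    return expected_runtime_seconds > effective_timeout(action_timeout)


total = effective_timeout(REBALANCE_ACTION_TIMEOUT_SECONDS)
print("%d s = %d min" % (total, total // 60))
```

Under these assumptions, any rebalance expected to run longer than 2400 seconds (for example, `will_time_out(3 * 3600)` for a three-hour run) would be reported as failed in the UI.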