-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27754/
-----------------------------------------------------------

(Updated Nov. 11, 2014, 8:49 p.m.)


Review request for Ambari, Mahadev Konar, Sumit Mohanty, Sid Wagle, and Yusaku 
Sako.


Bugs: AMBARI-8220
    https://issues.apache.org/jira/browse/AMBARI-8220


Repository: ambari


Description
-------

Installs very often fail because installing the hadoop_2_2* packages times 
out; that step alone can take 8-12 mins.

Each service has a metainfo.xml file that defines the timeout for each 
Component for all types of actions (e.g., INSTALL, START, CONFIGURE, STOP).

Ambari doesn't currently have a mechanism to set a different timeout just for 
the INSTALL operation. Instead, the server-side Java code can do the 
following:

Get the default agent timeout from the ambari.properties file (which will be 
increased from 10 mins to 15 mins).

Get the service component's timeout, if it exists. If the operation is an 
INSTALL and the service component timeout is less than the default timeout, 
use the default timeout instead.
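
The selection rule described above can be sketched roughly as follows. This is
only an illustration, not the code in the diff: the class name, the method
name, and the parameter names are hypothetical, and falling back to the agent
default when a component defines no timeout is an assumption.

```java
// Illustrative sketch only; the class and method names are hypothetical
// and are not taken from the Ambari code base.
final class InstallTimeoutSketch {

    /**
     * Picks the timeout (in seconds) to send to the agent for one command.
     *
     * @param command             action name, e.g. "INSTALL" or "START"
     * @param componentTimeout    timeout from the component's metainfo.xml,
     *                            or null if the component defines none
     * @param agentDefaultTimeout agent.task.timeout from ambari.properties
     */
    static int effectiveTimeout(String command, Integer componentTimeout,
                                int agentDefaultTimeout) {
        if (componentTimeout == null) {
            // Assumption: with no component timeout, use the agent default.
            return agentDefaultTimeout;
        }
        if ("INSTALL".equals(command) && componentTimeout < agentDefaultTimeout) {
            // INSTALL never gets less time than the agent default.
            return agentDefaultTimeout;
        }
        // All other cases keep the component's own timeout.
        return componentTimeout;
    }

    public static void main(String[] args) {
        int agentDefault = 900; // 15 mins, the new agent.task.timeout value
        System.out.println(effectiveTimeout("INSTALL", 642, agentDefault));
        System.out.println(effectiveTimeout("INSTALL", 1042, agentDefault));
        System.out.println(effectiveTimeout("INSTALL", null, agentDefault));
    }
}
```

With a 900 second agent default, an INSTALL with a component timeout of 642 is
raised to 900, while a component timeout of 1042 is kept as is.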


Diffs
-----

  ambari-agent/src/main/python/ambari_agent/PythonExecutor.py 874b70b 
  ambari-server/conf/unix/ambari.properties 8563cf2 
  ambari-server/src/main/java/org/apache/ambari/server/configuration/Configuration.java a0d5b39 
  ambari-server/src/main/java/org/apache/ambari/server/controller/AmbariManagementControllerImpl.java 4f69dbb 

Diff: https://reviews.apache.org/r/27754/diff/


Testing
-------

----------------------------------------------------------------------
Total run:693
Total errors:0
Total failures:0
OK


Copied all of the changed files:

yes | cp /vagrant/ambari/ambari-agent/src/main/python/ambari_agent/PythonExecutor.py \
    /usr/lib/python2.6/site-packages/ambari_server/PythonExecutor.py
yes | cp /vagrant/ambari/ambari-agent/src/main/python/ambari_agent/PythonExecutor.py \
    /usr/lib/python2.6/site-packages/ambari_agent/PythonExecutor.py
yes | cp /vagrant/ambari/ambari-server/target/ambari-server-*.jar \
    /usr/lib/ambari-server/ambari-server-*.jar

Edited /etc/ambari-server/conf/ambari.properties and changed the 
agent.task.timeout value from 600 to 900.

Then modified the ResourceManager and NodeManager timeouts in 
/var/lib/ambari-agent/cache/stacks/HDP/2.0.6/services/YARN/metainfo.xml as 
follows:
ResourceManager         <timeout>642</timeout>
NodeManager             <timeout>1042</timeout>

Then ran ambari-server restart.

Then created a cluster, added all of the services, and reran the service checks.

After adding the YARN services, the command-*.json files contained the 
following:


    "commandParams": {
        "command_timeout": "1042",
        "script": "scripts/nodemanager.py",
        "script_type": "PYTHON",
        "service_package_folder": "HDP/2.0.6/services/YARN/package",
        "hooks_folder": "HDP/2.0.6/hooks"
    },

    "commandParams": {
        "command_timeout": "900",
        "script": "scripts/resourcemanager.py",
        "script_type": "PYTHON",
        "service_package_folder": "HDP/2.0.6/services/YARN/package",
        "hooks_folder": "HDP/2.0.6/hooks"
    },


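Notice that the ResourceManager's configured timeout (642) was less than the 
agent default (900), so it was raised to 900. The NodeManager's timeout (1042) 
was already above the default, so it was left unchanged.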


Thanks,

Alejandro Fernandez
