Hello! I am trying to use Slider to distribute an application over a YARN cluster. While using "slider flex" to decrease the number of containers allocated to the application (using the kafka app-package as a reference), I came across the following error:
ERROR 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:169 - Caught an exception while executing command: <class 'AgentException.AgentException'>: 'Script /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py does not exist'
Traceback (most recent call last):
  File "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 115, in runCommand
    script_path = self.resolve_script_path(self.base_dir, script, script_type)
  File "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 199, in resolve_script_path
    raise AgentException(message)
AgentException: 'Script /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py does not exist'

(It seems the directory is cleaned up before the command runs.)

Additionally, I added debug prints to CustomServiceOrchestrator to see which base directory is used to invoke the script, and found that the base directories for the STATUS and STOP commands differ:

STATUS command:

INFO 2016-07-07 10:56:31,323 AgentToggleLogger.py:40 - Adding STATUS_COMMAND for service kc of cluster kc to the queue.
INFO 2016-07-07 10:56:31,327 CustomServiceOrchestrator.py:114 - Base dir: /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/113/slider-kafka-package-1.0.0.zip/package

STOP command:

INFO 2016-07-07 10:57:36,455 AgentToggleLogger.py:40 - Adding EXECUTION_COMMAND for service kc of cluster kc to the queue.
INFO 2016-07-07 10:57:36,456 Controller.py:251 - Attempting to gracefully stop the application ...
INFO 2016-07-07 10:57:36,458 ActionQueue.py:134 - Package received:
INFO 2016-07-07 10:57:36,458 ActionQueue.py:140 - Executing command with id = 4-1 for role = Hello of cluster kc
INFO 2016-07-07 10:57:36,460 ActionQueue.py:170 - Running command: {u'roleCommand': u'STOP', u'clusterName': u'kc', u'componentName': u'Hello', u'hostname': u'192.168.1.195', u'hostLevelParams': {u'java_home': u'/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/', u'container_id': u'container_1467829690678_0022_01_000003'}, u'commandType': u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart': u'false'}, u'serviceName': u'kc', u'role': u'Hello', u'commandParams': {u'record_config': u'true', u'service_package_folder': u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script': u'scripts/kafka.py', u'schema_version': u'2.0', u'command_timeout': u'600', u'script_type': u'PYTHON'}, u'taskId': 4, u'commandId': u'4-1', u'containers': [], u'configurations': {u'global': {u'security_enabled': u'false', u'app_container_id': u'container_1467829690678_0022_01_000003', u'listen_port': u'52508', u'app_root': u'${AGENT_WORK_ROOT}/app/install', u'app_log_dir': u'${AGENT_LOG_ROOT}', u'kc_version': u'1.0.0', u'app_pid_dir': u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2', u'pid_file': u'${AGENT_WORK_ROOT}/app/run/kc.pid', u'app_install_dir': u'${AGENT_WORK_ROOT}/app/install', u'app_user': u'sarthakk', u'app_input_conf_dir': u'${AGENT_WORK_ROOT}/propagatedconf'}, u'server': {}}}
INFO 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:114 - Base dir: /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package

For some reason, the STOP command tries to pick up the script from the container-specific location, whereas the STATUS command uses an entirely different path (I am not sure, though, whether this is the cause of the issue). Any further pointers for debugging this would be very helpful.
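As a further check: the path in the AgentException is exactly the STOP base directory joined with the script value from commandParams (scripts/kafka.py). A small standalone script like the following, run on the node while the container is still alive, could show whether the script exists under each of the two base directories from the debug output. This is only a sketch: check_script_paths is a hypothetical helper, not part of Slider, and it assumes the agent resolves the script as os.path.join(base_dir, script). It does not reproduce Slider's actual resolve_script_path code.

```python
import os

# Base directories copied from the debug output (STATUS vs. STOP).
STATUS_BASE = ("/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/"
               "application_1467829690678_0022/filecache/113/"
               "slider-kafka-package-1.0.0.zip/package")
STOP_BASE = ("/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/"
             "application_1467829690678_0022/"
             "container_1467829690678_0022_01_000003/app/definition/package")
SCRIPT = "scripts/kafka.py"  # commandParams['script'] from the STOP command


def check_script_paths(base_dirs, script):
    """Hypothetical helper: for each labelled base dir, report
    (resolved script path, base dir exists, script file exists).

    Assumes resolution is os.path.join(base_dir, script), which matches
    the path shown in the AgentException message.
    """
    results = {}
    for label, base in base_dirs.items():
        path = os.path.join(base, script)
        results[label] = (path, os.path.isdir(base), os.path.isfile(path))
    return results


if __name__ == "__main__":
    report = check_script_paths({"STATUS": STATUS_BASE, "STOP": STOP_BASE}, SCRIPT)
    for label in sorted(report):
        path, base_ok, script_ok = report[label]
        print("%-6s base=%s script=%s -> %s" % (
            label,
            "ok" if base_ok else "MISSING",
            "ok" if script_ok else "MISSING",
            path))
```

If the STOP base directory is reported MISSING at the moment the STOP command arrives while the STATUS one is still present, that would support the suspicion that the container-local app/definition directory is cleaned up (or never populated) before STOP runs.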
(For reference, platform: OS X, Python 2.7.11.)

Thank you,
Sarthak