I think that /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition is a link to /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/113/slider-kafka-package-1.0.0.zip, so the STATUS and STOP base dirs are actually the same directory. I am not sure why the agent reports that kafka.py does not exist when stopping the container; that directory definitely should not be cleaned up while a container is still running. Can you verify that app/definition/package/scripts/kafka.py exists for one of the containers that is still running?
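To check this on the NodeManager host, a short script along these lines may help. It is a minimal sketch, not part of Slider: `check_container_script` is a hypothetical helper, and it assumes the layout seen in the log (`<container_dir>/app/definition/package/scripts/kafka.py`). It reports whether `app/definition` is a symlink, where it resolves to, and whether the script is actually present.

```python
import os
import sys


def check_container_script(container_dir, script_rel="package/scripts/kafka.py"):
    """Inspect <container_dir>/app/definition and the script beneath it.

    Hypothetical helper; the layout mirrors the paths in the agent log.
    """
    definition = os.path.join(container_dir, "app", "definition")
    script = os.path.join(definition, script_rel)
    return {
        # False for a dangling symlink, since exists() follows the link
        "definition_exists": os.path.exists(definition),
        # True for any symlink, even one whose target has been removed
        "definition_is_link": os.path.islink(definition),
        # Resolves all symlinks, e.g. to the filecache/<n>/<package>.zip dir
        "definition_target": os.path.realpath(definition),
        "script_exists": os.path.isfile(script),
    }


if __name__ == "__main__":
    # Pass one or more container directories on the command line.
    for container_dir in sys.argv[1:]:
        print(container_dir)
        info = check_container_script(container_dir)
        for key in sorted(info):
            print("  %s: %s" % (key, info[key]))
```

If `definition_is_link` is True but `definition_exists` is False, the link is dangling, which would mean the filecache target was removed while the container was still running.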
On Thu, Jul 7, 2016 at 11:50 AM, Sarthak Kukreti <skuk...@ncsu.edu> wrote:
> Hello!
>
> I am trying to use Slider to distribute an application over a YARN
> cluster. While attempting to use "slider flex" to decrease the number
> of containers allocated for the application (using the kafka
> app-package as reference), I came across the following error:
>
> ERROR 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:169 - Caught an exception while executing command: <class 'AgentException.AgentException'>: 'Script /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py does not exist'
> Traceback (most recent call last):
>   File "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 115, in runCommand
>     script_path = self.resolve_script_path(self.base_dir, script, script_type)
>   File "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 199, in resolve_script_path
>     raise AgentException(message)
> AgentException: 'Script /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py does not exist'
>
> (It seems like the directory is cleaned up before the command runs.)
>
> Additionally, I tried adding debug prints in the
> CustomServiceOrchestrator to see what base directory is used for
> invoking the script and found that the base directories for the STATUS
> and STOP commands differ:
>
> STATUS command:
>
> INFO 2016-07-07 10:56:31,323 AgentToggleLogger.py:40 - Adding STATUS_COMMAND for service kc of cluster kc to the queue.
> INFO 2016-07-07 10:56:31,327 CustomServiceOrchestrator.py:114 - Base dir: /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/113/slider-kafka-package-1.0.0.zip/package
>
> STOP command:
>
> INFO 2016-07-07 10:57:36,455 AgentToggleLogger.py:40 - Adding EXECUTION_COMMAND for service kc of cluster kc to the queue.
> INFO 2016-07-07 10:57:36,456 Controller.py:251 - Attempting to gracefully stop the application ...
> INFO 2016-07-07 10:57:36,458 ActionQueue.py:134 - Package received:
> INFO 2016-07-07 10:57:36,458 ActionQueue.py:140 - Executing command with id = 4-1 for role = Hello of cluster kc
> INFO 2016-07-07 10:57:36,460 ActionQueue.py:170 - Running command:
> {u'roleCommand': u'STOP', u'clusterName': u'kc', u'componentName':
> u'Hello', u'hostname': u'192.168.1.195', u'hostLevelParams':
> {u'java_home':
> u'/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/',
> u'container_id': u'container_1467829690678_0022_01_000003'},
> u'commandType': u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart':
> u'false'}, u'serviceName': u'kc', u'role': u'Hello', u'commandParams':
> {u'record_config': u'true', u'service_package_folder':
> u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
> u'scripts/kafka.py', u'schema_version': u'2.0', u'command_timeout':
> u'600', u'script_type': u'PYTHON'}, u'taskId': 4, u'commandId':
> u'4-1', u'containers': [], u'configurations': {u'global':
> {u'security_enabled': u'false', u'app_container_id':
> u'container_1467829690678_0022_01_000003', u'listen_port': u'52508',
> u'app_root': u'${AGENT_WORK_ROOT}/app/install', u'app_log_dir':
> u'${AGENT_LOG_ROOT}', u'kc_version': u'1.0.0', u'app_pid_dir':
> u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
> u'pid_file': u'${AGENT_WORK_ROOT}/app/run/kc.pid', u'app_install_dir':
> u'${AGENT_WORK_ROOT}/app/install', u'app_user': u'sarthakk',
> u'app_input_conf_dir': u'${AGENT_WORK_ROOT}/propagatedconf'},
> u'server': {}}}
> INFO 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:114 - Base dir: /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package
>
> For some reason, the STOP command attempts to pick up the script from
> the container-specific location, whereas the STATUS command goes through
> an entirely different path (though I am not sure whether this is the
> cause of the issue). Any more pointers to debug this would be really
> helpful.
>
> (For reference, platform: OS X, Python 2.7.11)
>
> Thank you
> Sarthak
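One way to test whether the two base dirs in these logs really are the same directory is to compare them after resolving symlinks. The sketch below is not part of Slider: `compare_base_dirs` is a hypothetical helper, and the two default paths are copied from the STATUS and STOP debug output. It relies only on `os.path.exists` and `os.path.samefile` from the standard library.

```python
import os
import sys

# Base directories copied from the STATUS and STOP debug output.
STATUS_BASE = ("/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/"
               "application_1467829690678_0022/filecache/113/"
               "slider-kafka-package-1.0.0.zip/package")
STOP_BASE = ("/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/"
             "application_1467829690678_0022/"
             "container_1467829690678_0022_01_000003/app/definition/package")


def compare_base_dirs(status_base, stop_base):
    """Classify two base dirs as the same directory, different ones, or missing."""
    missing = [p for p in (status_base, stop_base) if not os.path.exists(p)]
    if missing:
        # exists() is False for a dangling symlink, so a removed filecache
        # target is reported here as well.
        return "missing: " + ", ".join(missing)
    # samefile follows symlinks and compares device and inode numbers.
    if os.path.samefile(status_base, stop_base):
        return "same directory"
    return "different directories"


if __name__ == "__main__":
    # Optionally pass two other base dirs on the command line.
    args = sys.argv[1:3] if len(sys.argv) >= 3 else [STATUS_BASE, STOP_BASE]
    print(compare_base_dirs(*args))
```

Running it while the container is still up, and again right after the flex-down, would show whether the STOP path stops resolving at the moment the error is logged.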