I think that
/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition
is linked to
/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/113/slider-kafka-package-1.0.0.zip,
so they are actually the same directory. I am not sure why the agent reports
that kafka.py does not exist when stopping the container; that directory
definitely should not be cleaned up while the container is still running.
Can you verify that app/definition/package/scripts/kafka.py exists in one of
the containers that are still running?
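Here is a rough Python sketch for checking this. It is not part of Slider, and the script name check_definition.py and its helper functions are made up for this check. It follows the app/definition entry inside a container directory with os.path.realpath, reports whether that entry resolves to the filecache package directory, and checks whether package/scripts/kafka.py is reachable through it. It assumes app/definition is a symlink, which is how I understand YARN exposes localized resources in a container's working directory:

```python
from __future__ import print_function

import os
import sys


def same_directory(path_a, path_b):
    """True if both paths resolve (following any symlinks) to the same location."""
    return os.path.realpath(path_a) == os.path.realpath(path_b)


def check_definition(definition_dir, script="package/scripts/kafka.py"):
    """Describe how a container's app/definition entry resolves and whether
    the agent script is reachable through it.

    Assumption: app/definition is a symlink into the NodeManager filecache.
    This is a hypothetical debugging helper, not Slider code.
    """
    script_path = os.path.join(definition_dir, script)
    return {
        "definition_dir": definition_dir,
        # os.path.islink only inspects the last path component, so pass
        # .../app/definition itself rather than a path beneath it.
        "is_symlink": os.path.islink(definition_dir),
        "resolves_to": os.path.realpath(definition_dir),
        "script_exists": os.path.isfile(script_path),
    }


def main(argv):
    if len(argv) != 3:
        print("usage: %s CONTAINER_DIR FILECACHE_PACKAGE_DIR" % argv[0])
        return
    container_dir, package_dir = argv[1], argv[2]
    definition_dir = os.path.join(container_dir, "app", "definition")
    print("same directory:", same_directory(definition_dir, package_dir))
    for key, value in sorted(check_definition(definition_dir).items()):
        print("%s: %s" % (key, value))


if __name__ == "__main__":
    main(sys.argv)
```

Running it on the NodeManager host with the container directory and the filecache/113 package directory from your logs, once before and once right after issuing the flex, should show whether the link or the script disappears before the STOP command runs.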

On Thu, Jul 7, 2016 at 11:50 AM, Sarthak Kukreti <skuk...@ncsu.edu> wrote:

> Hello!
>
> I am trying to use Slider to distribute an application over a YARN
> cluster. While attempting to use "slider flex" to decrease the number
> of containers allocated for the application (using the kafka
> app-package as reference), I came across the following error:
>
> ERROR 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:169 -
> Caught an exception while executing command: <class
> 'AgentException.AgentException'>: 'Script
> /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py
> does not exist'
> Traceback (most recent call last):
>   File
> "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py",
> line 115, in runCommand
>     script_path = self.resolve_script_path(self.base_dir, script,
> script_type)
>   File
> "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py",
> line 199, in resolve_script_path
>     raise AgentException(message)
> AgentException: 'Script
> /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py
> does not exist'
>
>
> (It seems the directory is cleaned up before the command runs.)
>
> Additionally, I added debug prints to CustomServiceOrchestrator to
> see which base directory is used for invoking the script, and found
> that the base directories for the STATUS and STOP commands differ:
>
> STATUS command:
>
> INFO 2016-07-07 10:56:31,323 AgentToggleLogger.py:40 - Adding
> STATUS_COMMAND for service kc of cluster kc to the queue.
> INFO 2016-07-07 10:56:31,327 CustomServiceOrchestrator.py:114 - Base
> dir:
> /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/113/slider-kafka-package-1.0.0.zip/package
>
>
> STOP command:
>
> INFO 2016-07-07 10:57:36,455 AgentToggleLogger.py:40 - Adding
> EXECUTION_COMMAND for service kc of cluster kc to the queue.
> INFO 2016-07-07 10:57:36,456 Controller.py:251 - Attempting to
> gracefully stop the application ...
> INFO 2016-07-07 10:57:36,458 ActionQueue.py:134 - Package received:
> INFO 2016-07-07 10:57:36,458 ActionQueue.py:140 - Executing command
> with id = 4-1 for role = Hello of cluster kc
> INFO 2016-07-07 10:57:36,460 ActionQueue.py:170 - Running command:
> {u'roleCommand': u'STOP', u'clusterName': u'kc', u'componentName':
> u'Hello', u'hostname': u'192.168.1.195', u'hostLevelParams':
> {u'java_home':
> u'/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/',
> u'container_id': u'container_1467829690678_0022_01_000003'},
> u'commandType': u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart':
> u'false'}, u'serviceName': u'kc', u'role': u'Hello', u'commandParams':
> {u'record_config': u'true', u'service_package_folder':
> u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
> u'scripts/kafka.py', u'schema_version': u'2.0', u'command_timeout':
> u'600', u'script_type': u'PYTHON'}, u'taskId': 4, u'commandId':
> u'4-1', u'containers': [], u'configurations': {u'global':
> {u'security_enabled': u'false', u'app_container_id':
> u'container_1467829690678_0022_01_000003', u'listen_port': u'52508',
> u'app_root': u'${AGENT_WORK_ROOT}/app/install', u'app_log_dir':
> u'${AGENT_LOG_ROOT}', u'kc_version': u'1.0.0', u'app_pid_dir':
> u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
> u'pid_file': u'${AGENT_WORK_ROOT}/app/run/kc.pid', u'app_install_dir':
> u'${AGENT_WORK_ROOT}/app/install', u'app_user': u'sarthakk',
> u'app_input_conf_dir': u'${AGENT_WORK_ROOT}/propagatedconf'},
> u'server': {}}}
> INFO 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:114 - Base
> dir:
> /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package
>
> For some reason, the STOP command tries to load the script from the
> container-specific location, whereas the STATUS command uses an
> entirely different path (though I am not sure whether this is the
> cause of the issue). Any further pointers for debugging this would be
> really helpful.
>
> (For reference, platform: OS X, Python 2.7.11)
>
> Thank you
> Sarthak
>
