Hello!

I am trying to use Slider to distribute an application over a YARN
cluster. While attempting to use "slider flex" to decrease the number
of containers allocated for the application (using the kafka
app-package as reference), I came across the following error:

ERROR 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:169 - Caught an exception while executing command: <class 'AgentException.AgentException'>: 'Script /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py does not exist'
Traceback (most recent call last):
  File "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 115, in runCommand
    script_path = self.resolve_script_path(self.base_dir, script, script_type)
  File "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/71/slider-agent.tar.gz/slider-agent/agent/CustomServiceOrchestrator.py", line 199, in resolve_script_path
    raise AgentException(message)
AgentException: 'Script /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package/scripts/kafka.py does not exist'


(It seems the directory is cleaned up before the command runs.)

Additionally, I added debug prints to the CustomServiceOrchestrator to
see which base directory is used for invoking the script, and found
that the base directories for the STATUS and STOP commands differ:

STATUS command:

INFO 2016-07-07 10:56:31,323 AgentToggleLogger.py:40 - Adding STATUS_COMMAND for service kc of cluster kc to the queue.
INFO 2016-07-07 10:56:31,327 CustomServiceOrchestrator.py:114 - Base dir: /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/filecache/113/slider-kafka-package-1.0.0.zip/package


STOP command:

INFO 2016-07-07 10:57:36,455 AgentToggleLogger.py:40 - Adding EXECUTION_COMMAND for service kc of cluster kc to the queue.
INFO 2016-07-07 10:57:36,456 Controller.py:251 - Attempting to gracefully stop the application ...
INFO 2016-07-07 10:57:36,458 ActionQueue.py:134 - Package received:
INFO 2016-07-07 10:57:36,458 ActionQueue.py:140 - Executing command with id = 4-1 for role = Hello of cluster kc
INFO 2016-07-07 10:57:36,460 ActionQueue.py:170 - Running command:
{u'roleCommand': u'STOP', u'clusterName': u'kc', u'componentName':
u'Hello', u'hostname': u'192.168.1.195', u'hostLevelParams':
{u'java_home': 
u'/Library/Java/JavaVirtualMachines/jdk1.7.0_79.jdk/Contents/Home/',
u'container_id': u'container_1467829690678_0022_01_000003'},
u'commandType': u'EXECUTION_COMMAND', u'roleParams': {u'auto_restart':
u'false'}, u'serviceName': u'kc', u'role': u'Hello', u'commandParams':
{u'record_config': u'true', u'service_package_folder':
u'${AGENT_WORK_ROOT}/work/app/definition/package', u'script':
u'scripts/kafka.py', u'schema_version': u'2.0', u'command_timeout':
u'600', u'script_type': u'PYTHON'}, u'taskId': 4, u'commandId':
u'4-1', u'containers': [], u'configurations': {u'global':
{u'security_enabled': u'false', u'app_container_id':
u'container_1467829690678_0022_01_000003', u'listen_port': u'52508',
u'app_root': u'${AGENT_WORK_ROOT}/app/install', u'app_log_dir':
u'${AGENT_LOG_ROOT}', u'kc_version': u'1.0.0', u'app_pid_dir':
u'${AGENT_WORK_ROOT}/app/run', u'app_container_tag': u'2',
u'pid_file': u'${AGENT_WORK_ROOT}/app/run/kc.pid', u'app_install_dir':
u'${AGENT_WORK_ROOT}/app/install', u'app_user': u'sarthakk',
u'app_input_conf_dir': u'${AGENT_WORK_ROOT}/propagatedconf'},
u'server': {}}}
INFO 2016-07-07 10:57:36,461 CustomServiceOrchestrator.py:114 - Base dir: /private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/application_1467829690678_0022/container_1467829690678_0022_01_000003/app/definition/package

For some reason, the STOP command tries to pick up the script from the
container-specific location, whereas the STATUS command resolves it
from the localized package under filecache. I am not sure whether this
is the cause of the issue. Any pointers for debugging this further
would be really helpful.
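
For anyone trying to reproduce this, a small standalone check along the following lines can show which of the two base directories actually contains the script at a given moment. This is only a sketch: locate_script is a hypothetical helper, not part of the Slider agent, and the two directories are simply the "Base dir" values copied from the logs above.

```python
import os

# Base directories copied from the "Base dir" log lines above.
STATUS_BASE_DIR = (
    "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/"
    "application_1467829690678_0022/filecache/113/"
    "slider-kafka-package-1.0.0.zip/package"
)
STOP_BASE_DIR = (
    "/private/tmp/hdfs/nm-local-dir/usercache/sarthakk/appcache/"
    "application_1467829690678_0022/"
    "container_1467829690678_0022_01_000003/app/definition/package"
)


def locate_script(base_dirs, script):
    """Hypothetical helper (not a Slider API).

    For each candidate base directory, join it with the relative script
    path and report whether a regular file exists at that location.
    Returns a list of (base_dir, full_path, exists) tuples.
    """
    results = []
    for base_dir in base_dirs:
        full_path = os.path.join(base_dir, script)
        results.append((base_dir, full_path, os.path.isfile(full_path)))
    return results


# "scripts/kafka.py" is the 'script' value from the STOP command's commandParams.
for _, full_path, exists in locate_script(
        [STATUS_BASE_DIR, STOP_BASE_DIR], "scripts/kafka.py"):
    print("%s -> %s" % (full_path, "exists" if exists else "MISSING"))
```

Running it once while the application is up and again right after "slider flex" would show whether the container-specific copy disappears before the STOP command is executed.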

(For reference, platform: OS X, Python 2.7.11)

Thank you
Sarthak
