Which version of Python is running on your hosts? The code is simply trying to execute 'python --version', and the "Unknown option: --" message in your log suggests that the Python version you're using doesn't support that option. Can you try the same call from a shell on the AM host?
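If it helps, here is a rough sketch of that check as a script you could run on the AM host. It is not Slider's code: the function name probe_python_version is made up for illustration, and it assumes the script itself is run with a recent interpreter (Python 3.5 or later) while probing whatever 'python' resolves to on the PATH. It tries '--version' first and then falls back to the short '-V' form.

```python
import subprocess
import sys


def probe_python_version(executable="python"):
    """Return (flag, version_text) for the first version flag the interpreter accepts.

    Hypothetical helper for illustration only; Slider does not ship this.
    """
    for flag in ("--version", "-V"):
        try:
            proc = subprocess.run(
                [executable, flag],
                stdout=subprocess.PIPE,
                # Older interpreters print the version on stderr, so merge the streams.
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
        except OSError:
            # The executable is missing or cannot be launched at all.
            return None, "could not execute %s" % executable
        if proc.returncode == 0:
            return flag, proc.stdout.strip()
    return None, "no version flag accepted by %s" % executable


if __name__ == "__main__":
    # Optional first argument: path of the interpreter to probe.
    target = sys.argv[1] if len(sys.argv) > 1 else "python"
    accepted_flag, text = probe_python_version(target)
    print("accepted flag:", accepted_flag)
    print("output:", text)
```

If the first flag fails and only '-V' is accepted, that would be consistent with the "Unknown option: --" lines in the AM log.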
— Jon On Sep 11, 2014, at 9:54 AM, 牛兆捷 <nzjem...@gmail.com> wrote: > *The YARN only accept one application and the AppMaster container fails > very soon once it starts.* > > *The error log of AM container:* > > 14/09/11 21:07:24 WARN util.NativeCodeLoader: Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 14/09/11 21:07:25 INFO appmaster.SliderAppMaster: Login user is hustnn > (auth:SIMPLE) > 14/09/11 21:07:25 WARN appmaster.SliderAppMaster: > 14/09/11 21:07:25 INFO appmaster.SliderAppMaster: > 14/09/11 21:07:25 INFO appmaster.SliderAppMaster: OpenSSL 0.9.8e-fips-rhel5 > 01 Jul 2008 > 14/09/11 21:07:25 WARN appmaster.SliderAppMaster: > 14/09/11 21:07:25 WARN appmaster.SliderAppMaster: Unknown option: -- > 14/09/11 21:07:25 WARN appmaster.SliderAppMaster: usage: python [option] > ... [-c cmd | -m mod | file | -] [arg] ... > 14/09/11 21:07:25 WARN appmaster.SliderAppMaster: Try `python -h' for more > information. > 14/09/11 21:07:25 INFO appmaster.SliderAppMaster: > 14/09/11 21:07:25 INFO service.AbstractService: Service python failed in > state STARTED; cause: org.apache.slider.core.main.ServiceLaunchException: > python failed with code 2 > org.apache.slider.core.main.ServiceLaunchException: python failed with code > 2 > at > org.apache.slider.server.services.workflow.ForkedProcessService.reportFailure(ForkedProcessService.java:202) > at > org.apache.slider.server.services.workflow.ForkedProcessService.onProcessExited(ForkedProcessService.java:192) > at > org.apache.slider.server.services.workflow.LongLivedProcess.run(LongLivedProcess.java:345) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > 14/09/11 21:07:25 WARN tools.SliderUtils: Expected exit code={0}, actual > exit code={2} > 14/09/11 21:07:25 INFO tools.SliderUtils: [ERR] > 14/09/11 21:07:25 INFO tools.SliderUtils: [ERR] Unknown option: -- > 14/09/11 21:07:25 INFO tools.SliderUtils: [ERR] usage: python [option] ... > [-c cmd | -m mod | file | -] [arg] ... > 14/09/11 21:07:25 INFO tools.SliderUtils: [ERR] Try `python -h' for more > information. > 14/09/11 21:07:25 INFO tools.SliderUtils: [OUT] > 14/09/11 21:07:25 INFO service.AbstractService: Service SliderAppMaster > failed in state INITED; cause: > org.apache.slider.core.exceptions.SliderException: Process python failed: > Expected exit code={0}, actual exit code={2} > org.apache.slider.core.exceptions.SliderException: Process python failed: > Expected exit code={0}, actual exit code={2} > at > org.apache.slider.common.tools.SliderUtils.execCommand(SliderUtils.java:1744) > at > org.apache.slider.common.tools.SliderUtils.validateSliderServerEnvironment(SliderUtils.java:1777) > at > org.apache.slider.server.appmaster.SliderAppMaster.serviceInit(SliderAppMaster.java:405) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:180) > at > org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471) > at > org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:401) > at > org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:626) > at > org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:1897) > Exception: org.apache.slider.core.exceptions.SliderException: Process > python failed: Expected exit code={0}, actual exit code={2} > 14/09/11 21:07:25 ERROR main.ServiceLauncher: Exception: > 
org.apache.slider.core.exceptions.SliderException: Process python failed: > Expected exit code={0}, actual exit code={2} > org.apache.hadoop.service.ServiceStateException: > org.apache.slider.core.exceptions.SliderException: Process python failed: > Expected exit code={0}, actual exit code={2} > at > org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) > at > org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:180) > at > org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471) > at > org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:401) > at > org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:626) > at > org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:1897) > Caused by: org.apache.slider.core.exceptions.SliderException: Process > python failed: Expected exit code={0}, actual exit code={2} > at > org.apache.slider.common.tools.SliderUtils.execCommand(SliderUtils.java:1744) > at > org.apache.slider.common.tools.SliderUtils.validateSliderServerEnvironment(SliderUtils.java:1777) > at > org.apache.slider.server.appmaster.SliderAppMaster.serviceInit(SliderAppMaster.java:405) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ... 5 more > 14/09/11 21:07:25 INFO util.ExitUtil: Exiting with status 32 > > 2014-09-11 21:43 GMT+08:00 Sumit Mohanty <smoha...@hortonworks.com>: > >> Per the error, the slider-agent failed to start. Can you look at the >> application log folder for log files agent.log and agent.out? >> >> The path is the value of "yarn.nodemanager.log-dirs" in yarn-site.xml. The >> container logs are sub-folders within. >> >> "yarn.nodemanager.delete.debug-delay-sec" should be a non-zero value to >> ensure that logs stay after application is stopped/failed. 
Otherwise, you >> can use "yarn logs" commands to extract logs from YARN. >> >> Also, if you are using "develop" branch then by default you do not need to >> specify agent.ini. >> >> Another log file to look at is slider-err.txt. This is the log from the >> AppMaster. >> >> >> On Thu, Sep 11, 2014 at 6:12 AM, 牛兆捷 <nzjem...@gmail.com> wrote: >> >>> *I use the develop branch in the slider repo. My config files are set as >>> below:* >>> >>> *slider-client.xml:* >>> >>> <configuration> >>> <property> >>> <name>yarn.resourcemanager.address</name> >>> <value>155.69.148.21:9032</value> >>> </property> >>> <property> >>> <name>yarn.resourcemanager.scheduler.address</name> >>> <value>155.69.148.21:9030</value> >>> </property> >>> <property> >>> <name>fs.defaultFS</name> >>> <value>hdfs://155.69.148.21:18000</value> >>> </property> >>> <property> >>> <name>slider.zookeeper.quorum</name> >>> <value>155.69.148.21:2181</value> >>> </property> >>> <property> >>> <name>slider.client.resource.origin</name> >>> <value>conf/slider-client.xml</value> >>> <description>This is just for diagnostics</description> >>> </property> >>> <property> >>> <name>yarn.application.classpath</name> >>> >>> >>> >> <value>/hadoop/hadoop-2.5.0/etc/hadoop,/hadoop/hadoop-2.5.0/share/hadoop/common/*,/hadoop/hadoop-2.5.0/share/hadoop/common/lib/*,/hadoop/hadoop-2.5.0/share/hadoop/hdfs/*,/hadoop/hadoop-2.5.0/share/hadoop/hdfs/lib/*,/hadoop/hadoop-2.5.0/share/hadoop/mapreduce/*,/hadoop/hadoop-2.5.0/share/hadoop/mapreduce/lib/*,/hadoop/hadoop-2.5.0/share/hadoop/yarn/*,/hadoop/hadoop-2.5.0/share/hadoop/yarn/lib/*</value> >>> </property> >>> <property> >>> <name>yarn.log-aggregation-enable</name> >>> <value>true</value> >>> </property> >>> >>> <property> >>> <name>slider.yarn.queue</name> >>> <value>default</value> >>> <description>YARN queue for the Application Master</description> >>> </property> >>> >>> *I uploaded agent.ini into HDFS (/user/*username*/agent/conf) and its >>> content is:* >>> >>> [server] 
>>> hostname=localhost >>> port=8440 >>> secured_port=8441 >>> check_path=/ws/v1/slider/agents/ >>> register_path=/ws/v1/slider/agents/{name}/register >>> heartbeat_path=/ws/v1/slider/agents/{name}/heartbeat >>> [agent] >>> app_pkg_dir=app/definition >>> app_install_dir=app/install >>> app_run_dir=app/run >>> app_dbg_cmd= >>> debug_mode_enabled=true >>> app_task_dir=. >>> app_log_dir=. >>> app_tmp_dir=app/tmp >>> log_dir=. >>> run_dir=infra/run >>> version_file=infra/version >>> log_level=INFO >>> [python] >>> [command] >>> max_retries=2 >>> sleep_between_retries=1 >>> [security] >>> [heartbeat] >>> state_interval=6 >>> log_lines_count=300 >>> >>> *appConfig.json:* >>> >>> { >>> "schema": "http://example.org/specification/v2.0.0", >>> "metadata": { >>> }, >>> "global": { >>> "application.def": "slider-hbase-app-package-0.98.5-hadoop2.zip", >>> "create.default.zookeeper.node": "true", >>> "java_home": >> "/users/staff/hustnn/hadoop-0.23.6/java/jdk1.6.0_39", >>> "system_configs": "core-site", >>> "agent.conf": "/user/hustnn/agent/conf/agent.ini", >>> "site.global.app_user": "yarn", >>> "site.global.app_root": >>> "${AGENT_WORK_ROOT}/app/install/hbase-0.98.5-hadoop2", >>> "site.global.ganglia_server_host": "${NN_HOST}", >>> "site.global.ganglia_server_port": "8667", >>> "site.global.ganglia_server_id": "Application1", >>> "site.global.ganglia_enabled":"true", >>> "site.global.hbase_instance_name": "instancename", >>> "site.global.hbase_root_password": "secret", >>> "site.global.user_group": "hadoop", >>> "site.global.security_enabled": "false", >>> "site.global.monitor_protocol": "http", >>> "site.global.hbase_thrift_port": >> "${HBASE_THRIFT.ALLOCATED_PORT}", >>> "site.global.hbase_thrift2_port": >>> "${HBASE_THRIFT2.ALLOCATED_PORT}", >>> "site.global.hbase_rest_port": "${HBASE_REST.ALLOCATED_PORT}", >>> "site.hbase-env.hbase_master_heapsize": "1024m", >>> "site.hbase-env.hbase_regionserver_heapsize": "1024m", >>> "site.hbase-site.hbase.rootdir": 
"${DEFAULT_DATA_DIR}", >>> "site.hbase-site.hbase.superuser": "yarn", >>> "site.hbase-site.hbase.tmp.dir": >> "${AGENT_WORK_ROOT}/work/app/tmp", >>> "site.hbase-site.hbase.local.dir": "${hbase.tmp.dir}/local", >>> "site.hbase-site.hbase.zookeeper.quorum": "155.69.148.21:2181", >>> "site.hbase-site.zookeeper.znode.parent": "${DEF_ZK_PATH}", >>> "site.hbase-site.hbase.regionserver.info.port": "0", >>> "site.hbase-site.hbase.master.info.port": >>> "${HBASE_MASTER.ALLOCATED_PORT}", >>> "site.hbase-site.hbase.regionserver.port": "0", >>> "site.hbase-site.hbase.master.port": "0" >>> }, >>> "components": { >>> "slider-appmaster": { >>> "jvm.heapsize": "256M" >>> } >>> } >>> } >>> >>> >>> *resource.json:* >>> >>> { >>> "schema": "http://example.org/specification/v2.0.0", >>> "metadata": { >>> }, >>> "global": { >>> }, >>> "components": { >>> "HBASE_MASTER": { >>> "yarn.role.priority": "1", >>> "yarn.component.instances": "1", >>> "yarn.memory": "256" >>> }, >>> "slider-appmaster": { >>> }, >>> "HBASE_REGIONSERVER": { >>> "yarn.role.priority": "2", >>> "yarn.component.instances": "1", >>> "yarn.memory": "256" >>> }, >>> "HBASE_REST": { >>> "yarn.role.priority": "3", >>> "yarn.component.instances": "1", >>> "yarn.memory": "256" >>> }, >>> "HBASE_THRIFT": { >>> "yarn.role.priority": "4", >>> "yarn.component.instances": "1", >>> "yarn.memory": "256" >>> }, >>> "HBASE_THRIFT2": { >>> "yarn.role.priority": "5", >>> "yarn.component.instances": "1", >>> "yarn.memory": "256" >>> } >>> } >>> } >>> >>> >>> 2014-09-11 20:22 GMT+08:00 Tim Israel <t...@timisrael.com>: >>> >>>> What's your slider-client.xml look like? Also, did you put agent.ini >>> into >>>> the appropriate folder in HDFS? 
>>>> >>>> Tim >>>> >>>> On Thu, Sep 11, 2014 at 7:25 AM, 牛兆捷 <nzjem...@gmail.com> wrote: >>>> >>>>> When I run the agent like this: >>>>> >>>>> ./slider create cl1 --image >>>>> hdfs://yourNameNodeHost:8020/user/yarn/agent/slider-agent.tar.gz >>>>> --template appConfig.json --resources resources.json >>>>> >>>>> The application is accepted by YARN but failed. Anyone know it?? >>>>> >>>>> The error info in the YARN RM UI is : >>>>> >>>>> Application application_1410335027910_0001 failed 2 times due to AM >>>>> Container for appattempt_1410335027910_0001_000002 exited with >>> exitCode: >>>> 32 >>>>> due to: Exception from container-launch: ExitCodeException >> exitCode=32: >>>>> ExitCodeException exitCode=32: >>>>> at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) >>>>> at org.apache.hadoop.util.Shell.run(Shell.java:455) >>>>> at >>>>> >>> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:702) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81) >>>>> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138) >>>>> at >>>>> >>>>> >>>> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >>>>> at >>>>> >>>>> >>>> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >>>>> at java.lang.Thread.run(Thread.java:662) >>>>> Container exited with a non-zero exit code 32 >>>>> .Failing this attempt.. Failing the application. 
>>>>> >>>>> >>>>> Then I go inside the container and get the error info as below: >>>>> >>>>> 14/09/11 19:15:45 INFO service.AbstractService: Service python failed >>> in >>>>> state STARTED; cause: >>> org.apache.slider.core.main.ServiceLaunchException: >>>>> python failed with code 2 >>>>> org.apache.slider.core.main.ServiceLaunchException: python failed >> with >>>> code >>>>> 2 >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.server.services.workflow.ForkedProcessService.reportFailure(ForkedProcessService.java:202) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.server.services.workflow.ForkedProcessService.onProcessExited(ForkedProcessService.java:192) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.server.services.workflow.LongLivedProcess.run(LongLivedProcess.java:345) >>>>> at >>>>> >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) >>>>> at >>>>> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >>>>> at java.util.concurrent.FutureTask.run(FutureTask.java:138) >>>>> at >>>>> >>>>> >>>> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >>>>> at >>>>> >>>>> >>>> >>> >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >>>>> at java.lang.Thread.run(Thread.java:662) >>>>> 14/09/11 19:15:45 WARN tools.SliderUtils: Expected exit code={0}, >>> actual >>>>> exit code={2} >>>>> 14/09/11 19:15:45 INFO tools.SliderUtils: [ERR] >>>>> 14/09/11 19:15:45 INFO tools.SliderUtils: [ERR] Unknown option: -- >>>>> 14/09/11 19:15:45 INFO tools.SliderUtils: [ERR] usage: python >> [option] >>>> ... >>>>> [-c cmd | -m mod | file | -] [arg] ... >>>>> 14/09/11 19:15:45 INFO tools.SliderUtils: [ERR] Try `python -h' for >>> more >>>>> information. 
>>>>> 14/09/11 19:15:45 INFO tools.SliderUtils: [OUT] >>>>> 14/09/11 19:15:45 INFO service.AbstractService: Service >> SliderAppMaster >>>>> failed in state INITED; cause: >>>>> org.apache.slider.core.exceptions.SliderException: Process python >>> failed: >>>>> Expected exit code={0}, actual exit code={2} >>>>> org.apache.slider.core.exceptions.SliderException: Process python >>> failed: >>>>> Expected exit code={0}, actual exit code={2} >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.common.tools.SliderUtils.execCommand(SliderUtils.java:1744) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.common.tools.SliderUtils.validateSliderServerEnvironment(SliderUtils.java:1777) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.server.appmaster.SliderAppMaster.serviceInit(SliderAppMaster.java:405) >>>>> at >>>>> >>> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:180) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:401) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:626) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:1897) >>>>> Exception: org.apache.slider.core.exceptions.SliderException: Process >>>>> python failed: Expected exit code={0}, actual exit code={2} >>>>> 14/09/11 19:15:45 ERROR main.ServiceLauncher: Exception: >>>>> org.apache.slider.core.exceptions.SliderException: Process python >>> failed: >>>>> Expected exit code={0}, actual exit code={2} >>>>> org.apache.hadoop.service.ServiceStateException: >>>>> org.apache.slider.core.exceptions.SliderException: Process python >>> failed: >>>>> Expected exit 
code={0}, actual exit code={2} >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:59) >>>>> at >>>>> >>> org.apache.hadoop.service.AbstractService.init(AbstractService.java:172) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.core.main.ServiceLauncher.launchService(ServiceLauncher.java:180) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.core.main.ServiceLauncher.launchServiceRobustly(ServiceLauncher.java:471) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.core.main.ServiceLauncher.launchServiceAndExit(ServiceLauncher.java:401) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.core.main.ServiceLauncher.serviceMain(ServiceLauncher.java:626) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.server.appmaster.SliderAppMaster.main(SliderAppMaster.java:1897) >>>>> Caused by: org.apache.slider.core.exceptions.SliderException: Process >>>>> python failed: Expected exit code={0}, actual exit code={2} >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.common.tools.SliderUtils.execCommand(SliderUtils.java:1744) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.common.tools.SliderUtils.validateSliderServerEnvironment(SliderUtils.java:1777) >>>>> at >>>>> >>>>> >>>> >>> >> org.apache.slider.server.appmaster.SliderAppMaster.serviceInit(SliderAppMaster.java:405) >>>>> at >>>>> >>> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) >>>>> ... 5 more >>>>> 14/09/11 19:15:45 INFO util.ExitUtil: Exiting with status 32 >>>>> >>>>> >>>>> -- >>>>> *Regards,* >>>>> *Zhaojie* >>>>> >>>> >>> >>> >>> >>> -- >>> *Regards,* >>> *Zhaojie* >>> >> >> -- >> CONFIDENTIALITY NOTICE >> NOTICE: This message is intended for the use of the individual or entity to >> which it is addressed and may contain information that is confidential, >> privileged and exempt from disclosure under applicable law. 
If the reader >> of this message is not the intended recipient, you are hereby notified that >> any printing, copying, dissemination, distribution, disclosure or >> forwarding of this communication is strictly prohibited. If you have >> received this communication in error, please contact the sender immediately >> and delete it from your system. Thank You. >> > > > > -- > *Regards,* > *Zhaojie*