Hi Xingbo,

It seems like the Spark cluster could not find the staged config archive:

java.io.FileNotFoundException: File does not exist: hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip
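In cluster mode this error usually means the YARN containers on the worker nodes resolve a different default filesystem than the one spark-submit staged the archive to, which comes down to the Hadoop client config being visible on every node. A minimal sketch of the per-node setting involved (the Hadoop path below is an assumption; adjust it to your installation):

```shell
# Sketch, assuming a typical per-node layout (both paths are assumptions).
# In cluster mode these must be visible on every node, e.g. via
# conf/spark-env.sh, so YARN containers resolve hdfs://node1:9000 rather
# than the local filesystem when localizing the staging archive.
export HADOOP_CONF_DIR=/home/beyond/bdenv/hadoop-2.6.5/etc/hadoop
export YARN_CONF_DIR="$HADOOP_CONF_DIR"
echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
echo "YARN_CONF_DIR=$YARN_CONF_DIR"
```

With these exported on every node, the containers should localize the `__spark_conf__` archive from hdfs://node1:9000 as expected.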
It might be similar to these problems:
https://github.com/spark-jobserver/spark-jobserver/issues/465
https://stackoverflow.com/questions/36753546/spark-job-running-on-yarn-cluster-java-io-filenotfoundexception-file-does-not-e/36776378#36776378

In those cases, the fix was to set HADOOP_CONF_DIR or YARN_CONF_DIR. Here we submit the Spark job in cluster mode, not client mode, so the environment variable needs to be set on every node of your cluster.

There's a working environment in our docker image; you can try it by following this guide: https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md, and inside the container you can compare against your own environment to find the difference.

Hope this helps.

Thanks,
Lionel

On Mon, Mar 26, 2018 at 4:05 PM, 张兴博 <[email protected]> wrote:
> hi Lionel,
> Thanks for your reply.
>
> *Livy submission log:*
> 18/03/26 14:18:21 INFO SparkProcessBuilder: Running
> '/home/beyond/bdenv/spark-1.6.3/bin/spark-submit' '--deploy-mode' 'cluster'
> '--name' 'griffin' '--class' 'org.apache.griffin.measure.Application'
> '--conf' 'spark.executor.instances=2' '--conf' 'spark.executor.memory=1g'
> '--conf' 'spark.driver.memory=1g' '--conf' 'spark.yarn.tags=livy-batch-3-EOSEYyHb'
> '--conf' 'spark.yarn.dist.files=hdfs://node1:9000/griffin/spark_conf/hive-site.xml'
> '--conf' 'spark.yarn.submit.waitAppCompletion=false'
> '--conf' 'spark.submit.deployMode=cluster'
> '--conf' 'spark.jars=hdfs://node1:9000/livy/spark-avro_2.11-3.1.0.jar,hdfs://node1:9000/livy/datanucleus-api-jdo-3.2.6.jar,hdfs://node1:9000/livy/datanucleus-core-3.2.10.jar,hdfs://node1:9000/livy/datanucleus-rdbms-3.2.9.jar'
> '--conf' 'spark.master=yarn' '--conf' 'spark.executor.cores=1'
> '--queue' 'default'
> 'hdfs://node1:9000/griffin/measure/griffin-measure.jar'
> 'hdfs://node1:9000/griffin/env/env.json' '{
> "measure.type" : "griffin",
> "id" : 2,
> "name" : "test_job1",
> "owner" : "test",
> "description" : null,
> "organization" : null,
> "deleted" : false,
> "timestamp" : 1521958700669,
> "dq.type" : "accuracy",
> "process.type" : "batch",
> "data.sources" : [ {
> "id" : 3,
> "name" : "source",
> "connectors" : [ {
> "id" : 3,
> "name" : "source1522042502364",
> "type" : "HIVE",
> "version" : "1.2",
> "predicates" : [ ],
> "data.unit" : "1day",
> "config" : {
> "database" : "default",
> "table.name" : "test"
> }
> } ]
> }, {
> "id" : 4,
> "name" : "target",
> "connectors" : [ {
> "id" : 4,
> "name" : "target1522042506397",
> "type" : "HIVE",
> "version" : "1.2",
> "predicates" : [ ],
> "data.unit" : "1day",
> "config" : {
> "database" : "default",
> "table.name" : "test"
> }
> } ]
> } ],
> "evaluate.rule" : {
> "id" : 2,
> "rules" : [ {
> "id" : 2,
> "rule" : "source.id=target.id",
> "name" : "accuracy",
> "dsl.type" : "griffin-dsl",
> "dq.type" : "accuracy"
> } ]
> },
> "measure.type" : "griffin"
> }' 'hdfs,raw'
>
> *yarn log:*
> Diagnostics:
> Application application_1521777452868_0022 failed 2 times due to AM Container for appattempt_1521777452868_0022_000002 exited with exitCode: -1000
> For more detailed output, check the application tracking page:
> http://node1:8088/proxy/application_1521777452868_0022/
> Then, click on links to logs of each attempt.
> Diagnostics: File does not exist: hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip
> java.io.FileNotFoundException: File does not exist: hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip
> at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
> at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Failing this attempt. Failing the application.
>
> Thanks again.
>
>
> At 2018-03-26 15:54:28, "Lionel Liu" <[email protected]> wrote:
>
> Hi 张兴博,
>
> I've got your questions, but I need to know more about your error.
> The Griffin service submits the job to Livy, and Livy then submits it to YARN. In your case, I think the job has been submitted to Livy, and the error occurs when Livy submits it to YARN.
> Would you please show me Livy's submission log, which shows exactly the parameters Livy submits to YARN?
> Or, if you can see the application information on the Spark cluster UI, you can get the application id; would you dig into the YARN log by application id and show me the error part?
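(For reference, the aggregated YARN log asked for above can be pulled with the YARN CLI; a sketch, assuming log aggregation is enabled on the cluster. The command is only printed here, since running it needs a live cluster:)

```shell
# The application id is taken from the diagnostics in this thread.
APP_ID=application_1521777452868_0022
# With YARN log aggregation enabled, this command dumps all container
# logs for every attempt of the application.
CMD="yarn logs -applicationId $APP_ID"
echo "$CMD"
```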
>
> Thanks,
> Lionel
>
> 2018-03-26 15:18 GMT+08:00 William Guo <[email protected]>:
>
>> Forward question to dev list.
>>
>>
>> ---------- Forwarded message ----------
>> From: 张兴博 <[email protected]>
>> Date: 2018-03-26 15:07 GMT+08:00
>> Subject: Hello, I'd like to ask about a Griffin issue
>> To: [email protected]
>>
>> Hello,
>> I'd like to ask about deploying Griffin. I recently came across the Griffin project and wanted to try it out; I plan to deploy it at our school for students to use. However, after deploying Griffin following the README.md, every Spark job submitted by Griffin fails.
>> My environment (Hadoop 2.6.5, Hive 1.2.2, Spark 1.6.3) is definitely fine, and Livy also works when tested on its own with simple commands, but jobs submitted by Griffin report errors in the YARN log:
>> 2018-03-19 19:14:30,624 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1521443096534_0050_000001 with final state: FAILED, and exit status: 254
>> This is the first place FAILED appears; the main problem is that I currently have no way to locate the error.
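(The "simple commands" Livy test mentioned above can be sketched as a REST call to Livy's batch endpoint. Host, port, and the jar/class values below are assumptions taken from this thread; the snippet only prints the command rather than running it against a live server:)

```shell
# Hypothetical standalone Livy smoke test: POST a batch job straight to
# Livy's /batches endpoint, bypassing Griffin, to isolate where the
# failure starts. We only print the command here.
LIVY_URL=http://node1:8998
PAYLOAD='{"file": "hdfs://node1:9000/griffin/measure/griffin-measure.jar", "className": "org.apache.griffin.measure.Application"}'
echo "curl -s -X POST -H 'Content-Type: application/json' -d '$PAYLOAD' $LIVY_URL/batches"
```

If a submission like this succeeds while Griffin's does not, the difference in the submitted parameters (as shown in the Livy submission log) is the place to look.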
