Hi Xingbo,

It seems the Spark cluster could not find the staged config file:
java.io.FileNotFoundException: File does not exist:
hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip
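To dig further, you can pull the aggregated container logs for that application. As a small sketch (the staging path is just the one from your error above, and the final `yarn logs` call is commented out since it must run on a node with `yarn` on the PATH):

```shell
# Extract the application id from the staging path in the error, then
# fetch its aggregated YARN logs.
staging='hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip'
app_id=$(echo "$staging" | grep -o 'application_[0-9]*_[0-9]*')
echo "$app_id"    # application_1521777452868_0022
# yarn logs -applicationId "$app_id"
```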

It looks similar to these problems:
https://github.com/spark-jobserver/spark-jobserver/issues/465
https://stackoverflow.com/questions/36753546/spark-job-running-on-yarn-cluster-java-io-filenotfoundexception-file-does-not-e/36776378#36776378

In those cases, the fix was to set HADOOP_CONF_DIR or YARN_CONF_DIR.

Since we submit the Spark job in cluster mode, not client mode, these
environment variables need to be set on every node of your cluster.
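As a hedged sketch (the Hadoop path below is an assumption based on your spark-submit path; substitute your actual installation directory), you could append something like this to conf/spark-env.sh on every node:

```shell
# Point Spark at the Hadoop/YARN client configuration so YARN containers
# resolve the correct HDFS namenode and ResourceManager addresses.
export HADOOP_CONF_DIR=/home/beyond/bdenv/hadoop-2.6.5/etc/hadoop
export YARN_CONF_DIR=$HADOOP_CONF_DIR
```

Then restart the services that launch spark-submit (Livy in this setup) so they pick up the new environment.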

There's a known-working environment in our docker image; you can try it by
following this guide:
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md
Inside the container you can compare the configuration with your own
environment to find the difference.
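One way to compare is to dump the relevant environment variables in both places and diff them. A minimal sketch (the output path is arbitrary; run it inside the container and on your own node):

```shell
# Collect Hadoop/YARN/Spark-related environment variables, sorted so the
# two dumps can be compared with diff.
env | grep -E 'HADOOP|YARN|SPARK' | sort > /tmp/env-check.txt
cat /tmp/env-check.txt
```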

Hope this helps.

Thanks,
Lionel


On Mon, Mar 26, 2018 at 4:05 PM, 张兴博 <[email protected]> wrote:

> hi Lionel,
> Thanks for your reply.
>
> *Livy **submission log**:*
> 18/03/26 14:18:21 INFO SparkProcessBuilder: Running
> '/home/beyond/bdenv/spark-1.6.3/bin/spark-submit' '--deploy-mode'
> 'cluster' '--name' 'griffin' '--class' 
> 'org.apache.griffin.measure.Application'
> '--conf' 'spark.executor.instances=2' '--conf' 'spark.executor.memory=1g'
> '--conf' 'spark.driver.memory=1g' '--conf' 
> 'spark.yarn.tags=livy-batch-3-EOSEYyHb'
> '--conf' 
> 'spark.yarn.dist.files=hdfs://node1:9000/griffin/spark_conf/hive-site.xml'
> '--conf' 'spark.yarn.submit.waitAppCompletion=false' '--conf'
> 'spark.submit.deployMode=cluster' '--conf' 'spark.jars=hdfs://node1:9000/livy/spark-avro_2.11-3.1.0.jar,hdfs://node1:9000/livy/datanucleus-api-jdo-3.2.6.jar,hdfs://node1:9000/livy/datanucleus-core-3.2.10.jar,hdfs://node1:9000/livy/datanucleus-rdbms-3.2.9.jar'
> '--conf' 'spark.master=yarn' '--conf' 'spark.executor.cores=1' '--queue'
> 'default' 'hdfs://node1:9000/griffin/measure/griffin-measure.jar'
> 'hdfs://node1:9000/griffin/env/env.json' '{
>   "measure.type" : "griffin",
>   "id" : 2,
>   "name" : "test_job1",
>   "owner" : "test",
>   "description" : null,
>   "organization" : null,
>   "deleted" : false,
>   "timestamp" : 1521958700669,
>   "dq.type" : "accuracy",
>   "process.type" : "batch",
>   "data.sources" : [ {
>     "id" : 3,
>     "name" : "source",
>     "connectors" : [ {
>       "id" : 3,
>       "name" : "source1522042502364",
>       "type" : "HIVE",
>       "version" : "1.2",
>       "predicates" : [ ],
>       "data.unit" : "1day",
>       "config" : {
>         "database" : "default",
>         "table.name" : "test"
>       }
>     } ]
>   }, {
>     "id" : 4,
>     "name" : "target",
>     "connectors" : [ {
>       "id" : 4,
>       "name" : "target1522042506397",
>       "type" : "HIVE",
>       "version" : "1.2",
>       "predicates" : [ ],
>       "data.unit" : "1day",
>       "config" : {
>         "database" : "default",
>         "table.name" : "test"
>       }
>     } ]
>   } ],
>   "evaluate.rule" : {
>     "id" : 2,
>     "rules" : [ {
>       "id" : 2,
>       "rule" : "source.id=target.id",
>       "name" : "accuracy",
>       "dsl.type" : "griffin-dsl",
>       "dq.type" : "accuracy"
>     } ]
>   },
>   "measure.type" : "griffin"
> }' 'hdfs,raw'
>
> *yarn log:*
> Diagnostics:
> Application application_1521777452868_0022 failed 2 times due to AM
> Container for appattempt_1521777452868_0022_000002 exited with exitCode:
> -1000
> For more detailed output, check application tracking page:
> http://node1:8088/proxy/application_1521777452868_0022/
> Then, click on links to logs of each attempt.
> Diagnostics: File does not exist: hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip
> java.io.FileNotFoundException: File does not exist:
> hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip
> at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
> at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
> at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
> at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
> at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
> at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Failing this attempt. Failing the application.
>
> Thanks again.
>
>
> At 2018-03-26 15:54:28, "Lionel Liu" <[email protected]> wrote:
>
> Hi 张兴博,
>
> I've got your question, but I need to know more about your error.
> The Griffin service submits the job to Livy, and Livy then submits it to
> YARN. In your case, I think the job has been submitted to Livy, and the
> error occurs when Livy submits it to YARN.
> Could you please show me Livy's submission log, which shows the exact
> parameters Livy submits to YARN?
> Alternatively, if you can see the application on the Spark cluster UI,
> you can get the application id; could you dig into the YARN log by
> application id and show me the error part?
>
>
> Thanks,
> Lionel
>
>
> 2018-03-26 15:18 GMT+08:00 William Guo <[email protected]>:
>
>> Forward question to dev list.
>>
>>
>>
>>
>> ---------- Forwarded message ----------
>> From: 张兴博 <[email protected]>
>> Date: 2018-03-26 15:07 GMT+08:00
>> Subject: Hello, I'd like to ask about a Griffin issue
>> To: [email protected]
>>
>>
>> Hello,
>>     I'd like to ask about deploying Griffin. I recently came across the
>> Griffin project and want to try deploying it at our school for students
>> to use. However, after deploying Griffin following README.md, every
>> Spark job that Griffin submits fails.
>>     First of all, my environment (hadoop 2.6.5, hive 1.2.2, spark 1.6.3)
>> is definitely fine, and Livy also works when tested on its own with
>> simple commands, but when Griffin submits a job, the yarn log reports an
>> error:
>> 2018-03-19 19:14:30,624 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application
>> attempt appattempt_1521443096534_0050_000001 with final state: FAILED,
>> and exit status: 254
>>     This is the first place FAILED appears; the main problem is that I
>> have no way to locate the error.
>>
>
