Hi Xingbo,

1. sparkJob.properties is in the service module, at service/src/main/resources/sparkJob.properties, link here:
https://github.com/apache/incubator-griffin/blob/master/service/src/main/resources/sparkJob.properties

2. You need to set your own environment parameters in sparkJob.properties; they will be submitted to livy so it knows where the resources live on your HDFS. BTW, if you modify this file, you need to rebuild the project to get a fresh service.jar.

3. In sparkJob.properties, I suggest you write full HDFS paths with a cluster-name prefix exactly matching the "fs.defaultFS" value in $HADOOP_HOME/etc/hadoop/core-site.xml, like this: "hdfs://griffin-cluster/<path to>/<file name>", so that Spark can access them.

Thanks,
Lionel

On Fri, Mar 30, 2018 at 2:09 PM, 张兴博 <[email protected]> wrote:

> Hi Lionel,
> I didn't find 'sparkJobs.properties'.
> Can you show me this file?
> I don't know how to write the HDFS path in the 'sparkjobs.properties'?
> Does the HDFS path need to write hostname and port?
> Thanks.
>
>
> At 2018-03-26 16:41:06, "Lionel Liu" <[email protected]> wrote:
>
> Hi Xingbo,
>
> It seems like the spark cluster could not find the config path:
>
> java.io.FileNotFoundException: File does not exist:
> hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip
>
> It might be similar to these problems:
> https://github.com/spark-jobserver/spark-jobserver/issues/465
> https://stackoverflow.com/questions/36753546/spark-job-running-on-yarn-cluster-java-io-filenotfoundexception-file-does-not-e/36776378#36776378
>
> In these cases, they fixed it by setting HADOOP_CONF_DIR or YARN_CONF_DIR.
>
> Here we submit the spark job in cluster mode, not client mode, so the
> environment parameter should be set on each node of your cluster.
>
> There's a workable environment in our docker image; you can try it by
> following this:
> https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md,
> and in the container you can find the differences from your environment.
>
> Hope this helps.
>
> Thanks,
> Lionel
>
>
> On Mon, Mar 26, 2018 at 4:05 PM, 张兴博 <[email protected]> wrote:
>
>> hi Lionel,
>> Thanks for your reply.
>>
>> *Livy submission log:*
>> 18/03/26 14:18:21 INFO SparkProcessBuilder: Running
>> '/home/beyond/bdenv/spark-1.6.3/bin/spark-submit' '--deploy-mode' 'cluster'
>> '--name' 'griffin' '--class' 'org.apache.griffin.measure.Application'
>> '--conf' 'spark.executor.instances=2' '--conf' 'spark.executor.memory=1g'
>> '--conf' 'spark.driver.memory=1g' '--conf' 'spark.yarn.tags=livy-batch-3-EOSEYyHb'
>> '--conf' 'spark.yarn.dist.files=hdfs://node1:9000/griffin/spark_conf/hive-site.xml'
>> '--conf' 'spark.yarn.submit.waitAppCompletion=false' '--conf'
>> 'spark.submit.deployMode=cluster' '--conf'
>> 'spark.jars=hdfs://node1:9000/livy/spark-avro_2.11-3.1.0.jar,hdfs://node1:9000/livy/datanucleus-api-jdo-3.2.6.jar,hdfs://node1:9000/livy/datanucleus-core-3.2.10.jar,hdfs://node1:9000/livy/datanucleus-rdbms-3.2.9.jar'
>> '--conf' 'spark.master=yarn' '--conf' 'spark.executor.cores=1' '--queue'
>> 'default' 'hdfs://node1:9000/griffin/measure/griffin-measure.jar'
>> 'hdfs://node1:9000/griffin/env/env.json' '{
>>   "measure.type" : "griffin",
>>   "id" : 2,
>>   "name" : "test_job1",
>>   "owner" : "test",
>>   "description" : null,
>>   "organization" : null,
>>   "deleted" : false,
>>   "timestamp" : 1521958700669,
>>   "dq.type" : "accuracy",
>>   "process.type" : "batch",
>>   "data.sources" : [ {
>>     "id" : 3,
>>     "name" : "source",
>>     "connectors" : [ {
>>       "id" : 3,
>>       "name" : "source1522042502364",
>>       "type" : "HIVE",
>>       "version" : "1.2",
>>       "predicates" : [ ],
>>       "data.unit" : "1day",
>>       "config" : {
>>         "database" : "default",
>>         "table.name" : "test"
>>       }
>>     } ]
>>   }, {
>>     "id" : 4,
>>     "name" : "target",
>>     "connectors" : [ {
>>       "id" : 4,
>>       "name" : "target1522042506397",
>>       "type" : "HIVE",
>>       "version" : "1.2",
>>       "predicates" : [ ],
>>       "data.unit" : "1day",
>>       "config" : {
>>         "database" : "default",
>>         "table.name" : "test"
>>       }
>>     } ]
>>   } ],
>>   "evaluate.rule" : {
>>     "id" : 2,
>>     "rules" : [ {
>>       "id" : 2,
>>       "rule" : "source.id=target.id",
>>       "name" : "accuracy",
>>       "dsl.type" : "griffin-dsl",
>>       "dq.type" : "accuracy"
>>     } ]
>>   },
>>   "measure.type" : "griffin"
>> }' 'hdfs,raw'
>>
>> *yarn log:*
>> Diagnostics:
>> Application application_1521777452868_0022 failed 2 times due to AM
>> Container for appattempt_1521777452868_0022_000002 exited with exitCode: -1000
>> For more detailed output, check application tracking page:
>> http://node1:8088/proxy/application_1521777452868_0022/ Then, click on
>> links to logs of each attempt.
>> Diagnostics: File does not exist:
>> hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip
>> java.io.FileNotFoundException: File does not exist:
>> hdfs://node1:9000/user/beyond/.sparkStaging/application_1521777452868_0022/__spark_conf__43296858891313890.zip
>>   at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1122)
>>   at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:1114)
>>   at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1114)
>>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:251)
>>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:61)
>>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
>>   at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:357)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at javax.security.auth.Subject.doAs(Subject.java:422)
>>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
>>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:356)
>>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:60)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>   at java.lang.Thread.run(Thread.java:748)
>> Failing this attempt. Failing the application.
>>
>> Thanks again.
>>
>>
>> At 2018-03-26 15:54:28, "Lionel Liu" <[email protected]> wrote:
>>
>> Hi 张兴博,
>>
>> I've got your questions, and I need to know more about your error.
>> The Griffin service submits the job to livy, and livy then submits it to yarn.
>> In your case, I think it has been submitted to livy, and the error comes when
>> livy submits to yarn.
>> Would you please show me the submission log of livy, which shows exactly which
>> parameters livy submits to yarn?
>> Or, if you can see the application information on the spark cluster UI, you can
>> get the application id; would you dig into the yarn log by application id and
>> show me the error part?
>>
>>
>> Thanks,
>> Lionel
>>
>>
>> 2018-03-26 15:18 GMT+08:00 William Guo <[email protected]>:
>>
>>> Forwarding question to dev list.
>>>
>>>
>>> ---------- Forwarded message ----------
>>> From: 张兴博 <[email protected]>
>>> Date: 2018-03-26 15:07 GMT+08:00
>>> Subject: Hello, I'd like to ask about a griffin problem
>>> To: [email protected]
>>>
>>>
>>> Hello,
>>> I'd like to ask about deploying griffin. I recently came across the griffin
>>> project and wanted to deploy it for the students at our school to use, but
>>> after I deployed griffin following the README.md, every spark job griffin
>>> submits fails.
>>> First, my hadoop 2.6.5, hive 1.2.2 and spark 1.6.3 environments are
>>> definitely fine, and livy also works when tested separately with simple
>>> commands, but when griffin submits a job, an error appears in the yarn log:
>>> 2018-03-19 19:14:30,624 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl:
>>> Updating application attempt appattempt_1521443096534_0050_000001 with
>>> final state: FAILED, and exit status: 254
>>> This is the first place FAILED appears; the main problem is that I currently
>>> have no way to locate the error.
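[Note for readers hitting the same issue: Lionel's point 3 can be sketched as a sparkJob.properties fragment like the one below. The cluster name `griffin-cluster`, the paths, and the exact key names are illustrative assumptions — check the real file linked above and use the "fs.defaultFS" value from your own core-site.xml.]

```properties
# Hypothetical sparkJob.properties fragment. Key names are illustrative --
# verify them against service/src/main/resources/sparkJob.properties in the
# repo. Every HDFS path carries the cluster-name prefix taken from the
# "fs.defaultFS" value in $HADOOP_HOME/etc/hadoop/core-site.xml, e.g.
#   fs.defaultFS = hdfs://griffin-cluster
sparkJob.file=hdfs://griffin-cluster/griffin/measure/griffin-measure.jar
sparkJob.args_1=hdfs://griffin-cluster/griffin/env/env.json
sparkJob.jars_1=hdfs://griffin-cluster/livy/datanucleus-api-jdo-3.2.6.jar

# Livy endpoint the Griffin service posts batch jobs to (adjust host/port)
livy.uri=http://localhost:8998/batches
```

[And per the HADOOP_CONF_DIR discussion above: since the job runs in YARN cluster mode, something like `export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop` must be set in the environment of every node of the cluster, not just on the machine running livy.]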
