Hi Ashwini,

For livy.conf, you can refer to this:
https://github.com/bhlx3lyx7/griffin-docker/blob/master/griffin_env/conf/livy/livy.conf
It's for livy 0.3, which is a little different from 0.4, but I think you can find 
the mapping parameters. After modifying livy.conf, restart the livy service.
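A minimal restart sketch, assuming livy was unzipped to /opt/livy-0.4.0 (a hypothetical install path; adjust to yours). If your livy-server script doesn't support start/stop, kill the process and re-run bin/livy-server instead:

```shell
# Restart livy so the edited conf/livy.conf takes effect.
cd /opt/livy-0.4.0      # hypothetical install directory
bin/livy-server stop
bin/livy-server start
```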


I can't get any information from the livy.log; it should be the API request log 
of the livy service. If you start the livy service like this: 
nohup livy-server > livy.log &

Then you can get more information in this livy.log, which contains all the 
parameters used when submitting jobs to livy.
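For example, also capturing stderr (where much of the useful output goes) and then watching the log while griffin submits a job:

```shell
# Start livy in the background, redirecting both stdout and stderr.
nohup livy-server > livy.log 2>&1 &
# Follow the log while a griffin job is triggered.
tail -f livy.log
```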


In griffin.log, the submitted job looks good. The only concern is your table 
schema: does it really have the partitions "dt" and "hour"? If not, you can 
ignore the "where clause" and calculate over the whole table.
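A quick way to check, using a hypothetical table name (substitute your own):

```shell
# List the table's partitions; expect entries like dt=20180415/hour=06
hive -e "SHOW PARTITIONS default.demo_src"
```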


Thanks,
Lionel

At 2018-04-15 16:45:50, "Ashwini Kumar Gupta" <ashwini.gu...@enquero.com> wrote:


Hi Lionel,

 

I’m not really sure about the answers to the questions you have asked, but I'll 
tell you the steps and I'm pretty sure you'll be able to figure out the answers.

1. Downloaded the livy 0.4.0 zip and unzipped it.
2. Renamed conf/livy.conf.template to conf/livy.conf.
3. Executed bin/livy-server.

 

Attached are the latest griffin and livy logs. I'm also attaching my livy conf 
file.

 

Regards

Ashwini

 

From: bhlx3l...@163.com <bhlx3l...@163.com> On Behalf Of Lionel Liu
Sent: 15 April 2018 14:48
To: Ashwini Kumar Gupta <ashwini.gu...@enquero.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re:RE: Re:RE: Can't get output after creating measure

 

Hi Ashwini,

 

I’ve read the griffin_log.txt you’ve attached. According to this line:

org.apache.griffin.core.util.FSUtil      : Setting 
fs.defaultFS:hdfs://hdfs-default-name

did you set fs.defaultFS to the correct HDFS name? It should be something like 
“hdfs://quickstart.cloudera:8020”, the same as it is set in core-site.xml in the 
Hadoop configuration directory.
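One way to read the effective value straight from the Hadoop client configuration:

```shell
# Print the fs.defaultFS that the Hadoop client actually resolves.
hdfs getconf -confKey fs.defaultFS
```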

 

and according to this:

"path" : "dt=20180415 AND hour=06/_DONE"

Did you input the done file path as a “where clause”? It should be like 
“dt=20180415/hour=06/_DONE”, the path relative to the “root.path”.

 

Actually, you can also IGNORE the “done” file configuration if you don’t have 
any done file to check before calculation; then griffin will simply submit the 
job directly every time.

 

In your later email, I think you’ve succeeded in submitting jobs to livy, but 
the job still fails.

Would you please send me the livy.log?

Where did you deploy livy? Can it access your HDFS by “hdfs://<path>” directly, 
or do you need to access your HDFS like 
“hdfs://quickstart.cloudera:8020/<path>”? If the latter, you need to give the 
full path in the configuration, to let livy access the hdfs path.
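A quick check from the livy host, using the fully qualified name that appears earlier in this thread; if the short form fails but the full form works, use full paths in the configuration:

```shell
# Try the short and the fully qualified form of the same path.
hdfs dfs -ls hdfs:///griffin/
hdfs dfs -ls hdfs://quickstart.cloudera:8020/griffin/
```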

 

BTW, the datanucleus jars come from the hive library, not from livy; you’ve done 
the right thing.

 

Thanks
Lionel, Liu

 


At 2018-04-15 13:47:08, "Ashwini Kumar Gupta" <ashwini.gu...@enquero.com> wrote:



Hi Lionel,

 

I think in the previous mail people can’t see the image so I’m attaching it.

 

Also, I don’t see the measure name when I click on DQ Matrix and My Dashboards. 
Please see attached.

 

Regards

Ashwini

 

From: Ashwini Kumar Gupta <ashwini.gu...@enquero.com>
Sent: 15 April 2018 12:07
To: dev@griffin.incubator.apache.org; Lionel Liu <lionel...@apache.org>
Subject: RE: Re:RE: Can't get output after creating measure

 

Update:

 

Griffin was not able to reach HDFS because, while creating the DONE file in the 
measure, I gave the path as /user/warehouse. However, I should have given the 
fully qualified name. I think this was the issue.

 

Now that I have corrected it, I no longer get the “can’t reach hdfs” error, but 
I’m still not getting the output.

 

Livy dashboard:

 

Livy log file:

 

Warning: Skip remote jar hdfs:///griffin/griffin-measure.jar.
Warning: Skip remote jar hdfs:///livy/datanucleus-api-jdo-3.2.6.jar.
Warning: Skip remote jar hdfs:///livy/datanucleus-core-3.2.10.jar.
Warning: Skip remote jar hdfs:///livy/datanucleus-rdbms-3.2.9.jar.
java.lang.ClassNotFoundException: org.apache.griffin.measure.Application
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.util.Utils$.classForName(Utils.scala:176)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

 

NOTE: I have placed griffin-measure at hdfs:///griffin/griffin-measure.jar, and 
the datanucleus files at hdfs:///livy/datanucleus-api-jdo-3.2.6.jar.

Also, I have copied the datanucleus jar files from the Hive library folder. 
Livy 0.4.0 doesn’t have these jar files.
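A quick sanity check that the jars are really visible at those paths (the "Skip remote jar" warnings suggest spark-submit could not resolve them):

```shell
# Both commands should list the jars; an error here means livy/spark
# cannot resolve the paths as written in the configuration.
hdfs dfs -ls hdfs:///griffin/griffin-measure.jar
hdfs dfs -ls hdfs:///livy/
```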

 

Please suggest corrections.

 

Regards

Ashwini

 

From: Ashwini Kumar Gupta <ashwini.gu...@enquero.com>
Sent: 15 April 2018 11:41
To: Lionel Liu <lionel...@apache.org>
Cc: dev@griffin.incubator.apache.org
Subject: RE: Re:RE: Can't get output after creating measure

 

Hello Lionel,

 

As suggested, I kept only one configuration, i.e.

sparkJob.jars = hdfs:///livy/datanucleus-api-jdo-3.2.6.jar;\
                hdfs:///livy/datanucleus-core-3.2.10.jar;\
                hdfs:///livy/datanucleus-rdbms-3.2.9.jar

I kept env.json, griffin-measure.jar, hive-site.xml in 
hdfs:///griffin/griffin-measure.jar.

 

I created an accuracy measure and created a job with a 1-minute cron expression.

 

Attached is the log file. It seems it can’t reach HDFS, although my Cloudera 
HDFS is up.

 

Regards

Ashwini

From: Lionel Liu <lionel...@apache.org>
Sent: 13 April 2018 15:14
To: Ashwini Kumar Gupta <ashwini.gu...@enquero.com>
Cc: dev@griffin.incubator.apache.org
Subject: Re: Re:RE: Can't get output after creating measure

 

Hi Ashwini,

 

 

I've read your document, and here are my answers:

 

Question

- Do I keep both of them?

You should only keep the effective “sparkJob.jars” parameter.

- Do I have to copy hive-site.xml to HDFS and give the HDFS path in 
spark.yarn.dist.files?

You’d better copy hive-site.xml to HDFS, because livy can only submit spark 
applications in cluster mode, so hive-site.xml must be accessible from each 
node.
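A sketch of copying it up, assuming /griffin as the target HDFS directory (as used elsewhere in this thread) and the usual Cloudera location for hive-site.xml; adjust both paths if yours differ:

```shell
# Put hive-site.xml somewhere every cluster node can read it.
hdfs dfs -mkdir -p /griffin
hdfs dfs -put -f /etc/hive/conf/hive-site.xml /griffin/hive-site.xml
hdfs dfs -ls /griffin
```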

 

About the livy log: 

According to the livy log, it seems that your sparkJob.properties configuration 
doesn’t take effect: livy is trying to find hdfs:///griffin/griffin-measure.jar, 
not hdfs:///user/griffin/griffin-measure.jar.

Please correct sparkJob.properties, rebuild the service module, and have a try.
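A hedged sketch of the relevant sparkJob.properties entries. The key names (sparkJob.file, sparkJob.className) are assumptions based on the griffin service module of that era and may differ across versions; point the jar path at wherever the measure jar actually lives:

```properties
# service/src/main/resources/sparkJob.properties (location may differ per build)
sparkJob.file = hdfs:///user/griffin/griffin-measure.jar
sparkJob.className = org.apache.griffin.measure.Application
```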

 

 

Thanks,

Lionel

 

On Fri, Apr 13, 2018 at 4:16 PM, Ashwini Kumar Gupta 
<ashwini.gu...@enquero.com> wrote:

Hello Lionel,

 

Apologies for the delayed reply. I was trying all my options before raising an 
issue.

 

I’m attaching my installation steps. Please let me know what’s wrong with them.

 

Regards

Ashwin

 

From: bhlx3l...@163.com <bhlx3l...@163.com> On Behalf Of Lionel Liu
Sent: 10 April 2018 18:11
To: dev@griffin.incubator.apache.org; Ashwini Kumar Gupta 
<ashwini.gu...@enquero.com>
Subject: Re:RE: Can't get output after creating measure

 

Hi Ashwini,

 

It works the same on Linux. We need to check the logs to figure out what 
happened; it might be a configuration mistake or an input mistake.

I recommend you try our docker image first, by following this doc: 
https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md

 

--

Regards,

Lionel, Liu

 


At 2018-04-10 19:25:01, "Ashwini Kumar Gupta" <ashwini.gu...@enquero.com> wrote:
>Hello Lionel,
> 
>I’m running this in a Cloudera VM. Will that change anything?
> 
>Regards
>Ashwin
> 
>From: Lionel Liu <lionel...@apache.org>
>Sent: 10 April 2018 15:26
>To: dev@griffin.incubator.apache.org; Ashwini Kumar Gupta 
><ashwini.gu...@enquero.com>
>Subject: Re: Can't get output after creating measure
> 
>Hi Ashwini,
> 
>First, you could check the log of the griffin service, to know if it has 
>triggered the job instance.
>Then, the griffin service will submit a spark application with configuration to 
>livy; you can check the log of livy to verify it has been submitted 
>correctly.
>After that, you need to check the spark cluster, to verify the application has 
>been accepted by the cluster; if it runs, you can get the application log 
>through yarn.
> 
>An error at any of these steps might block the result.
> 
>Thanks,
>Lionel
> 
>On Tue, Apr 10, 2018 at 5:01 PM, William Guo 
><gu...@apache.org<mailto:gu...@apache.org>> wrote:
>hi Ashwin,
> 
>Could you show us your log here?
> 
>Thanks,
>William
> 
>On Tue, Apr 10, 2018 at 3:35 PM, Ashwini Kumar Gupta <
>ashwini.gu...@enquero.com<mailto:ashwini.gu...@enquero.com>> wrote:
> 
>> Hello Team,
>> 
>> I have been trying to install and use griffin but I cannot get output when
>> I click on DQ Matrix.
>> 
>> I created a measure, created a job to run.
>> The sequence in which I run all services are:
>> 
>> 
>>   1.  Elasticsearch
>>   2.  Jar file
>> 
>> I also noticed that griffin is not creating mapping in ES.
>> 
>> Can you please tell me where I'm going wrong?
>> 
>> Thanks
>> Ashwin
>> 
> 
