Check this: https://spark.apache.org/docs/2.2.1/monitoring.html
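If the measure job did reach the cluster, its log is a Spark application log; on a YARN-backed cluster the aggregated logs can usually be pulled as sketched below (the application id is a hypothetical placeholder; pick the real one from the list):

```
# List recent applications to spot the Griffin measure job
yarn application -list -appStates FINISHED,FAILED,KILLED

# Pull the driver/executor logs for it (id below is a placeholder)
yarn logs -applicationId application_1599822000000_0001 | less
```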
On Fri, Sep 11, 2020 at 8:58 PM William Guo <[email protected]> wrote:

For the griffin log, please search in your spark cluster env, usually in the worker log dir.
One weird thing is how you submit a job to spark if you disabled livy?

On Fri, Sep 11, 2020 at 8:46 PM Sunil Muniyal <[email protected]> wrote:

Is it possible to please help with the location where the measure log would get created, or from where I can check the location?

Thanks and Regards,
Sunil Muniyal

On Fri, Sep 11, 2020 at 6:14 PM William Guo <[email protected]> wrote:

Livy is used to post jobs to your cluster; I don't think it is related to livy.

Could you also share the measure log in your cluster?

On Fri, Sep 11, 2020 at 8:03 PM Sunil Muniyal <[email protected]> wrote:

Got the below message as output of /api/v1/metrics:

{"Test_Measure":[{"name":"Test_Job","type":"ACCURACY","owner":"test","metricValues":[]}]}

metricValues seems empty. So is it that Griffin is not getting data from ES? ES does have the data, which we verified previously. By any chance, do you think not having Livy could be a problem?

These are the latest logs from service.out:

[EL Fine]: sql: 2020-09-11 11:59:11.662--ServerSession(400064818)--Connection(754936662)--SELECT DISTINCT ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id FROM JOBINSTANCEBEAN WHERE (STATE IN (?,?,?,?,?,?))
	bind => [6 parameters bound]
[EL Fine]: sql: 2020-09-11 11:59:51.044--ServerSession(400064818)--Connection(353930083)--SELECT ID, type, CREATEDDATE, CRONEXPRESSION, DELETED, quartz_group_name, JOBNAME, MEASUREID, METRICNAME, MODIFIEDDATE, quartz_job_name, PREDICATECONFIG, TIMEZONE FROM job WHERE (DELETED = ?)
	bind => [1 parameter bound]
[EL Fine]: sql: 2020-09-11 11:59:51.046--ServerSession(400064818)--Connection(1245663749)--SELECT DISTINCT DTYPE FROM MEASURE WHERE (DELETED = ?)
	bind => [1 parameter bound]
[EL Fine]: sql: 2020-09-11 11:59:51.046--ServerSession(400064818)--Connection(674248356)--SELECT t0.ID, t0.DTYPE, t0.CREATEDDATE, t0.DELETED, t0.DESCRIPTION, t0.DQTYPE, t0.MODIFIEDDATE, t0.NAME, t0.ORGANIZATION, t0.OWNER, t0.SINKS, t1.ID, t1.PROCESSTYPE, t1.RULEDESCRIPTION, t1.evaluate_rule_id FROM MEASURE t0, GRIFFINMEASURE t1 WHERE ((t0.DELETED = ?) AND ((t1.ID = t0.ID) AND (t0.DTYPE = ?)))
	bind => [2 parameters bound]
[EL Fine]: sql: 2020-09-11 12:00:00.019--ClientSession(294162678)--Connection(98503327)--INSERT INTO JOBINSTANCEBEAN (ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
	bind => [15 parameters bound]
[EL Fine]: sql: 2020-09-11 12:00:00.09--ServerSession(400064818)--Connection(491395630)--SELECT ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id FROM JOBINSTANCEBEAN WHERE (predicate_job_name = ?)
	bind => [1 parameter bound]
2020-09-11 12:00:00.117 INFO 10980 --- [ryBean_Worker-3] o.a.g.c.j.SparkSubmitJob : {
  "measure.type" : "griffin",
  "id" : 201,
  "name" : "Test_Job",
  "owner" : "test",
  "description" : "Measure to check %age of id field values are same",
  "deleted" : false,
  "timestamp" : 1599822000000,
  "dq.type" : "ACCURACY",
  "sinks" : [ "ELASTICSEARCH", "HDFS" ],
  "process.type" : "BATCH",
  "data.sources" : [ {
    "id" : 204,
    "name" : "source",
    "connectors" : [ {
      "id" : 205,
      "name" : "source1599568886803",
      "type" : "HIVE",
      "version" : "1.2",
      "predicates" : [ ],
      "data.unit" : "1hour",
      "data.time.zone" : "",
      "config" : {
        "database" : "default",
        "table.name" : "demo_src",
        "where" : "dt=20200911 AND hour=11"
      }
    } ],
    "baseline" : false
  }, {
    "id" : 206,
    "name" : "target",
    "connectors" : [ {
      "id" : 207,
      "name" : "target1599568896874",
      "type" : "HIVE",
      "version" : "1.2",
      "predicates" : [ ],
      "data.unit" : "1hour",
      "data.time.zone" : "",
      "config" : {
        "database" : "default",
        "table.name" : "demo_tgt",
        "where" : "dt=20200911 AND hour=11"
      }
    } ],
    "baseline" : false
  } ],
  "evaluate.rule" : {
    "id" : 202,
    "rules" : [ {
      "id" : 203,
      "rule" : "source.id=target.id",
      "dsl.type" : "griffin-dsl",
      "dq.type" : "ACCURACY",
      "out.dataframe.name" : "accuracy"
    } ]
  },
  "measure.type" : "griffin"
}
2020-09-11 12:00:00.119 ERROR 10980 --- [ryBean_Worker-3] o.a.g.c.j.SparkSubmitJob : Post to livy ERROR. I/O error on POST request for "http://localhost:8998/batches": Connection refused (Connection refused); nested exception is java.net.ConnectException: Connection refused (Connection refused)
2020-09-11 12:00:00.131 INFO 10980 --- [ryBean_Worker-3] o.a.g.c.j.SparkSubmitJob : Delete predicate job(PG,Test_Job_predicate_1599825600016) SUCCESS.
[EL Fine]: sql: 2020-09-11 12:00:00.133--ClientSession(273634815)--Connection(296858203)--UPDATE JOBINSTANCEBEAN SET predicate_job_deleted = ?, STATE = ? WHERE (ID = ?)
	bind => [3 parameters bound]
[EL Fine]: sql: 2020-09-11 12:00:11.664--ServerSession(400064818)--Connection(1735064739)--SELECT DISTINCT ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id FROM JOBINSTANCEBEAN WHERE (STATE IN (?,?,?,?,?,?))
	bind => [6 parameters bound]

Thanks and Regards,
Sunil Muniyal
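The "Connection refused" on http://localhost:8998/batches in the log above is the key symptom: the Griffin service submits every measure job to Spark through Livy's /batches endpoint, so with nothing listening there, no Spark job ever runs and metricValues stays empty. A minimal check, assuming the default endpoint from the log:

```
# Livy's REST API; an empty batch list means Livy is up but idle
curl -s http://localhost:8998/batches
# Healthy response looks like: {"from":0,"total":0,"sessions":[]}
# "Connection refused" confirms Livy is not deployed or not listening
```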
On Fri, Sep 11, 2020 at 3:42 PM William Guo <[email protected]> wrote:

From the log, I didn't find any information related to metrics fetching.

Could you try to call /api/v1/metrics, and show us the latest log again?

On Fri, Sep 11, 2020 at 5:48 PM Sunil Muniyal <[email protected]> wrote:

> 1: I guess it is related to your login user and super user.
I am less worried about this, unless it could be the cause of the metrics not being displayed.

> 2: Could you share with us your griffin log? I suspect some exception happened when trying to connect with ES.
Attached is the service.out file. I see an error while submitting Spark jobs via Livy. Since Livy is not configured / deployed, this is expected. I believe this should not be the reason, since we are getting data from Hive (as part of batch processing). Please correct me if my understanding is incorrect.

Thanks and Regards,
Sunil Muniyal

On Fri, Sep 11, 2020 at 3:09 PM William Guo <[email protected]> wrote:

1: I guess it is related to your login user and super user.
2: Could you share with us your griffin log? I suspect some exception happened when trying to connect with ES.

On Fri, Sep 11, 2020 at 5:14 PM Sunil Muniyal <[email protected]> wrote:

Hello William,

Tried as suggested.

1. Ingested data into Hive tables using the provided script. The ownership still shows as is (source with admin and target with root).

2. Updated the env-batch.json and env-streaming.json files with the IP address for ES and rebuilt Griffin. Still no metrics for the jobs executed. ES does have data, as confirmed yesterday.

Please help.

Thanks and Regards,
Sunil Muniyal

On Thu, Sep 10, 2020 at 7:41 PM William Guo <[email protected]> wrote:

Please enter the IP directly; not sure whether the hostname can be resolved correctly or not.

On Thu, Sep 10, 2020 at 10:06 PM Sunil Muniyal <[email protected]> wrote:

Hi William,

Thank you for the reply.

Regarding points 2 and 3: is it possible to share some more details? I believe env_batch.json is configured as expected. What exactly needs to be updated correctly? The ES hostname, or shall I enter the IP, or something else? Please help.

Thanks and Regards,
Sunil Muniyal

On Thu, Sep 10, 2020 at 7:30 PM William Guo <[email protected]> wrote:

1. OK, we will fix this issue soon.
2. Could you try to ping ES from your spark environment and input the ES endpoint correctly in env_batch.json?
3. Please put your ES endpoint in env_batch.json.
6. Please try the following script to build your env.

```
#!/bin/bash

# create table
hive -f create-table.hql
echo "create table done"

# current hour
sudo ./gen_demo_data.sh
cur_date=`date +%Y%m%d%H`
dt=${cur_date:0:8}
hour=${cur_date:8:2}
partition_date="dt='$dt',hour='$hour'"
sed s/PARTITION_DATE/$partition_date/ ./insert-data.hql.template > insert-data.hql
hive -f insert-data.hql
src_done_path=/griffin/data/batch/demo_src/dt=${dt}/hour=${hour}/_DONE
tgt_done_path=/griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}/_DONE
hadoop fs -mkdir -p /griffin/data/batch/demo_src/dt=${dt}/hour=${hour}
hadoop fs -mkdir -p /griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}
hadoop fs -touchz ${src_done_path}
hadoop fs -touchz ${tgt_done_path}
echo "insert data [$partition_date] done"

# last hour
sudo ./gen_demo_data.sh
cur_date=`date -d '1 hour ago' +%Y%m%d%H`
dt=${cur_date:0:8}
hour=${cur_date:8:2}
partition_date="dt='$dt',hour='$hour'"
sed s/PARTITION_DATE/$partition_date/ ./insert-data.hql.template > insert-data.hql
hive -f insert-data.hql
src_done_path=/griffin/data/batch/demo_src/dt=${dt}/hour=${hour}/_DONE
tgt_done_path=/griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}/_DONE
hadoop fs -mkdir -p /griffin/data/batch/demo_src/dt=${dt}/hour=${hour}
hadoop fs -mkdir -p /griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}
hadoop fs -touchz ${src_done_path}
hadoop fs -touchz ${tgt_done_path}
echo "insert data [$partition_date] done"

# next hours
set +e
while true
do
  sudo ./gen_demo_data.sh
  cur_date=`date +%Y%m%d%H`
  next_date=`date -d "+1hour" '+%Y%m%d%H'`
  dt=${next_date:0:8}
  hour=${next_date:8:2}
  partition_date="dt='$dt',hour='$hour'"
  sed s/PARTITION_DATE/$partition_date/ ./insert-data.hql.template > insert-data.hql
  hive -f insert-data.hql
  src_done_path=/griffin/data/batch/demo_src/dt=${dt}/hour=${hour}/_DONE
  tgt_done_path=/griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}/_DONE
  hadoop fs -mkdir -p /griffin/data/batch/demo_src/dt=${dt}/hour=${hour}
  hadoop fs -mkdir -p /griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}
  hadoop fs -touchz ${src_done_path}
  hadoop fs -touchz ${tgt_done_path}
  echo "insert data [$partition_date] done"
  sleep 3600
done
set -e
```

Thanks,

William
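As a quick sanity check after the script above, one can confirm the hourly partitions and _DONE flags actually landed (table and path names as used in the script; the dt/hour values below are examples from this thread):

```
# Hive partitions created by insert-data.hql
hive -e "SHOW PARTITIONS default.demo_src;"
hive -e "SHOW PARTITIONS default.demo_tgt;"

# _DONE flag files touched by the script (example partition)
hadoop fs -ls /griffin/data/batch/demo_src/dt=20200911/hour=11/
```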
On Thu, Sep 10, 2020 at 4:58 PM Sunil Muniyal <[email protected]> wrote:

> 1. Since I was able to get ElasticSearch 6.8.x integrated, does it mean that only ES up to 6.8.x is supported for Griffin as of now? If yes, what are the plans further? Is there a page from which I could get updates?
> -- please file a jira ticket for us to make our code ES compatible.
[SM] GRIFFIN-346 (Support for Elastic Search latest version 7.9.1, https://issues.apache.org/jira/browse/GRIFFIN-346) is submitted.

> 2. I still do not see the metrics available (please refer to the screenshots below). Though the measure is now listed in the drop-down of the DQ Metrics tab, when I selected the test measure, nothing came up.
> -- could you check in ES whether metrics have been injected or not.
[SM] I used the link below and got the index that is created in ES. I believe the data is loaded; however, please correct me if I understood incorrectly.
"http://<ES Public IP>:9200/_cat/indices?v" (POC env is on a public cloud, so using the public IP.)

health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   griffin ur_Kd3XFQBCsPzIM84j87Q   5   2          0            0      1.2kb          1.2kb

Docs in the index: "http://<ES Public IP>:9200/griffin/_search"

{"took":44,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

Index mapping: "http://<ES Public IP>:9200/griffin"

{"griffin":{"aliases":{},"mappings":{"accuracy":{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"tmst":{"type":"date"}}}},"settings":{"index":{"creation_date":"1599567930578","number_of_shards":"5","number_of_replicas":"2","uuid":"ur_Kd3XFQBCsPzIM84j87Q","version":{"created":"6081299"},"provided_name":"griffin"}}}}
> 3. At a step in the deployment guide it is suggested to check the URL "http://<ES HOST IP>:9200/griffin/accuracy". When navigating to this URL, I get the below error. Please advise.
> {"error":"Incorrect HTTP method for uri [/griffin/accuracy] and method [GET], allowed: [POST]","status":405}
> -- it seems you need to use the POST method.
[SM] I am using the POST method as suggested in the article. Below is the relevant section of env_batch.json:

{
  "type": "ELASTICSEARCH",
  "config": {
    "method": "post",
    "api": "http://<ES Host Name>:9200/griffin/accuracy",   ---------> do we need the IP here?
    "connection.timeout": "1m",
    "retry": 10
  }
}

> 6. I also noticed that in Data Assets, demo_src is owned by admin whereas demo_tgt by root. Would that make any difference? If yes, how to correct it? Reload the Hive data?
> -- could you show me your script for dataset setup?
[SM] Attached are the 3 scripts. gen-hive-data.sh is the master script, which triggers demo_data, and that in turn triggers delta_src. I have done it as instructed in the GitHub article, and gen-hive-data.sh is triggered as root in the terminal.

Please advise.

Thanks and Regards,
Sunil Muniyal
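Note what the _cat/indices output above actually shows: the griffin index exists, but docs.count is 0 and _search returns no hits, i.e. no metrics were ever written to ES. This is consistent with the Livy "Connection refused" failure in the Sep 11 logs above. A one-line recheck after each job run (host placeholder as above):

```
# 0 here means no measure job has ever sunk results into ES
curl -s "http://<ES Public IP>:9200/griffin/_count"
```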
On Wed, Sep 9, 2020 at 8:41 PM William Guo <[email protected]> wrote:

> Request you to please advise further on the below points:
> 1. Since I was able to get ElasticSearch 6.8.x integrated, does it mean that only ES up to 6.8.x is supported for Griffin as of now? If yes, what are the plans further? Is there a page from which I could get updates?
-- please file a jira ticket for us to make our code ES compatible.

> 2. I still do not see the metrics available (please refer to the screenshots below). Though the measure is now listed in the drop-down of the DQ Metrics tab, when I selected the test measure, nothing came up.
-- could you check in ES whether metrics have been injected or not.

> 3. At a step in the deployment guide it is suggested to check the URL http://<ES HOST IP>:9200/griffin/accuracy. When navigating to this URL, I get the below error. Please advise.
> {"error":"Incorrect HTTP method for uri [/griffin/accuracy] and method [GET], allowed: [POST]","status":405}
-- it seems you need to use the POST method.

> 6. I also noticed that in Data Assets, demo_src is owned by admin whereas demo_tgt by root. Would that make any difference? If yes, how to correct it? Reload the Hive data?
-- could you show me your script for dataset setup?

On Tue, Sep 8, 2020 at 9:02 PM Sunil Muniyal <[email protected]> wrote:

Hi William,

I was finally able to get Griffin up and ElasticSearch integrated along with Hadoop. Thanks a lot for your help and guidance so far.

I have created a test measure and a job which gets triggered every 4 minutes automatically (I referred to the user guide available on GitHub at https://github.com/apache/griffin/blob/master/griffin-doc/ui/user-guide.md).

Request you to please advise further on the below points:
1. Since I was able to get ElasticSearch 6.8.x integrated, does it mean that only ES up to 6.8.x is supported for Griffin as of now? If yes, what are the plans further? Is there a page from which I could get updates?
2. I still do not see the metrics available (please refer to the screenshots below). Though the measure is now listed in the drop-down of the DQ Metrics tab, when I selected the test measure, nothing came up.
3. At a step in the deployment guide it is suggested to check the URL http://<ES HOST IP>:9200/griffin/accuracy. When navigating to this URL, I get the below error. Please advise.
{"error":"Incorrect HTTP method for uri [/griffin/accuracy] and method [GET], allowed: [POST]","status":405}
6. I also noticed that in Data Assets, demo_src is owned by admin whereas demo_tgt by root. Would that make any difference? If yes, how to correct it? Reload the Hive data?

Screenshots:
Data Assets: [screenshot]
DQ Metrics (test measure selected): [screenshot]
Job triggered multiple times: [screenshot]
Metrics page from job directly: [screenshot]

Thanks and Regards,
Sunil Muniyal

On Tue, Sep 8, 2020 at 4:38 PM Sunil Muniyal <[email protected]> wrote:

I am unable to get repos for 6.4.1; instead I found 6.8.x. Will try with this version of ElasticSearch in some time.

In the meantime, would it be possible to confirm whether 6.4.x or 6.8.x is the only supported version for Griffin? The reason I am asking is that the GitHub article for griffin deployment points to the latest version of ES.

Thanks and Regards,
Sunil Muniyal

On Tue, Sep 8, 2020 at 4:06 PM Sunil Muniyal <[email protected]> wrote:

I will need to redeploy ElasticSearch, correct?

Thanks and Regards,
Sunil Muniyal

On Tue, Sep 8, 2020 at 4:05 PM William Guo <[email protected]> wrote:

Could you try with this version?
<elasticsearch.version>6.4.1</elasticsearch.version>

Thanks,
William
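For context, elasticsearch.version is a Maven build property in Griffin's sources; a hedged sketch of pinning it and rebuilding, assuming the property sits in the project's top-level pom.xml (verify the path in your checkout before running the sed):

```
# Pin the ES client version Griffin builds against, then rebuild
sed -i 's#<elasticsearch.version>[^<]*</elasticsearch.version>#<elasticsearch.version>6.4.1</elasticsearch.version>#' pom.xml
mvn clean package -DskipTests
```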
On Tue, Sep 8, 2020 at 5:59 PM Sunil Muniyal <[email protected]> wrote:

Hi William / Dev group,

I have deployed ES 7.9 (latest version, single node) and the same is configured. I also get the default page when hitting http://<ES HOST IP>:9200/.

Upon creating the griffin index using the JSON string given:

```
curl -k -H "Content-Type: application/json" -X PUT http://<replaced with my ES host IP>:9200/griffin \
 -d '{
    "aliases": {},
    "mappings": {
        "accuracy": {
            "properties": {
                "name": {
                    "fields": {
                        "keyword": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    },
                    "type": "text"
                },
                "tmst": {
                    "type": "date"
                }
            }
        }
    },
    "settings": {
        "index": {
            "number_of_replicas": "2",
            "number_of_shards": "5"
        }
    }
}'
```

I get the below error:

{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Root mapping definition has unsupported parameters: [accuracy : {properties={name={fields={keyword={ignore_above=256, type=keyword}}, type=text}, tmst={type=date}}}]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Root mapping definition has unsupported parameters: [accuracy : {properties={name={fields={keyword={ignore_above=256, type=keyword}}, type=text}, tmst={type=date}}}]","caused_by":{"type":"mapper_parsing_exception","reason":"Root mapping definition has unsupported parameters: [accuracy : {properties={name={fields={keyword={ignore_above=256, type=keyword}}, type=text}, tmst={type=date}}}]"}},"status":400}

It seems like the JSON string is missing some values or is incorrectly provided.

Would be great if you could please help.

Thanks and Regards,
Sunil Muniyal
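The 400 above is not malformed JSON: Elasticsearch 7.x removed mapping types, so the "accuracy" type under "mappings" is rejected (7.x expects "properties" directly at the mapping root). For reference only, a typeless form that ES 7.x would accept is sketched below; since Griffin at this point still writes to the typed /griffin/accuracy URL, staying on ES 6.x (as this thread eventually does) is the practical fix:

```
# Typeless mapping accepted by ES 7.x (illustration; Griffin's sink
# still targets the 6.x-style /griffin/accuracy endpoint here)
curl -k -H "Content-Type: application/json" -X PUT "http://<ES HOST IP>:9200/griffin" -d '{
  "mappings": {
    "properties": {
      "name": { "type": "text",
                "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } },
      "tmst": { "type": "date" }
    }
  },
  "settings": { "index": { "number_of_shards": "5", "number_of_replicas": "2" } }
}'
```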
On Mon, Sep 7, 2020 at 8:16 PM Sunil Muniyal <[email protected]> wrote:

Thank you for the response, William.

I have started preparing for the ES deployment and should attempt the same tomorrow.

In the meantime, I will also wait for the Dev team in case they have any additional inputs.

Thanks and Regards,
Sunil Muniyal

On Mon, Sep 7, 2020 at 8:06 PM William Guo <[email protected]> wrote:

> If dev confirms it to be mandatory, as I understand correctly, I will need to:
> 1. Deploy and configure ES
> 2. Update application.properties to include ES details and create the ES index
> 3. Rebuild the Maven package and rerun the Griffin service
Right, you need to package the ES env configuration into your jar.

> There is no need to reload the data into Hadoop (Hive), correct?
No.

> On a side note, is there any other documentation of Griffin available or underway which would help to get the below details while integrating it with Cloudera Hadoop?
> 1. What are the exact port requirements (internal and external)?
Check the log and make sure all the extra connections in the properties are accessible.
> 2. Which packages will be required?
No.
> 3. Any Java dependencies?
Java 1.8.
> 4. If we have the Cloudera Hadoop cluster kerberized (secured), what are the dependencies or additional configurations needed?
There should be no extra dependencies, except the transitive dependencies incurred by Spark and Hadoop.
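On the ports point, a simple way to act on "make sure all the extra connections in the properties are accessible" is to probe each endpoint application.properties references from the Griffin host. The ports below are the defaults seen in this thread (Livy 8998, ES 9200) plus the usual Hive metastore port 9083; replace es-host and metastore-host with real hostnames:

```
# Probe each dependency endpoint; -z = scan only, -w 3 = 3s timeout
for ep in localhost:8998 es-host:9200 metastore-host:9083; do
  nc -z -w 3 "${ep%%:*}" "${ep##*:}" && echo "$ep reachable" || echo "$ep NOT reachable"
done
```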
However, checking if >>>>>>>>>>>>>>>>>>>>> any other formal >>>>>>>>>>>>>>>>>>>>> documentation has been made available for the same? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>>>>>>>>>> Sunil Muniyal >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 4:05 PM William Guo < >>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> cc dev for double checking. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Measure will emit metrics and store them in elastic, >>>>>>>>>>>>>>>>>>>>>> UI fetch those metrics from elastic. >>>>>>>>>>>>>>>>>>>>>> So elastic should be mandatory. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>> William >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 6:32 PM Sunil Muniyal < >>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thank you for the quick response, William. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> I have not configured ElasticSearch since it is not >>>>>>>>>>>>>>>>>>>>>>> deployed. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> In the application.properties, I just added the >>>>>>>>>>>>>>>>>>>>>>> dummy information (as below) just to pass the >>>>>>>>>>>>>>>>>>>>>>> validation test and get >>>>>>>>>>>>>>>>>>>>>>> Griffin up and running. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> # elasticsearch >>>>>>>>>>>>>>>>>>>>>>> # elasticsearch.host = <IP> >>>>>>>>>>>>>>>>>>>>>>> # elasticsearch.port = <elasticsearch rest port> >>>>>>>>>>>>>>>>>>>>>>> # elasticsearch.user = user >>>>>>>>>>>>>>>>>>>>>>> # elasticsearch.password = password >>>>>>>>>>>>>>>>>>>>>>> elasticsearch.host=localhost >>>>>>>>>>>>>>>>>>>>>>> elasticsearch.port=9200 >>>>>>>>>>>>>>>>>>>>>>> elasticsearch.scheme=http >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Is ElasticSearch a mandatory requirement to use >>>>>>>>>>>>>>>>>>>>>>> Griffin? >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>>>>>>>>>>>> Sunil Muniyal >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 3:58 PM William Guo < >>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Could you check whether ES has been injected with >>>>>>>>>>>>>>>>>>>>>>>> those metrics or not? >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 6:23 PM Sunil Muniyal < >>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hello William, >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I was able to bypass this error by entering the >>>>>>>>>>>>>>>>>>>>>>>>> default field values for LDAP, ElasticSearch and Livy >>>>>>>>>>>>>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>>>>>>> application.properties and successfully get Griffin >>>>>>>>>>>>>>>>>>>>>>>>> running. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> By following the below article, I have created a >>>>>>>>>>>>>>>>>>>>>>>>> test measure and then a job which triggers that >>>>>>>>>>>>>>>>>>>>>>>>> measure. >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/griffin/blob/master/griffin-doc/ui/user-guide.md >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Have allowed the job to get triggered multiple >>>>>>>>>>>>>>>>>>>>>>>>> times, however, still i can't see anything in metrics >>>>>>>>>>>>>>>>>>>>>>>>> related to the job. 
On Mon, Sep 7, 2020 at 6:23 PM Sunil Muniyal <[email protected]> wrote:

Hello William,

I was able to bypass this error by entering the default field values for LDAP, ElasticSearch and Livy in application.properties, and successfully got Griffin running.

By following the below article, I have created a test measure and then a job which triggers that measure:
https://github.com/apache/griffin/blob/master/griffin-doc/ui/user-guide.md

I have allowed the job to get triggered multiple times; however, I still can't see anything in the metrics related to the job. Nor do I see anything in the Health or My Dashboard tabs. Also, if you notice in the screenshot below, being in the DQ Metrics tab, I still do not see the created measure in the drop-down list.

[screenshot]

Test job executed multiple times: [screenshot]

Please advise if anything is misconfigured.

Thanks and Regards,
Sunil Muniyal

On Mon, Sep 7, 2020 at 12:40 PM Sunil Muniyal <[email protected]> wrote:

Hello William,

Thank you for the reply.

This helped; I had actually missed adding the property in application.properties.

Now the other challenge is that, along with ES and Livy, I am also not using LDAP, and it is hitting the error "unable to resolve ldap.url property". Of course it will, since the property is not configured.

Please suggest.

Thanks and Regards,
Sunil Muniyal

On Sun, Sep 6, 2020 at 7:26 PM William Guo <[email protected]> wrote:

Hi Sunil Muniyal,

Could you check this property in your griffin properties file?

internal.event.listeners

Thanks,
William
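For anyone hitting the same startup failure (shown in the original message below): the service resolves #{'${internal.event.listeners}'.split(',')} at boot, so the key must be present in the application.properties used at runtime. A minimal sketch; the hook value is illustrative, not a confirmed default:

```
# Ensure the key exists before starting the service; a missing key
# fails Spring placeholder resolution at boot.
cat >> application.properties <<'EOF'
# job event hooks (value illustrative)
internal.event.listeners=GriffinJobEventHook
EOF
```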
Not using Elastic search as it is not >>>>>>>>>>>>>>>>>>>>>>>>>>>> applicable >>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. Did not use Livy as it is not applicable. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Maven build is successful and has got 2 jars at >>>>>>>>>>>>>>>>>>>>>>>>>>>> service/target and measure/target which I have >>>>>>>>>>>>>>>>>>>>>>>>>>>> uploaded to HDFS. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> However, *starting griffin-service.jar using >>>>>>>>>>>>>>>>>>>>>>>>>>>> nohup command* is failing with below error: >>>>>>>>>>>>>>>>>>>>>>>>>>>> *Caused by: java.lang.IllegalArgumentException: >>>>>>>>>>>>>>>>>>>>>>>>>>>> Could not resolve placeholder >>>>>>>>>>>>>>>>>>>>>>>>>>>> 'internal.event.listeners' in string value >>>>>>>>>>>>>>>>>>>>>>>>>>>> "#{'${internal.event.listeners}'.split(',')}"* >>>>>>>>>>>>>>>>>>>>>>>>>>>> * at >>>>>>>>>>>>>>>>>>>>>>>>>>>> org.springframework.util.PropertyPlaceholderHelper.parseStringValue(PropertyPlaceholderHelper.java:174) >>>>>>>>>>>>>>>>>>>>>>>>>>>> ~[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]* >>>>>>>>>>>>>>>>>>>>>>>>>>>> * at >>>>>>>>>>>>>>>>>>>>>>>>>>>> org.springframework.util.PropertyPlaceholderHelper.replacePlaceholders(PropertyPlaceholderHelper.java:126) >>>>>>>>>>>>>>>>>>>>>>>>>>>> ~[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]* >>>>>>>>>>>>>>>>>>>>>>>>>>>> * at >>>>>>>>>>>>>>>>>>>>>>>>>>>> org.springframework.core.env.AbstractPropertyResolver.doResolvePlaceholders(AbstractPropertyResolver.java:236) >>>>>>>>>>>>>>>>>>>>>>>>>>>> ~[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]* >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> I have tried to search a lot of articles with >>>>>>>>>>>>>>>>>>>>>>>>>>>> no luck. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Would be great if someone could help me to fix >>>>>>>>>>>>>>>>>>>>>>>>>>>> this. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, attached is the output of nohup command >>>>>>>>>>>>>>>>>>>>>>>>>>>> that was written in service.out. >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>>>>>>>>>>>>>>>>> Sunil Muniyal >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe, e-mail: >>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>>>>>>> For additional commands, e-mail: >>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected] >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> --------------------------------------------------------------------- >>>>>>>>>>>> To unsubscribe, e-mail: [email protected] >>>>>>>>>>>> For additional commands, e-mail: [email protected] >>>>>>>>>>> >>>>>>>>>>> >>>>>> --------------------------------------------------------------------- >>>>>> To unsubscribe, e-mail: [email protected] >>>>>> For additional commands, e-mail: [email protected] >>>>> >>>>>
