For the Griffin log, please search in your Spark cluster environment, usually in the worker log directory. One weird thing: how do you submit a job to Spark if you disabled Livy?
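An aside for context on William's question: for batch jobs the Griffin service does not talk to Spark directly; it POSTs a batch definition to Livy's REST API (the endpoint is configured via the livy.uri property in application.properties, per the deploy guide). A minimal sketch of that call; the jar path, class name, and args are illustrative placeholders taken from common Griffin setups, not necessarily your build:

```bash
# Roughly what SparkSubmitJob does when a job fires; the endpoint matches
# the "Post to livy" error later in this thread. If Livy is not running,
# nothing ever reaches Spark.
curl -s -X POST http://localhost:8998/batches \
  -H "Content-Type: application/json" \
  -d '{
        "file": "hdfs:///griffin/griffin-measure.jar",
        "className": "org.apache.griffin.measure.Application",
        "args": ["<env config json>", "<dq config json>"]
      }'
```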
On Fri, Sep 11, 2020 at 8:46 PM Sunil Muniyal <[email protected]> wrote:

> Could you please help with the location where the measure log gets created, or where I can check for it?
>
> Thanks and Regards,
> Sunil Muniyal
>
> On Fri, Sep 11, 2020 at 6:14 PM William Guo <[email protected]> wrote:
>
>> Livy is used to post jobs to your cluster; I don't think it is related to Livy.
>>
>> Could you also share the measure log in your cluster?
>>
>> On Fri, Sep 11, 2020 at 8:03 PM Sunil Muniyal <[email protected]> wrote:
>>
>>> Got the below message as output:
>>>
>>> {"Test_Measure":[{"name":"Test_Job","type":"ACCURACY","owner":"test","metricValues":[]}]}
>>>
>>> metricValues seems empty. So is it that Griffin is not getting data from ES, whereas ES does have the data, which we verified previously? By any chance, do you think not having Livy could be a problem?
>>>
>>> These are the latest logs from service.out:
>>>
>>> [EL Fine]: sql: 2020-09-11 11:59:11.662--ServerSession(400064818)--Connection(754936662)--SELECT DISTINCT ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id FROM JOBINSTANCEBEAN WHERE (STATE IN (?,?,?,?,?,?))
>>>   bind => [6 parameters bound]
>>> [EL Fine]: sql: 2020-09-11 11:59:51.044--ServerSession(400064818)--Connection(353930083)--SELECT ID, type, CREATEDDATE, CRONEXPRESSION, DELETED, quartz_group_name, JOBNAME, MEASUREID, METRICNAME, MODIFIEDDATE, quartz_job_name, PREDICATECONFIG, TIMEZONE FROM job WHERE (DELETED = ?)
>>>   bind => [1 parameter bound]
>>> [EL Fine]: sql: 2020-09-11 11:59:51.046--ServerSession(400064818)--Connection(1245663749)--SELECT DISTINCT DTYPE FROM MEASURE WHERE (DELETED = ?)
>>>   bind => [1 parameter bound]
>>> [EL Fine]: sql: 2020-09-11 11:59:51.046--ServerSession(400064818)--Connection(674248356)--SELECT t0.ID, t0.DTYPE, t0.CREATEDDATE, t0.DELETED, t0.DESCRIPTION, t0.DQTYPE, t0.MODIFIEDDATE, t0.NAME, t0.ORGANIZATION, t0.OWNER, t0.SINKS, t1.ID, t1.PROCESSTYPE, t1.RULEDESCRIPTION, t1.evaluate_rule_id FROM MEASURE t0, GRIFFINMEASURE t1 WHERE ((t0.DELETED = ?) AND ((t1.ID = t0.ID) AND (t0.DTYPE = ?)))
>>>   bind => [2 parameters bound]
>>> [EL Fine]: sql: 2020-09-11 12:00:00.019--ClientSession(294162678)--Connection(98503327)--INSERT INTO JOBINSTANCEBEAN (ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
>>>   bind => [15 parameters bound]
>>> [EL Fine]: sql: 2020-09-11 12:00:00.09--ServerSession(400064818)--Connection(491395630)--SELECT ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id FROM JOBINSTANCEBEAN WHERE (predicate_job_name = ?)
>>>   bind => [1 parameter bound]
>>> 2020-09-11 12:00:00.117 INFO 10980 --- [ryBean_Worker-3] o.a.g.c.j.SparkSubmitJob : {
>>>   "measure.type" : "griffin",
>>>   "id" : 201,
>>>   "name" : "Test_Job",
>>>   "owner" : "test",
>>>   "description" : "Measure to check %age of id field values are same",
>>>   "deleted" : false,
>>>   "timestamp" : 1599822000000,
>>>   "dq.type" : "ACCURACY",
>>>   "sinks" : [ "ELASTICSEARCH", "HDFS" ],
>>>   "process.type" : "BATCH",
>>>   "data.sources" : [ {
>>>     "id" : 204,
>>>     "name" : "source",
>>>     "connectors" : [ {
>>>       "id" : 205,
>>>       "name" : "source1599568886803",
>>>       "type" : "HIVE",
>>>       "version" : "1.2",
>>>       "predicates" : [ ],
>>>       "data.unit" : "1hour",
>>>       "data.time.zone" : "",
>>>       "config" : {
>>>         "database" : "default",
>>>         "table.name" : "demo_src",
>>>         "where" : "dt=20200911 AND hour=11"
>>>       }
>>>     } ],
>>>     "baseline" : false
>>>   }, {
>>>     "id" : 206,
>>>     "name" : "target",
>>>     "connectors" : [ {
>>>       "id" : 207,
>>>       "name" : "target1599568896874",
>>>       "type" : "HIVE",
>>>       "version" : "1.2",
>>>       "predicates" : [ ],
>>>       "data.unit" : "1hour",
>>>       "data.time.zone" : "",
>>>       "config" : {
>>>         "database" : "default",
>>>         "table.name" : "demo_tgt",
>>>         "where" : "dt=20200911 AND hour=11"
>>>       }
>>>     } ],
>>>     "baseline" : false
>>>   } ],
>>>   "evaluate.rule" : {
>>>     "id" : 202,
>>>     "rules" : [ {
>>>       "id" : 203,
>>>       "rule" : "source.id=target.id",
>>>       "dsl.type" : "griffin-dsl",
>>>       "dq.type" : "ACCURACY",
>>>       "out.dataframe.name" : "accuracy"
>>>     } ]
>>>   },
>>>   "measure.type" : "griffin"
>>> }
>>> 2020-09-11 12:00:00.119 ERROR 10980 --- [ryBean_Worker-3] o.a.g.c.j.SparkSubmitJob : Post to livy ERROR. I/O error on POST request for "http://localhost:8998/batches": Connection refused (Connection refused); nested exception is java.net.ConnectException: Connection refused (Connection refused)
>>> 2020-09-11 12:00:00.131 INFO 10980 --- [ryBean_Worker-3] o.a.g.c.j.SparkSubmitJob : Delete predicate job(PG,Test_Job_predicate_1599825600016) SUCCESS.
>>> [EL Fine]: sql: 2020-09-11 12:00:00.133--ClientSession(273634815)--Connection(296858203)--UPDATE JOBINSTANCEBEAN SET predicate_job_deleted = ?, STATE = ? WHERE (ID = ?)
>>>   bind => [3 parameters bound]
>>> [EL Fine]: sql: 2020-09-11 12:00:11.664--ServerSession(400064818)--Connection(1735064739)--SELECT DISTINCT ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id FROM JOBINSTANCEBEAN WHERE (STATE IN (?,?,?,?,?,?))
>>>   bind => [6 parameters bound]
>>>
>>> Thanks and Regards,
>>> Sunil Muniyal
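An aside on the endpoint William asks for below: /api/v1/metrics is served by the Griffin service itself, so the empty-metricValues JSON above can be reproduced with a plain HTTP call. A minimal sketch, assuming the service listens on its default port 8080 (host is a placeholder):

```bash
# Fetch the metric values the UI would display.
curl -s http://<griffin-host>:8080/api/v1/metrics
```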
>>> On Fri, Sep 11, 2020 at 3:42 PM William Guo <[email protected]> wrote:
>>>
>>>> From the log, I didn't find any information related to metrics fetching.
>>>>
>>>> Could you try to call /api/v1/metrics, and show us the latest log again?
>>>>
>>>> On Fri, Sep 11, 2020 at 5:48 PM Sunil Muniyal <[email protected]> wrote:
>>>>
>>>>> 1: I guess it is related to your login user and super user.
>>>>> I am less worried about this unless it could be the cause of the metrics not being displayed.
>>>>>
>>>>> 2: Could you share with us your griffin log? I suspect some exception happened when trying to connect with ES.
>>>>> Attached is the service.out file. I see an error while submitting Spark jobs via Livy. Since Livy is not configured / deployed, this is expected. I believe this should not be the reason, since we are getting data from Hive (as part of batch processing). Please correct me if my understanding is wrong.
>>>>>
>>>>> Thanks and Regards,
>>>>> Sunil Muniyal
>>>>>
>>>>> On Fri, Sep 11, 2020 at 3:09 PM William Guo <[email protected]> wrote:
>>>>>
>>>>>> 1: I guess it is related to your login user and super user.
>>>>>> 2: Could you share with us your griffin log? I suspect some exception happened when trying to connect with ES.
>>>>>>
>>>>>> On Fri, Sep 11, 2020 at 5:14 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>
>>>>>>> Hello William,
>>>>>>>
>>>>>>> Tried as suggested.
>>>>>>>
>>>>>>> 1. Ingested data into the Hive tables using the provided script. The ownership still shows as is (source with admin and target with root).
>>>>>>>
>>>>>>> 2. Updated the env-batch.json and env-streaming.json files with the IP address for ES and rebuilt Griffin. Still no metrics for the jobs executed. ES does have data, as confirmed yesterday.
>>>>>>>
>>>>>>> Please help.
>>>>>>>
>>>>>>> Thanks and Regards,
>>>>>>> Sunil Muniyal
>>>>>>>
>>>>>>> On Thu, Sep 10, 2020 at 7:41 PM William Guo <[email protected]> wrote:
>>>>>>>
>>>>>>>> Please enter the IP directly; I am not sure whether the hostname can be resolved correctly or not.
>>>>>>>>
>>>>>>>> On Thu, Sep 10, 2020 at 10:06 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hi William,
>>>>>>>>>
>>>>>>>>> Thank you for the reply.
>>>>>>>>>
>>>>>>>>> Regarding points 2 and 3: could you share some more details? I believe env_batch.json is configured as expected. What exactly needs to be updated? The ES hostname, or shall I enter the IP, or something else? Please help.
>>>>>>>>>
>>>>>>>>> Thanks and Regards,
>>>>>>>>> Sunil Muniyal
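The hostname-vs-IP question above concerns the ELASTICSEARCH sink entry in env_batch.json, which is quoted in full later in this thread. A sketch of that block with a raw IP substituted in, per William's advice; the address itself is a placeholder:

```json
{
  "type": "ELASTICSEARCH",
  "config": {
    "method": "post",
    "api": "http://10.0.0.5:9200/griffin/accuracy",
    "connection.timeout": "1m",
    "retry": 10
  }
}
```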
>>>>>>>>> On Thu, Sep 10, 2020 at 7:30 PM William Guo <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> 1 OK, we will fix this issue soon.
>>>>>>>>>> 2 Could you try to ping ES from your Spark environment, and enter the ES endpoint correctly in env_batch.json?
>>>>>>>>>> 3 Please put your ES endpoint in env_batch.json.
>>>>>>>>>> 6 Please try the following script to build your env.
>>>>>>>>>> ```
>>>>>>>>>> #!/bin/bash
>>>>>>>>>>
>>>>>>>>>> #create table
>>>>>>>>>> hive -f create-table.hql
>>>>>>>>>> echo "create table done"
>>>>>>>>>>
>>>>>>>>>> #current hour
>>>>>>>>>> sudo ./gen_demo_data.sh
>>>>>>>>>> cur_date=`date +%Y%m%d%H`
>>>>>>>>>> dt=${cur_date:0:8}
>>>>>>>>>> hour=${cur_date:8:2}
>>>>>>>>>> partition_date="dt='$dt',hour='$hour'"
>>>>>>>>>> sed s/PARTITION_DATE/$partition_date/ ./insert-data.hql.template > insert-data.hql
>>>>>>>>>> hive -f insert-data.hql
>>>>>>>>>> src_done_path=/griffin/data/batch/demo_src/dt=${dt}/hour=${hour}/_DONE
>>>>>>>>>> tgt_done_path=/griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}/_DONE
>>>>>>>>>> hadoop fs -mkdir -p /griffin/data/batch/demo_src/dt=${dt}/hour=${hour}
>>>>>>>>>> hadoop fs -mkdir -p /griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}
>>>>>>>>>> hadoop fs -touchz ${src_done_path}
>>>>>>>>>> hadoop fs -touchz ${tgt_done_path}
>>>>>>>>>> echo "insert data [$partition_date] done"
>>>>>>>>>>
>>>>>>>>>> #last hour
>>>>>>>>>> sudo ./gen_demo_data.sh
>>>>>>>>>> cur_date=`date -d '1 hour ago' +%Y%m%d%H`
>>>>>>>>>> dt=${cur_date:0:8}
>>>>>>>>>> hour=${cur_date:8:2}
>>>>>>>>>> partition_date="dt='$dt',hour='$hour'"
>>>>>>>>>> sed s/PARTITION_DATE/$partition_date/ ./insert-data.hql.template > insert-data.hql
>>>>>>>>>> hive -f insert-data.hql
>>>>>>>>>> src_done_path=/griffin/data/batch/demo_src/dt=${dt}/hour=${hour}/_DONE
>>>>>>>>>> tgt_done_path=/griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}/_DONE
>>>>>>>>>> hadoop fs -mkdir -p /griffin/data/batch/demo_src/dt=${dt}/hour=${hour}
>>>>>>>>>> hadoop fs -mkdir -p /griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}
>>>>>>>>>> hadoop fs -touchz ${src_done_path}
>>>>>>>>>> hadoop fs -touchz ${tgt_done_path}
>>>>>>>>>> echo "insert data [$partition_date] done"
>>>>>>>>>>
>>>>>>>>>> #next hour
>>>>>>>>>> set +e
>>>>>>>>>> while true
>>>>>>>>>> do
>>>>>>>>>>   sudo ./gen_demo_data.sh
>>>>>>>>>>   cur_date=`date +%Y%m%d%H`
>>>>>>>>>>   next_date=`date -d "+1hour" '+%Y%m%d%H'`
>>>>>>>>>>   dt=${next_date:0:8}
>>>>>>>>>>   hour=${next_date:8:2}
>>>>>>>>>>   partition_date="dt='$dt',hour='$hour'"
>>>>>>>>>>   sed s/PARTITION_DATE/$partition_date/ ./insert-data.hql.template > insert-data.hql
>>>>>>>>>>   hive -f insert-data.hql
>>>>>>>>>>   src_done_path=/griffin/data/batch/demo_src/dt=${dt}/hour=${hour}/_DONE
>>>>>>>>>>   tgt_done_path=/griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}/_DONE
>>>>>>>>>>   hadoop fs -mkdir -p /griffin/data/batch/demo_src/dt=${dt}/hour=${hour}
>>>>>>>>>>   hadoop fs -mkdir -p /griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}
>>>>>>>>>>   hadoop fs -touchz ${src_done_path}
>>>>>>>>>>   hadoop fs -touchz ${tgt_done_path}
>>>>>>>>>>   echo "insert data [$partition_date] done"
>>>>>>>>>>   sleep 3600
>>>>>>>>>> done
>>>>>>>>>> set -e
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> William
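A note on the script above: the final section loops forever, inserting a next-hour partition every 3600 seconds, so it has to be kept alive in the background. A sketch of the invocation, assuming the script is saved as gen-hive-data.sh and run as root, as described later in the thread:

```bash
chmod +x gen-hive-data.sh
# Keep the hourly partition loop running after the shell exits.
nohup ./gen-hive-data.sh > gen-hive-data.log 2>&1 &
```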
>>>>>>>>>> On Thu, Sep 10, 2020 at 4:58 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> 1. Since I was able to get Elasticsearch 6.8.x integrated, does it mean that only ES up to 6.8.x is supported for Griffin as of now? If yes, what are the plans further? Is there a page from which I could get updates?
>>>>>>>>>>> -- please file a jira ticket for us to make our code ES compatible.
>>>>>>>>>>> [SM] GRIFFIN-346 - Support for Elastic Search latest version (7.9.1) <https://issues.apache.org/jira/browse/GRIFFIN-346> has been submitted.
>>>>>>>>>>>
>>>>>>>>>>> 2. I still do not see the metrics available (please refer to the screenshots below). The measure is now listed in the drop-down of the *DQ Metrics* tab, but when I select the test measure, nothing comes up.
>>>>>>>>>>> -- could you check in ES whether the metrics have been injected or not?
>>>>>>>>>>> [SM] I used the link below and got the index that was created in ES. I believe the data is loaded; however, please correct me if I understood incorrectly.
>>>>>>>>>>> "http://<ES Public IP>:9200/_cat/indices?v" --------------> POC env is on public cloud, so using the public IP.
>>>>>>>>>>>
>>>>>>>>>>> health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
>>>>>>>>>>> yellow open   griffin ur_Kd3XFQBCsPzIM84j87Q   5   2          0            0      1.2kb          1.2kb
>>>>>>>>>>>
>>>>>>>>>>> Docs in the index: "http://<ES Public IP>:9200/griffin/_search"
>>>>>>>>>>>
>>>>>>>>>>> {"took":44,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
>>>>>>>>>>>
>>>>>>>>>>> Index mapping: "http://<ES Public IP>:9200/griffin"
>>>>>>>>>>>
>>>>>>>>>>> {"griffin":{"aliases":{},"mappings":{"accuracy":{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"tmst":{"type":"date"}}}},"settings":{"index":{"creation_date":"1599567930578","number_of_shards":"5","number_of_replicas":"2","uuid":"ur_Kd3XFQBCsPzIM84j87Q","version":{"created":"6081299"},"provided_name":"griffin"}}}}
>>>>>>>>>>>
>>>>>>>>>>> 3. A step in the deployment guide suggests checking the URL "http://<ES HOST IP>:9200/griffin/accuracy". When navigating to this URL, I get the below error. Please advise.
>>>>>>>>>>> {"error":"Incorrect HTTP method for uri [/griffin/accuracy] and method [GET], allowed: [POST]","status":405}
>>>>>>>>>>> -- it seems you need to use the POST method.
>>>>>>>>>>> [SM] I am using the POST method as suggested in the article. Below is the relevant JSON from env_batch.json:
>>>>>>>>>>> {
>>>>>>>>>>>   "type": "ELASTICSEARCH",
>>>>>>>>>>>   "config": {
>>>>>>>>>>>     "method": "post",
>>>>>>>>>>>     "api": "http://<ES Host Name>:9200/griffin/accuracy", ---------> do we need IP here?
>>>>>>>>>>>     "connection.timeout": "1m",
>>>>>>>>>>>     "retry": 10
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> 6. I also noticed that in Data Assets, *demo_src* is owned by admin whereas *demo_tgt* is owned by root. Would that make any difference? If yes, how do I correct it? Reload the Hive data?
>>>>>>>>>>> -- could you show me your script for dataset setup?
>>>>>>>>>>> [SM] Attached are the 3 scripts. gen-hive-data.sh is the master script, which triggers demo_data, which in turn triggers delta_src. I have done it as instructed in the GitHub article, and gen-hive-data.sh is triggered as root in the terminal.
>>>>>>>>>>>
>>>>>>>>>>> Please advise.
>>>>>>>>>>>
>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>> Sunil Muniyal
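One observation on the ES output above: docs.count is 0 and _search returns no hits, so no metric document has ever been written; the index is empty rather than unreadable. A hypothetical smoke test is to index one fake accuracy document by hand and see whether the UI picks it up. Only "name" and "tmst" come from the mapping shown above; the metric fields ("total", "matched") are illustrative assumptions:

```bash
curl -X POST "http://<ES Public IP>:9200/griffin/accuracy" \
  -H "Content-Type: application/json" \
  -d '{"name": "Test_Job", "tmst": 1599822000000, "total": 100, "matched": 90}'
```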
>>>>>>>>>>> On Wed, Sep 9, 2020 at 8:41 PM William Guo <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Request you to please advise further on the below points:
>>>>>>>>>>>> 1. Since I was able to get Elasticsearch 6.8.x integrated, does it mean that only ES up to 6.8.x is supported for Griffin as of now? If yes, what are the plans further? Is there a page from which I could get updates?
>>>>>>>>>>>> -- please file a jira ticket for us to make our code ES compatible.
>>>>>>>>>>>>
>>>>>>>>>>>> 2. I still do not see the metrics available (please refer to the screenshots below). The measure is now listed in the drop-down of the *DQ Metrics* tab, but when I select the test measure, nothing comes up.
>>>>>>>>>>>> -- could you check in ES whether the metrics have been injected or not?
>>>>>>>>>>>>
>>>>>>>>>>>> 3. A step in the deployment guide suggests checking the URL http://<ES HOST IP>:9200/griffin/accuracy. When navigating to this URL, I get the below error. Please advise.
>>>>>>>>>>>> {"error":"Incorrect HTTP method for uri [/griffin/accuracy] and method [GET], allowed: [POST]","status":405}
>>>>>>>>>>>> -- it seems you need to use the POST method.
>>>>>>>>>>>>
>>>>>>>>>>>> 6. I also noticed that in Data Assets, *demo_src* is owned by admin whereas *demo_tgt* is owned by root. Would that make any difference? If yes, how do I correct it? Reload the Hive data?
>>>>>>>>>>>> -- could you show me your script for dataset setup?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 8, 2020 at 9:02 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi William,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I was finally able to get Griffin up and Elasticsearch integrated along with Hadoop. Thanks a lot for your help and guidance so far.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have created a test measure and a job which gets triggered every 4 minutes automatically (I have referred to the user guide available on GitHub at this link <https://github.com/apache/griffin/blob/master/griffin-doc/ui/user-guide.md>).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Request you to please advise further on the below points:
>>>>>>>>>>>>> 1. Since I was able to get Elasticsearch 6.8.x integrated, does it mean that only ES up to 6.8.x is supported for Griffin as of now? If yes, what are the plans further? Is there a page from which I could get updates?
>>>>>>>>>>>>> 2. I still do not see the metrics available (please refer to the screenshots below). The measure is now listed in the drop-down of the *DQ Metrics* tab, but when I select the test measure, nothing comes up.
>>>>>>>>>>>>> 3. A step in the deployment guide suggests checking the URL http://<ES HOST IP>:9200/griffin/accuracy. When navigating to this URL, I get the below error. Please advise.
>>>>>>>>>>>>> {"error":"Incorrect HTTP method for uri [/griffin/accuracy] and method [GET], allowed: [POST]","status":405}
>>>>>>>>>>>>> 6. I also noticed that in Data Assets, *demo_src* is owned by admin whereas *demo_tgt* is owned by root. Would that make any difference? If yes, how do I correct it? Reload the Hive data?
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Screenshots:*
>>>>>>>>>>>>> *Data Assets:*
>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>
>>>>>>>>>>>>> *DQ Metrics (Test Measure selected):*
>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Job triggered multiple times:*
>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>
>>>>>>>>>>>>> *Metrics page from the job directly:*
>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 8, 2020 at 4:38 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am unable to find repos for 6.4.1; instead I found 6.8.x. I will try with this version of Elasticsearch shortly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the meantime, would it be possible to confirm whether 6.4.x or 6.8.x is the only supported version for Griffin? The reason I am asking is that the GitHub article for Griffin deployment points to the latest version of ES.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Sep 8, 2020 at 4:06 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will need to redeploy Elasticsearch, correct?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Sep 8, 2020 at 4:05 PM William Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Could you try with this version?
>>>>>>>>>>>>>>>> <elasticsearch.version>6.4.1</elasticsearch.version>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>> William
I also get the default page when >>>>>>>>>>>>>>>>> hitting >>>>>>>>>>>>>>>>> http://<ES HOST IP>:9200/ >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Upon creating the griffin configurations using the JSON >>>>>>>>>>>>>>>>> string given >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> curl -k -H "Content-Type: application/json" -X PUT >>>>>>>>>>>>>>>>> http://<replaced with my ES host IP>:9200/griffin \ >>>>>>>>>>>>>>>>> -d '{ >>>>>>>>>>>>>>>>> "aliases": {}, >>>>>>>>>>>>>>>>> "mappings": { >>>>>>>>>>>>>>>>> "accuracy": { >>>>>>>>>>>>>>>>> "properties": { >>>>>>>>>>>>>>>>> "name": { >>>>>>>>>>>>>>>>> "fields": { >>>>>>>>>>>>>>>>> "keyword": { >>>>>>>>>>>>>>>>> "ignore_above": 256, >>>>>>>>>>>>>>>>> "type": "keyword" >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> "type": "text" >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> "tmst": { >>>>>>>>>>>>>>>>> "type": "date" >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> }, >>>>>>>>>>>>>>>>> "settings": { >>>>>>>>>>>>>>>>> "index": { >>>>>>>>>>>>>>>>> "number_of_replicas": "2", >>>>>>>>>>>>>>>>> "number_of_shards": "5" >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> } >>>>>>>>>>>>>>>>> }' >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> *I get below error:* >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> *{"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Root >>>>>>>>>>>>>>>>> mapping definition has unsupported parameters: [accuracy : >>>>>>>>>>>>>>>>> {properties={name={fields={keyword={ignore_above=256, >>>>>>>>>>>>>>>>> type=keyword}}, >>>>>>>>>>>>>>>>> type=text}, >>>>>>>>>>>>>>>>> tmst={type=date}}}]"}],"type":"mapper_parsing_exception","reason":"Failed >>>>>>>>>>>>>>>>> to parse mapping [_doc]: Root mapping definition has >>>>>>>>>>>>>>>>> unsupported >>>>>>>>>>>>>>>>> parameters: [accuracy : >>>>>>>>>>>>>>>>> {properties={name={fields={keyword={ignore_above=256, >>>>>>>>>>>>>>>>> type=keyword}}, >>>>>>>>>>>>>>>>> type=text}, >>>>>>>>>>>>>>>>> tmst={type=date}}}]","caused_by":{"type":"mapper_parsing_exception","reason":"Root >>>>>>>>>>>>>>>>> mapping definition has unsupported parameters: [accuracy : >>>>>>>>>>>>>>>>> {properties={name={fields={keyword={ignore_above=256, >>>>>>>>>>>>>>>>> type=keyword}}, >>>>>>>>>>>>>>>>> type=text}, tmst={type=date}}}]"}},"status":400}* >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Seems like the JSON string is missing some values or is >>>>>>>>>>>>>>>>> incorrectly provided. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Would be great if you could please help. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>>>>>> Sunil Muniyal >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 8:16 PM Sunil Muniyal < >>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thank you for the response, William. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> I have started preparing for ES deployment and should >>>>>>>>>>>>>>>>>> attempt the same tomorrow. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> In the meantime, I will also wait for the Dev team in >>>>>>>>>>>>>>>>>> case they have any additional inputs. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>>>>>>> Sunil Muniyal >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 8:06 PM William Guo < >>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> If dev confirms it to be mandatory, as I understand >>>>>>>>>>>>>>>>>>> correct, I will need to: >>>>>>>>>>>>>>>>>>> 1. Deploy and Configure ES >>>>>>>>>>>>>>>>>>> 2. 
Update application.properties to include ES details >>>>>>>>>>>>>>>>>>> and create ES index >>>>>>>>>>>>>>>>>>> 3. Rebuild Maven package and rerun the Griffin service >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *Right, you need to package es env configuration into >>>>>>>>>>>>>>>>>>> your jar.* >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> There is no need to reload the data into Hadoop (Hive), >>>>>>>>>>>>>>>>>>> correct? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> *No* >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On a side note, is there any other documentation of >>>>>>>>>>>>>>>>>>> Griffin available or underway which would help to get below >>>>>>>>>>>>>>>>>>> details while >>>>>>>>>>>>>>>>>>> integrating it with Cloudera Hadoop? >>>>>>>>>>>>>>>>>>> 1. What are the exact ports requirements (internal and >>>>>>>>>>>>>>>>>>> external)? >>>>>>>>>>>>>>>>>>> *check log and make sure all extra connections in >>>>>>>>>>>>>>>>>>> properties can accessible* >>>>>>>>>>>>>>>>>>> 2. Which all packages will be required? >>>>>>>>>>>>>>>>>>> *no* >>>>>>>>>>>>>>>>>>> 3. Any Java dependencies? >>>>>>>>>>>>>>>>>>> *java 1.8* >>>>>>>>>>>>>>>>>>> 4. If we have Cloudera Hadoop cluster kerberized >>>>>>>>>>>>>>>>>>> (secured), what are the dependencies or additional >>>>>>>>>>>>>>>>>>> configurations needed? >>>>>>>>>>>>>>>>>>> *Should no extra dependencies, except those transitive >>>>>>>>>>>>>>>>>>> dependencies incurred by spark and hadoop.* >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 6:42 PM Sunil Muniyal < >>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Ohh ok. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> If dev confirms it to be mandatory, as I understand >>>>>>>>>>>>>>>>>>>> correct, I will need to: >>>>>>>>>>>>>>>>>>>> 1. Deploy and Configure ES >>>>>>>>>>>>>>>>>>>> 2. Update application.properties to include ES details >>>>>>>>>>>>>>>>>>>> and create ES index >>>>>>>>>>>>>>>>>>>> 3. Rebuild Maven package and rerun the Griffin service >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> There is no need to reload the data into Hadoop (Hive), >>>>>>>>>>>>>>>>>>>> correct? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On a side note, is there any other documentation of >>>>>>>>>>>>>>>>>>>> Griffin available or underway which would help to get >>>>>>>>>>>>>>>>>>>> below details while >>>>>>>>>>>>>>>>>>>> integrating it with Cloudera Hadoop? >>>>>>>>>>>>>>>>>>>> 1. What are the exact ports requirements (internal and >>>>>>>>>>>>>>>>>>>> external)? >>>>>>>>>>>>>>>>>>>> 2. Which all packages will be required? >>>>>>>>>>>>>>>>>>>> 3. Any Java dependencies? >>>>>>>>>>>>>>>>>>>> 4. If we have Cloudera Hadoop cluster kerberized >>>>>>>>>>>>>>>>>>>> (secured), what are the dependencies or additional >>>>>>>>>>>>>>>>>>>> configurations needed? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> I know some of the above information can be fetched >>>>>>>>>>>>>>>>>>>> from the deployment guide on Github. However, checking if >>>>>>>>>>>>>>>>>>>> any other formal >>>>>>>>>>>>>>>>>>>> documentation has been made available for the same? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks and Regards, >>>>>>>>>>>>>>>>>>>> Sunil Muniyal >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 4:05 PM William Guo < >>>>>>>>>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> cc dev for double checking. 
>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 4:05 PM William Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> cc dev for double checking.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The measure will emit metrics and store them in Elasticsearch, and the UI fetches those metrics from Elasticsearch. So Elasticsearch should be mandatory.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>> William
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 6:32 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you for the quick response, William.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I have not configured Elasticsearch, since it is not deployed.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In application.properties, I just added dummy information (as below) to pass the validation test and get Griffin up and running.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> # elasticsearch
>>>>>>>>>>>>>>>>>>>>>> # elasticsearch.host = <IP>
>>>>>>>>>>>>>>>>>>>>>> # elasticsearch.port = <elasticsearch rest port>
>>>>>>>>>>>>>>>>>>>>>> # elasticsearch.user = user
>>>>>>>>>>>>>>>>>>>>>> # elasticsearch.password = password
>>>>>>>>>>>>>>>>>>>>>> elasticsearch.host=localhost
>>>>>>>>>>>>>>>>>>>>>> elasticsearch.port=9200
>>>>>>>>>>>>>>>>>>>>>> elasticsearch.scheme=http
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Is Elasticsearch a mandatory requirement to use Griffin?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 3:58 PM William Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Could you check whether ES has been injected with those metrics or not?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 6:23 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hello William,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I was able to bypass this error by entering the default field values for LDAP, Elasticsearch and Livy in application.properties, and successfully got Griffin running.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> By following the below article, I have created a test measure and then a job which triggers that measure:
>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/griffin/blob/master/griffin-doc/ui/user-guide.md
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I have allowed the job to be triggered multiple times; however, I still can't see anything in the metrics related to the job, nor do I see anything in the *health* or *mydashboard* tabs. Also, as you can see in the screenshot below, in the *DQ Metrics* tab I still do not see the created measure in the drop-down list.
>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> *Test job executed multiple times:*
>>>>>>>>>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Please advise if anything is misconfigured.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 12:40 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hello William,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thank you for the reply.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> This helped; I had actually missed adding the property in application.properties.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Now the other challenge is that, along with ES and Livy, I am also not using LDAP, and it is hitting the error *unable to resolve ldap.url property*. Of course it will, since the property is not configured.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Please suggest.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 6, 2020 at 7:26 PM William Guo <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> hi Sunil Muniyal,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Could you check this property in your griffin properties file?
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> internal.event.listeners
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> William
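Both placeholder errors in this exchange (ldap.url above, and internal.event.listeners in the original mail below) are Spring failing to resolve a ${...} reference at startup, not Griffin logic; defining the keys, even with dummy values as Sunil later describes doing, unblocks the service. A sketch of the relevant application.properties entries; all values are illustrative placeholders, and livy.uri is the property name used by the deploy guide:

```properties
# Every placeholder the service references must resolve, even for unused
# integrations. An empty value satisfies the SpEL expression
# "#{'${internal.event.listeners}'.split(',')}" (assumption: no listeners
# are needed when running without them).
internal.event.listeners=
ldap.url=ldap://localhost:389
elasticsearch.host=localhost
elasticsearch.port=9200
livy.uri=http://localhost:8998/batches
```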
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 3, 2020 at 11:05 PM Sunil Muniyal <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I am attempting to integrate Griffin with Cloudera Hadoop by following the below article:
>>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/griffin/blob/master/griffin-doc/deploy/deploy-guide.md
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I have followed everything as instructed, apart from the below things:
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. Using Cloudera Hadoop 5.15 and the relevant configurations instead of Apache Hadoop
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. Not using Elasticsearch, as it is not applicable
>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. Did not use Livy, as it is not applicable
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> The Maven build is successful and produced 2 jars, at service/target and measure/target, which I have uploaded to HDFS.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> However, starting griffin-service.jar using the nohup command is failing with the below error:
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalArgumentException: Could not resolve placeholder 'internal.event.listeners' in string value "#{'${internal.event.listeners}'.split(',')}"
>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.springframework.util.PropertyPlaceholderHelper.parseStringValue(PropertyPlaceholderHelper.java:174) ~[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.springframework.util.PropertyPlaceholderHelper.replacePlaceholders(PropertyPlaceholderHelper.java:126) ~[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>>>>>>>>>>>>>>>>>>>>>>>>>>>     at org.springframework.core.env.AbstractPropertyResolver.doResolvePlaceholders(AbstractPropertyResolver.java:236) ~[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I have searched a lot of articles with no luck.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Would be great if someone could help me fix this.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Also, attached is the output of the nohup command that was written to service.out.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Sunil Muniyal

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
