Got the below message as output of the /api/v1/metrics call:
{"Test_Measure":[{"name":"Test_Job","type":"ACCURACY","owner":"test","metricValues":[]}]}
metricValues seems empty. So is it that Griffin is not getting data from
ES, whereas ES does have the data, which we verified previously? By any
chance, do you think not having Livy could be a problem?
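
Reading that response mechanically makes the symptom concrete: the measure
itself comes back, but its metricValues array is empty, i.e. no metric
records exist for the job yet. A quick check (Python, with the response
pasted verbatim):

```python
import json

# Metrics response pasted verbatim from above
resp = json.loads(
    '{"Test_Measure":[{"name":"Test_Job","type":"ACCURACY",'
    '"owner":"test","metricValues":[]}]}'
)

job = resp["Test_Measure"][0]
# The job is registered, but no metric values have been recorded for it
print(job["name"], len(job["metricValues"]))  # prints: Test_Job 0
```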
These are the latest logs from service.out:
[EL Fine]: sql: 2020-09-11 11:59:11.662--ServerSession(400064818)--Connection(754936662)--SELECT DISTINCT ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id FROM JOBINSTANCEBEAN WHERE (STATE IN (?,?,?,?,?,?))
  bind => [6 parameters bound]
[EL Fine]: sql: 2020-09-11 11:59:51.044--ServerSession(400064818)--Connection(353930083)--SELECT ID, type, CREATEDDATE, CRONEXPRESSION, DELETED, quartz_group_name, JOBNAME, MEASUREID, METRICNAME, MODIFIEDDATE, quartz_job_name, PREDICATECONFIG, TIMEZONE FROM job WHERE (DELETED = ?)
  bind => [1 parameter bound]
[EL Fine]: sql: 2020-09-11 11:59:51.046--ServerSession(400064818)--Connection(1245663749)--SELECT DISTINCT DTYPE FROM MEASURE WHERE (DELETED = ?)
  bind => [1 parameter bound]
[EL Fine]: sql: 2020-09-11 11:59:51.046--ServerSession(400064818)--Connection(674248356)--SELECT t0.ID, t0.DTYPE, t0.CREATEDDATE, t0.DELETED, t0.DESCRIPTION, t0.DQTYPE, t0.MODIFIEDDATE, t0.NAME, t0.ORGANIZATION, t0.OWNER, t0.SINKS, t1.ID, t1.PROCESSTYPE, t1.RULEDESCRIPTION, t1.evaluate_rule_id FROM MEASURE t0, GRIFFINMEASURE t1 WHERE ((t0.DELETED = ?) AND ((t1.ID = t0.ID) AND (t0.DTYPE = ?)))
  bind => [2 parameters bound]
[EL Fine]: sql: 2020-09-11 12:00:00.019--ClientSession(294162678)--Connection(98503327)--INSERT INTO JOBINSTANCEBEAN (ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
  bind => [15 parameters bound]
[EL Fine]: sql: 2020-09-11 12:00:00.09--ServerSession(400064818)--Connection(491395630)--SELECT ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id FROM JOBINSTANCEBEAN WHERE (predicate_job_name = ?)
  bind => [1 parameter bound]
2020-09-11 12:00:00.117 INFO 10980 --- [ryBean_Worker-3] o.a.g.c.j.SparkSubmitJob : {
  "measure.type" : "griffin",
  "id" : 201,
  "name" : "Test_Job",
  "owner" : "test",
  "description" : "Measure to check %age of id field values are same",
  "deleted" : false,
  "timestamp" : 1599822000000,
  "dq.type" : "ACCURACY",
  "sinks" : [ "ELASTICSEARCH", "HDFS" ],
  "process.type" : "BATCH",
  "data.sources" : [ {
    "id" : 204,
    "name" : "source",
    "connectors" : [ {
      "id" : 205,
      "name" : "source1599568886803",
      "type" : "HIVE",
      "version" : "1.2",
      "predicates" : [ ],
      "data.unit" : "1hour",
      "data.time.zone" : "",
      "config" : {
        "database" : "default",
        "table.name" : "demo_src",
        "where" : "dt=20200911 AND hour=11"
      }
    } ],
    "baseline" : false
  }, {
    "id" : 206,
    "name" : "target",
    "connectors" : [ {
      "id" : 207,
      "name" : "target1599568896874",
      "type" : "HIVE",
      "version" : "1.2",
      "predicates" : [ ],
      "data.unit" : "1hour",
      "data.time.zone" : "",
      "config" : {
        "database" : "default",
        "table.name" : "demo_tgt",
        "where" : "dt=20200911 AND hour=11"
      }
    } ],
    "baseline" : false
  } ],
  "evaluate.rule" : {
    "id" : 202,
    "rules" : [ {
      "id" : 203,
      "rule" : "source.id=target.id",
      "dsl.type" : "griffin-dsl",
      "dq.type" : "ACCURACY",
      "out.dataframe.name" : "accuracy"
    } ]
  },
  "measure.type" : "griffin"
}
2020-09-11 12:00:00.119 ERROR 10980 --- [ryBean_Worker-3] o.a.g.c.j.SparkSubmitJob : Post to livy ERROR. I/O error on POST request for "http://localhost:8998/batches": Connection refused (Connection refused); nested exception is java.net.ConnectException: Connection refused (Connection refused)
2020-09-11 12:00:00.131 INFO 10980 --- [ryBean_Worker-3] o.a.g.c.j.SparkSubmitJob : Delete predicate job(PG,Test_Job_predicate_1599825600016) SUCCESS.
[EL Fine]: sql: 2020-09-11 12:00:00.133--ClientSession(273634815)--Connection(296858203)--UPDATE JOBINSTANCEBEAN SET predicate_job_deleted = ?, STATE = ? WHERE (ID = ?)
  bind => [3 parameters bound]
[EL Fine]: sql: 2020-09-11 12:00:11.664--ServerSession(400064818)--Connection(1735064739)--SELECT DISTINCT ID, APPID, APPURI, CREATEDDATE, DELETED, expire_timestamp, MODIFIEDDATE, predicate_job_deleted, predicate_group_name, predicate_job_name, SESSIONID, STATE, timestamp, TYPE, job_id FROM JOBINSTANCEBEAN WHERE (STATE IN (?,?,?,?,?,?))
  bind => [6 parameters bound]
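
For context on the Livy question above: the "Post to livy ERROR ... Connection
refused" entry is the service trying to POST the measure job to the Livy
endpoint configured in application.properties. As I understand it, that
endpoint comes from a property along these lines (the host below is a
placeholder):

```
# Livy batches endpoint the Griffin service submits Spark jobs to
# (8998 is Livy's default port; the host is a placeholder)
livy.uri=http://<livy-host>:8998/batches
```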
Thanks and Regards,
Sunil Muniyal
On Fri, Sep 11, 2020 at 3:42 PM William Guo <[email protected]> wrote:
> From the log, I didn't find any information related to metrics fetching.
>
> Could you try to call /api/v1/metrics, and show us the latest log again?
>
>
>
> On Fri, Sep 11, 2020 at 5:48 PM Sunil Muniyal <[email protected]>
> wrote:
>
>> 1: I guess it is related to your login user and super user.
>> I am less worried about this unless it could be the cause of the metrics
>> not being displayed.
>>
>> 2: Could you share with us your griffin log, I suspect some exception
>> happened when trying to connect with ES.
>> Attached is the service.out file. I see an error while submitting
>> Spark jobs via Livy. Since Livy is not configured / deployed, this is
>> expected. I believe this should not be the reason, since we are getting
>> data from Hive (as part of batch processing). Please correct me if my
>> understanding is incorrect.
>>
>> Thanks and Regards,
>> Sunil Muniyal
>>
>>
>> On Fri, Sep 11, 2020 at 3:09 PM William Guo <[email protected]> wrote:
>>
>>> 1: I guess it is related to your login user and super user.
>>> 2: Could you share with us your griffin log, I suspect some exception
>>> happened when trying to connect with ES.
>>>
>>> On Fri, Sep 11, 2020 at 5:14 PM Sunil Muniyal <
>>> [email protected]> wrote:
>>>
>>>> Hello William,
>>>>
>>>> Tried as suggested.
>>>>
>>>> 1. Ingested data into Hive tables using the provided script.
>>>> The ownership still shows as is (source with Admin and target with root).
>>>>
>>>> 2. Updated env-batch.json and env-streaming.json files with IP address
>>>> for ES and rebuilt Griffin.
>>>> Still no metrics for the jobs executed.
>>>> ES does have data as confirmed yesterday.
>>>>
>>>> Please help.
>>>>
>>>> Thanks and Regards,
>>>> Sunil Muniyal
>>>>
>>>>
>>>> On Thu, Sep 10, 2020 at 7:41 PM William Guo <[email protected]> wrote:
>>>>
>>>>> Please enter the IP directly;
>>>>> not sure whether the hostname can be resolved correctly or not.
>>>>>
>>>>> On Thu, Sep 10, 2020 at 10:06 PM Sunil Muniyal <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi William,
>>>>>>
>>>>>> Thank you for the reply.
>>>>>>
>>>>>> Regarding points 2 and 3: is it possible to share some more details? I
>>>>>> believe env_batch.json is configured as expected. What exactly needs to
>>>>>> be updated: the ES hostname, or shall I enter the IP, or something
>>>>>> else? Please help.
>>>>>>
>>>>>> Thanks and Regards,
>>>>>> Sunil Muniyal
>>>>>>
>>>>>>
>>>>>> On Thu, Sep 10, 2020 at 7:30 PM William Guo <[email protected]> wrote:
>>>>>>
>>>>>>> 1 OK, we will fix this issue soon.
>>>>>>> 2 Could you try pinging ES from your Spark environment, and input the
>>>>>>> ES endpoint correctly in env_batch.json?
>>>>>>> 3 Please put your ES endpoint in env_batch.json.
>>>>>>> 6 Please try the following script to build your env.
>>>>>>> ```
>>>>>>> #!/bin/bash
>>>>>>>
>>>>>>> # create table
>>>>>>> hive -f create-table.hql
>>>>>>> echo "create table done"
>>>>>>>
>>>>>>> # current hour
>>>>>>> sudo ./gen_demo_data.sh
>>>>>>> cur_date=`date +%Y%m%d%H`
>>>>>>> dt=${cur_date:0:8}
>>>>>>> hour=${cur_date:8:2}
>>>>>>> partition_date="dt='$dt',hour='$hour'"
>>>>>>> sed s/PARTITION_DATE/$partition_date/ ./insert-data.hql.template > insert-data.hql
>>>>>>> hive -f insert-data.hql
>>>>>>> src_done_path=/griffin/data/batch/demo_src/dt=${dt}/hour=${hour}/_DONE
>>>>>>> tgt_done_path=/griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}/_DONE
>>>>>>> hadoop fs -mkdir -p /griffin/data/batch/demo_src/dt=${dt}/hour=${hour}
>>>>>>> hadoop fs -mkdir -p /griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}
>>>>>>> hadoop fs -touchz ${src_done_path}
>>>>>>> hadoop fs -touchz ${tgt_done_path}
>>>>>>> echo "insert data [$partition_date] done"
>>>>>>>
>>>>>>> # last hour
>>>>>>> sudo ./gen_demo_data.sh
>>>>>>> cur_date=`date -d '1 hour ago' +%Y%m%d%H`
>>>>>>> dt=${cur_date:0:8}
>>>>>>> hour=${cur_date:8:2}
>>>>>>> partition_date="dt='$dt',hour='$hour'"
>>>>>>> sed s/PARTITION_DATE/$partition_date/ ./insert-data.hql.template > insert-data.hql
>>>>>>> hive -f insert-data.hql
>>>>>>> src_done_path=/griffin/data/batch/demo_src/dt=${dt}/hour=${hour}/_DONE
>>>>>>> tgt_done_path=/griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}/_DONE
>>>>>>> hadoop fs -mkdir -p /griffin/data/batch/demo_src/dt=${dt}/hour=${hour}
>>>>>>> hadoop fs -mkdir -p /griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}
>>>>>>> hadoop fs -touchz ${src_done_path}
>>>>>>> hadoop fs -touchz ${tgt_done_path}
>>>>>>> echo "insert data [$partition_date] done"
>>>>>>>
>>>>>>> # next hours
>>>>>>> set +e
>>>>>>> while true
>>>>>>> do
>>>>>>>   sudo ./gen_demo_data.sh
>>>>>>>   cur_date=`date +%Y%m%d%H`
>>>>>>>   next_date=`date -d "+1hour" '+%Y%m%d%H'`
>>>>>>>   dt=${next_date:0:8}
>>>>>>>   hour=${next_date:8:2}
>>>>>>>   partition_date="dt='$dt',hour='$hour'"
>>>>>>>   sed s/PARTITION_DATE/$partition_date/ ./insert-data.hql.template > insert-data.hql
>>>>>>>   hive -f insert-data.hql
>>>>>>>   src_done_path=/griffin/data/batch/demo_src/dt=${dt}/hour=${hour}/_DONE
>>>>>>>   tgt_done_path=/griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}/_DONE
>>>>>>>   hadoop fs -mkdir -p /griffin/data/batch/demo_src/dt=${dt}/hour=${hour}
>>>>>>>   hadoop fs -mkdir -p /griffin/data/batch/demo_tgt/dt=${dt}/hour=${hour}
>>>>>>>   hadoop fs -touchz ${src_done_path}
>>>>>>>   hadoop fs -touchz ${tgt_done_path}
>>>>>>>   echo "insert data [$partition_date] done"
>>>>>>>   sleep 3600
>>>>>>> done
>>>>>>> set -e
>>>>>>> ```
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> William
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Sep 10, 2020 at 4:58 PM Sunil Muniyal <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> 1. Since I was able to get ElasticSearch 6.8.x integrated, does it
>>>>>>>> mean that only ES up to 6.8.x is supported for Griffin as of now? If
>>>>>>>> yes,
>>>>>>>> what are the plans further? Is there a page from which I could get
>>>>>>>> updates?
>>>>>>>> --please file a jira ticket for us to make our code ES compatible.
>>>>>>>> [SM] GRIFFIN-346 - Support for Elasticsearch latest version
>>>>>>>> (7.9.1) <https://issues.apache.org/jira/browse/GRIFFIN-346> has been
>>>>>>>> submitted.
>>>>>>>>
>>>>>>>> 2. I still do not see the metrics available (please refer below
>>>>>>>> screenshots). Though the measure is now listed in the drop down of *DQ
>>>>>>>> Metrics* tab. But when I selected the test measure, nothing came
>>>>>>>> up.
>>>>>>>> -- could you check in ES whether the metrics have been injected or not.
>>>>>>>> [SM] I used the link below and got the index that is created in ES.
>>>>>>>> I believe the data is loaded. However, please correct me if I
>>>>>>>> understood incorrectly.
>>>>>>>> "http://<ES Public IP>:9200/_cat/indices?v"
>>>>>>>> --------------> POC env is on a public cloud, so using the public IP.
>>>>>>>>
>>>>>>>> health status index   uuid                   pri rep docs.count docs.deleted store.size pri.store.size
>>>>>>>> yellow open   griffin ur_Kd3XFQBCsPzIM84j87Q   5   2          0            0      1.2kb          1.2kb
>>>>>>>>
>>>>>>>>
>>>>>>>> Docs in the index: "http://<ES Public IP>:9200/griffin/_search"
>>>>>>>>
>>>>>>>> {"took":44,"timed_out":false,"_shards":{"total":5,"successful":5,"skipped":0,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}
>>>>>>>>
>>>>>>>>
>>>>>>>> Index Mapping: "http://<ES Public IP>:9200/griffin"
>>>>>>>>
>>>>>>>> {"griffin":{"aliases":{},"mappings":{"accuracy":{"properties":{"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"tmst":{"type":"date"}}}},"settings":{"index":{"creation_date":"1599567930578","number_of_shards":"5","number_of_replicas":"2","uuid":"ur_Kd3XFQBCsPzIM84j87Q","version":{"created":"6081299"},"provided_name":"griffin"}}}}
>>>>>>>>
>>>>>>>>
>>>>>>>> 3. At a step in the deployment guide it is suggested to check the URL
>>>>>>>> "http://<ES HOST IP>:9200/griffin/accuracy". When navigating to this
>>>>>>>> URL, I get the below error. Please advise:
>>>>>>>> {"error":"Incorrect HTTP method for uri [/griffin/accuracy] and
>>>>>>>> method [GET], allowed: [POST]","status":405}
>>>>>>>> -- it seems you need to use the POST method.
>>>>>>>> [SM] I am using the POST method as suggested in the article. Below
>>>>>>>> is the relevant block of env_batch.json:
>>>>>>>> {
>>>>>>>>   "type": "ELASTICSEARCH",
>>>>>>>>   "config": {
>>>>>>>>     "method": "post",
>>>>>>>>     "api": "http://<ES Host Name>:9200/griffin/accuracy", ---------> do we need IP here?
>>>>>>>>     "connection.timeout": "1m",
>>>>>>>>     "retry": 10
>>>>>>>>   }
>>>>>>>> }
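>>>>>>>>
>>>>>>>> (To double-check that the endpoint accepts POST, something like the
>>>>>>>> below could be tried from the shell; the document body here is just a
>>>>>>>> made-up test record, not what Griffin actually writes:)
>>>>>>>>
>>>>>>>> ```
>>>>>>>> curl -H "Content-Type: application/json" -X POST \
>>>>>>>>   "http://<ES Host Name>:9200/griffin/accuracy" \
>>>>>>>>   -d '{ "name": "Test_Job", "tmst": 1599822000000 }'
>>>>>>>> ```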
>>>>>>>>
>>>>>>>> 6. I also noticed that in Data Assets, *demo_src* is owned by
>>>>>>>> Admin whereas *demo_tgt* is owned by root. Would that make any
>>>>>>>> difference? If yes, how do I correct it? Reload the Hive data?
>>>>>>>> -- could you show me your script for dataset setup?
>>>>>>>> [SM] Attached are the 3 scripts. gen-hive-data.sh is the master
>>>>>>>> script, which triggers demo_data, and that in turn triggers delta_src.
>>>>>>>> I have done it as instructed in the GitHub article, and
>>>>>>>> gen-hive-data.sh is triggered as root in the terminal.
>>>>>>>>
>>>>>>>>
>>>>>>>> Please advise.
>>>>>>>>
>>>>>>>> Thanks and Regards,
>>>>>>>> Sunil Muniyal
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Sep 9, 2020 at 8:41 PM William Guo <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> *Request you to please advise further on below points:*
>>>>>>>>> 1. Since I was able to get ElasticSearch 6.8.x integrated, does it
>>>>>>>>> mean that only ES up to 6.8.x is supported for Griffin as of now? If
>>>>>>>>> yes,
>>>>>>>>> what are the plans further? Is there a page from which I could get
>>>>>>>>> updates?
>>>>>>>>> --please file a jira ticket for us to make our code ES compatible.
>>>>>>>>>
>>>>>>>>> 2. I still do not see the metrics available (please refer below
>>>>>>>>> screenshots). Though the measure is now listed in the drop down of *DQ
>>>>>>>>> Metrics* tab. But when I selected the test measure, nothing came
>>>>>>>>> up.
>>>>>>>>> --could you check the ES whether metrics have been injected or not.
>>>>>>>>>
>>>>>>>>> 3. At a step in deployment guide it is suggested to check URL:
>>>>>>>>> http://<ES
>>>>>>>>> HOST IP>:9200/griffin/accuracy
>>>>>>>>> When navigated to
>>>>>>>>> this URL, I get below error. Please advise
>>>>>>>>> *{"error":"Incorrect HTTP method for uri [/griffin/accuracy] and
>>>>>>>>> method [GET], allowed: [POST]","status":405}*
>>>>>>>>> *-- it seems you need to use POST method.*
>>>>>>>>>
>>>>>>>>> 6. I also noticed that in Data Assets, *demo_src* is owned by
>>>>>>>>> Admin whereas, *demo-tgt* by root. Would that make any
>>>>>>>>> difference? If yes, how to correct it? Reload HIVE data?
>>>>>>>>>
>>>>>>>>> -- could you show me your script for dataset setup?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Sep 8, 2020 at 9:02 PM Sunil Muniyal <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi William,
>>>>>>>>>>
>>>>>>>>>> I was finally able to get Griffin up and ElasticSearch integrated
>>>>>>>>>> along with Hadoop. Thanks a lot for your help and guidance so far.
>>>>>>>>>>
>>>>>>>>>> I have created a test measure and a job which gets triggered at
>>>>>>>>>> every 4 mins automatically (have referred to the user guide
>>>>>>>>>> available on
>>>>>>>>>> GitHub at this link
>>>>>>>>>> <https://github.com/apache/griffin/blob/master/griffin-doc/ui/user-guide.md>
>>>>>>>>>> .)
>>>>>>>>>>
>>>>>>>>>> *Request you to please advise further on below points:*
>>>>>>>>>> 1. Since I was able to get ElasticSearch 6.8.x integrated, does
>>>>>>>>>> it mean that only ES up to 6.8.x is supported for Griffin as of now?
>>>>>>>>>> If yes,
>>>>>>>>>> what are the plans further? Is there a page from which I could get
>>>>>>>>>> updates?
>>>>>>>>>> 2. I still do not see the metrics available (please refer below
>>>>>>>>>> screenshots). Though the measure is now listed in the drop down of
>>>>>>>>>> *DQ
>>>>>>>>>> Metrics* tab. But when I selected the test measure, nothing came
>>>>>>>>>> up.
>>>>>>>>>> 3. At a step in deployment guide it is suggested to check URL:
>>>>>>>>>> http://<ES
>>>>>>>>>> HOST IP>:9200/griffin/accuracy
>>>>>>>>>> When navigated to
>>>>>>>>>> this URL, I get below error. Please advise
>>>>>>>>>> *{"error":"Incorrect HTTP method for uri [/griffin/accuracy] and
>>>>>>>>>> method [GET], allowed: [POST]","status":405}*
>>>>>>>>>> 6. I also noticed that in Data Assets, *demo_src* is owned by
>>>>>>>>>> Admin whereas, *demo-tgt* by root. Would that make any
>>>>>>>>>> difference? If yes, how to correct it? Reload HIVE data?
>>>>>>>>>>
>>>>>>>>>> *Screenshots:*
>>>>>>>>>> *Data Assets:*
>>>>>>>>>> [image: image.png]
>>>>>>>>>>
>>>>>>>>>> *DQ Metrics (Test Measure selected):*
>>>>>>>>>> [image: image.png]
>>>>>>>>>>
>>>>>>>>>> *Job Triggered multiple times:*
>>>>>>>>>> [image: image.png]
>>>>>>>>>>
>>>>>>>>>> *Metrics page from job directly:*
>>>>>>>>>> [image: image.png]
>>>>>>>>>>
>>>>>>>>>> Thanks and Regards,
>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Sep 8, 2020 at 4:38 PM Sunil Muniyal <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> I am unable to get repos for 6.4.1; instead I found 6.8.x. I will
>>>>>>>>>>> try with this version of Elasticsearch in some time.
>>>>>>>>>>>
>>>>>>>>>>> In the meantime, would it be possible to confirm whether 6.4.x or
>>>>>>>>>>> 6.8.x is the only supported version for Griffin? The reason I am
>>>>>>>>>>> asking is that the GitHub article for Griffin deployment points to
>>>>>>>>>>> the latest version of ES.
>>>>>>>>>>>
>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Sep 8, 2020 at 4:06 PM Sunil Muniyal <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I will need to redeploy ElasticSearch, correct?
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Sep 8, 2020 at 4:05 PM William Guo <[email protected]>
>>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Could you try with this version?
>>>>>>>>>>>>> <elasticsearch.version>6.4.1</elasticsearch.version>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> William
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Sep 8, 2020 at 5:59 PM Sunil Muniyal <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi William / Dev group,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have deployed ES 7.9 - latest version (single node) and the
>>>>>>>>>>>>>> same is configured. I also get the default page when hitting
>>>>>>>>>>>>>> http://<ES HOST IP>:9200/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Upon creating the Griffin index using the JSON string given:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> curl -k -H "Content-Type: application/json" -X PUT
>>>>>>>>>>>>>> http://<replaced with my ES host IP>:9200/griffin \
>>>>>>>>>>>>>> -d '{
>>>>>>>>>>>>>> "aliases": {},
>>>>>>>>>>>>>> "mappings": {
>>>>>>>>>>>>>> "accuracy": {
>>>>>>>>>>>>>> "properties": {
>>>>>>>>>>>>>> "name": {
>>>>>>>>>>>>>> "fields": {
>>>>>>>>>>>>>> "keyword": {
>>>>>>>>>>>>>> "ignore_above": 256,
>>>>>>>>>>>>>> "type": "keyword"
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> },
>>>>>>>>>>>>>> "type": "text"
>>>>>>>>>>>>>> },
>>>>>>>>>>>>>> "tmst": {
>>>>>>>>>>>>>> "type": "date"
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> },
>>>>>>>>>>>>>> "settings": {
>>>>>>>>>>>>>> "index": {
>>>>>>>>>>>>>> "number_of_replicas": "2",
>>>>>>>>>>>>>> "number_of_shards": "5"
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> }'
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I get the below error:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> {"error":{"root_cause":[{"type":"mapper_parsing_exception","reason":"Root mapping definition has unsupported parameters: [accuracy : {properties={name={fields={keyword={ignore_above=256, type=keyword}}, type=text}, tmst={type=date}}}]"}],"type":"mapper_parsing_exception","reason":"Failed to parse mapping [_doc]: Root mapping definition has unsupported parameters: [accuracy : {properties={name={fields={keyword={ignore_above=256, type=keyword}}, type=text}, tmst={type=date}}}]","caused_by":{"type":"mapper_parsing_exception","reason":"Root mapping definition has unsupported parameters: [accuracy : {properties={name={fields={keyword={ignore_above=256, type=keyword}}, type=text}, tmst={type=date}}}]"}},"status":400}
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Seems like the JSON string is missing some values or is
>>>>>>>>>>>>>> incorrectly provided.
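>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My only guess so far: ES 7.x removed mapping types, so the
>>>>>>>>>>>>>> extra "accuracy" level inside "mappings" seems to be what 7.9
>>>>>>>>>>>>>> rejects. If that is right, a typeless body along these lines
>>>>>>>>>>>>>> should be accepted by 7.x (a sketch only; Griffin itself may
>>>>>>>>>>>>>> still write to the typed /griffin/accuracy path, so this alone
>>>>>>>>>>>>>> may not be enough):
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> ```
>>>>>>>>>>>>>> {
>>>>>>>>>>>>>>   "aliases": {},
>>>>>>>>>>>>>>   "mappings": {
>>>>>>>>>>>>>>     "properties": {
>>>>>>>>>>>>>>       "name": {
>>>>>>>>>>>>>>         "type": "text",
>>>>>>>>>>>>>>         "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } }
>>>>>>>>>>>>>>       },
>>>>>>>>>>>>>>       "tmst": { "type": "date" }
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>   },
>>>>>>>>>>>>>>   "settings": {
>>>>>>>>>>>>>>     "index": { "number_of_shards": "5", "number_of_replicas": "2" }
>>>>>>>>>>>>>>   }
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> ```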
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Would be great if you could please help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 8:16 PM Sunil Muniyal <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you for the response, William.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have started preparing for ES deployment and should
>>>>>>>>>>>>>>> attempt the same tomorrow.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the meantime, I will also wait for the Dev team in case
>>>>>>>>>>>>>>> they have any additional inputs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 8:06 PM William Guo <[email protected]>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If dev confirms it to be mandatory then, as I understand it,
>>>>>>>>>>>>>>>> I will need to:
>>>>>>>>>>>>>>>> 1. Deploy and Configure ES
>>>>>>>>>>>>>>>> 2. Update application.properties to include ES details and
>>>>>>>>>>>>>>>> create ES index
>>>>>>>>>>>>>>>> 3. Rebuild Maven package and rerun the Griffin service
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *Right, you need to package es env configuration into your
>>>>>>>>>>>>>>>> jar.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is no need to reload the data into Hadoop (Hive),
>>>>>>>>>>>>>>>> correct?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> *No*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On a side note, is there any other documentation of Griffin
>>>>>>>>>>>>>>>> available or underway which would help to get below details
>>>>>>>>>>>>>>>> while
>>>>>>>>>>>>>>>> integrating it with Cloudera Hadoop?
>>>>>>>>>>>>>>>> 1. What are the exact port requirements (internal and
>>>>>>>>>>>>>>>> external)?
>>>>>>>>>>>>>>>> *check the log and make sure all the connections configured in
>>>>>>>>>>>>>>>> the properties are accessible*
>>>>>>>>>>>>>>>> 2. Which all packages will be required?
>>>>>>>>>>>>>>>> *no*
>>>>>>>>>>>>>>>> 3. Any Java dependencies?
>>>>>>>>>>>>>>>> *java 1.8*
>>>>>>>>>>>>>>>> 4. If we have Cloudera Hadoop cluster kerberized (secured),
>>>>>>>>>>>>>>>> what are the dependencies or additional configurations needed?
>>>>>>>>>>>>>>>> *There should be no extra dependencies, except the transitive
>>>>>>>>>>>>>>>> dependencies incurred by Spark and Hadoop.*
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 6:42 PM Sunil Muniyal <
>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Ohh ok.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> If dev confirms it to be mandatory then, as I understand it,
>>>>>>>>>>>>>>>>> I will need to:
>>>>>>>>>>>>>>>>> 1. Deploy and Configure ES
>>>>>>>>>>>>>>>>> 2. Update application.properties to include ES details and
>>>>>>>>>>>>>>>>> create ES index
>>>>>>>>>>>>>>>>> 3. Rebuild Maven package and rerun the Griffin service
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There is no need to reload the data into Hadoop (Hive),
>>>>>>>>>>>>>>>>> correct?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On a side note, is there any other documentation of
>>>>>>>>>>>>>>>>> Griffin available or underway which would help to get below
>>>>>>>>>>>>>>>>> details while
>>>>>>>>>>>>>>>>> integrating it with Cloudera Hadoop?
>>>>>>>>>>>>>>>>> 1. What are the exact port requirements (internal and
>>>>>>>>>>>>>>>>> external)?
>>>>>>>>>>>>>>>>> 2. Which all packages will be required?
>>>>>>>>>>>>>>>>> 3. Any Java dependencies?
>>>>>>>>>>>>>>>>> 4. If we have Cloudera Hadoop cluster kerberized
>>>>>>>>>>>>>>>>> (secured), what are the dependencies or additional
>>>>>>>>>>>>>>>>> configurations needed?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I know some of the above information can be fetched from
>>>>>>>>>>>>>>>>> the deployment guide on Github. However, checking if any
>>>>>>>>>>>>>>>>> other formal
>>>>>>>>>>>>>>>>> documentation has been made available for the same?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 4:05 PM William Guo <
>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> cc dev for double checking.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Measure will emit metrics and store them in elastic, UI
>>>>>>>>>>>>>>>>>> fetch those metrics from elastic.
>>>>>>>>>>>>>>>>>> So elastic should be mandatory.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> William
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 6:32 PM Sunil Muniyal <
>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thank you for the quick response, William.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have not configured ElasticSearch since it is not
>>>>>>>>>>>>>>>>>>> deployed.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In application.properties, I just added dummy
>>>>>>>>>>>>>>>>>>> information (as below) to pass the validation test and get
>>>>>>>>>>>>>>>>>>> Griffin up and running.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> # elasticsearch
>>>>>>>>>>>>>>>>>>> # elasticsearch.host = <IP>
>>>>>>>>>>>>>>>>>>> # elasticsearch.port = <elasticsearch rest port>
>>>>>>>>>>>>>>>>>>> # elasticsearch.user = user
>>>>>>>>>>>>>>>>>>> # elasticsearch.password = password
>>>>>>>>>>>>>>>>>>> elasticsearch.host=localhost
>>>>>>>>>>>>>>>>>>> elasticsearch.port=9200
>>>>>>>>>>>>>>>>>>> elasticsearch.scheme=http
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Is ElasticSearch a mandatory requirement to use Griffin?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 3:58 PM William Guo <
>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Could you check whether ES has been injected with those
>>>>>>>>>>>>>>>>>>>> metrics or not?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 6:23 PM Sunil Muniyal <
>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hello William,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I was able to bypass this error by entering the
>>>>>>>>>>>>>>>>>>>>> default field values for LDAP, Elasticsearch and Livy in
>>>>>>>>>>>>>>>>>>>>> application.properties, and successfully got Griffin
>>>>>>>>>>>>>>>>>>>>> running.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> By following the below article, I have created a test
>>>>>>>>>>>>>>>>>>>>> measure and then a job which triggers that measure.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/griffin/blob/master/griffin-doc/ui/user-guide.md
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I have allowed the job to be triggered multiple times;
>>>>>>>>>>>>>>>>>>>>> however, I still can't see anything in the metrics related
>>>>>>>>>>>>>>>>>>>>> to the job. Neither do I see anything in the *health* or
>>>>>>>>>>>>>>>>>>>>> *mydashboard* tabs.
>>>>>>>>>>>>>>>>>>>>> Also, if you notice in the screenshot below, on the *DQ
>>>>>>>>>>>>>>>>>>>>> Metrics* tab I still do not see the created measure
>>>>>>>>>>>>>>>>>>>>> in the drop-down list.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> *Test job executed multiple times:*
>>>>>>>>>>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Please advise if anything is mis-configured.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Sep 7, 2020 at 12:40 PM Sunil Muniyal <
>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hello William,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thank you for the reply.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This helped; I had actually missed adding the
>>>>>>>>>>>>>>>>>>>>>> property in application.properties.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Now the other challenge is that, along with ES and Livy,
>>>>>>>>>>>>>>>>>>>>>> I am also not using LDAP, and it is hitting the error
>>>>>>>>>>>>>>>>>>>>>> *unable to resolve ldap.url property*. Of course it will,
>>>>>>>>>>>>>>>>>>>>>> since the property is not configured.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Please suggest.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Sun, Sep 6, 2020 at 7:26 PM William Guo <
>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> hi Sunil Muniyal,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Could you check this property in your griffin
>>>>>>>>>>>>>>>>>>>>>>> properties file?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> internal.event.listeners
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> William
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Sep 3, 2020 at 11:05 PM Sunil Muniyal <
>>>>>>>>>>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I am attempting to integrate Griffin with Cloudera
>>>>>>>>>>>>>>>>>>>>>>>> Hadoop by following below article:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/griffin/blob/master/griffin-doc/deploy/deploy-guide.md
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I have followed everything as instructed, apart from
>>>>>>>>>>>>>>>>>>>>>>>> the below things:
>>>>>>>>>>>>>>>>>>>>>>>> 1. Using Cloudera Hadoop 5.15 and relevant
>>>>>>>>>>>>>>>>>>>>>>>> configurations instead of Apache Hadoop
>>>>>>>>>>>>>>>>>>>>>>>> 2. Not using Elastic search as it is not applicable
>>>>>>>>>>>>>>>>>>>>>>>> 3. Did not use Livy as it is not applicable.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Maven build is successful and has got 2 jars at
>>>>>>>>>>>>>>>>>>>>>>>> service/target and measure/target which I have
>>>>>>>>>>>>>>>>>>>>>>>> uploaded to HDFS.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> However, *starting griffin-service.jar using nohup
>>>>>>>>>>>>>>>>>>>>>>>> command* is failing with below error:
>>>>>>>>>>>>>>>>>>>>>>>> Caused by: java.lang.IllegalArgumentException:
>>>>>>>>>>>>>>>>>>>>>>>> Could not resolve placeholder
>>>>>>>>>>>>>>>>>>>>>>>> 'internal.event.listeners' in string value
>>>>>>>>>>>>>>>>>>>>>>>> "#{'${internal.event.listeners}'.split(',')}"
>>>>>>>>>>>>>>>>>>>>>>>>   at org.springframework.util.PropertyPlaceholderHelper.parseStringValue(PropertyPlaceholderHelper.java:174) ~[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>>>>>>>>>>>>>>>>>>>>>>>>   at org.springframework.util.PropertyPlaceholderHelper.replacePlaceholders(PropertyPlaceholderHelper.java:126) ~[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>>>>>>>>>>>>>>>>>>>>>>>>   at org.springframework.core.env.AbstractPropertyResolver.doResolvePlaceholders(AbstractPropertyResolver.java:236) ~[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I have tried to search a lot of articles with no
>>>>>>>>>>>>>>>>>>>>>>>> luck.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Would be great if someone could help me to fix this.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Also, attached is the output of nohup command that
>>>>>>>>>>>>>>>>>>>>>>>> was written in service.out.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks and Regards,
>>>>>>>>>>>>>>>>>>>>>>>> Sunil Muniyal
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>>>>>>>>>>>>>>> To unsubscribe, e-mail:
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>> For additional commands, e-mail:
>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>
>