Hi Lionel,
I tried the curl command you sent me:
curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type: application/json' -d '{
"mappings": {
"accuracy": {
"properties": {
"name" : {"type": "keyword"},
"tmst" : {"type": "long"},
"value" : {
"properties": {
"total": {"type": "long"},
"miss": {"type": "long"},
"matched": {"type": "long"}
}
}
}
}
}
}'
When I GET the indices, I can see that the griffin index has been created in
Elasticsearch. I then ran the service jar again, but the DQ metrics are still
not being populated.
Am I missing something here?
Thank you,
Karan Gupta
From: Lionel Liu <[email protected]>
Sent: Friday, May 4, 2018 6:29 PM
To: Karan Gupta <[email protected]>
Cc: [email protected]
Subject: Re: No Index Formation in Elastic Search
Hi Karan,
For accuracy, you can try mappings like this:
curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d '{
"mappings": {
"accuracy": {
"properties": {
"name" : {"type": "keyword"},
"tmst" : {"type": "long"},
"value" : {
"properties": {
"total": {"type": "long"},
"miss": {"type": "long"},
"matched": {"type": "long"}
}
}
}
}
}
}'
The metric schema is like this:
{
"name": "accuracy",
"tmst": 1525320600000,
"value": {
"total": 100000,
"miss": 200,
"matched": 99800
}
}
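A quick way to check a metric against this mapping before posting it is sketched below in Python. This is only an illustration, not part of Griffin: MAPPING, SAMPLE_METRIC, check_metric and build_post_request are hypothetical names, the ES host is a placeholder argument, and the request is only built, never sent.

```python
import json
import urllib.request

# Mapping body from the curl command above (type name "accuracy").
MAPPING = {
    "mappings": {
        "accuracy": {
            "properties": {
                "name": {"type": "keyword"},
                "tmst": {"type": "long"},
                "value": {
                    "properties": {
                        "total": {"type": "long"},
                        "miss": {"type": "long"},
                        "matched": {"type": "long"},
                    }
                },
            }
        }
    }
}

# Sample metric from the schema above.
SAMPLE_METRIC = {
    "name": "accuracy",
    "tmst": 1525320600000,
    "value": {"total": 100000, "miss": 200, "matched": 99800},
}

# Python types accepted for each ES field type used in this mapping.
PY_TYPES = {"keyword": str, "long": int}


def check_metric(doc, properties, prefix=""):
    """Return a list of problems found when comparing doc with the mapping."""
    problems = []
    for field, spec in properties.items():
        path = prefix + field
        if field not in doc:
            problems.append("missing field: " + path)
        elif "properties" in spec:
            if isinstance(doc[field], dict):
                problems += check_metric(doc[field], spec["properties"], path + ".")
            else:
                problems.append("expected object: " + path)
        elif isinstance(doc[field], bool) or not isinstance(doc[field], PY_TYPES[spec["type"]]):
            problems.append("wrong type: " + path)
    return problems


def build_post_request(es_host, doc):
    """Build (but do not send) a POST of doc to the griffin/accuracy endpoint."""
    url = "http://%s:9200/griffin/accuracy" % es_host
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

An empty list from check_metric(SAMPLE_METRIC, MAPPING["mappings"]["accuracy"]["properties"]) means the document fits the mapping. Passing the built request to urllib.request.urlopen would actually send it.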
For profiling, you may need a different mapping.
In our wiki, you can get the metric schema here:
https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema
As far as I know, you don't need to create indices in ES manually; it will
create the index and its mappings from the first document posted. That's what
happens in our docker image, and it works.
Thanks,
Lionel
On Fri, May 4, 2018 at 7:33 PM, Karan Gupta
<[email protected]> wrote:
Hi Lionel,
We are not using the Docker image, so we want to set it up manually.
Could you provide the “CREATE” statement for the griffin indices along with
the “mappings”?
Thank you,
Karan Gupta
From: Lionel Liu <[email protected]>
Sent: Friday, May 4, 2018 2:56 PM
To: Karan Gupta <[email protected]>
Cc: [email protected]
Subject: Re: No Index Formation in Elastic Search
Hi Karan,
In our docker image, we only configured 'http.cors.enabled: true' and
'http.cors.allow-origin: "*"' in elasticsearch.yml, as shown in the Dockerfile:
https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
That's all we've done for ES configuration, without any other
initialization. When the Spark application posts metrics to ES directly, it
succeeds.
ES will generate the indices from the first value you post to it.
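This automatic creation depends on the Elasticsearch setting action.auto_create_index, which defaults to true. The Python sketch below shows one way to interpret that setting once it has been fetched, for example from GET /_cluster/settings?include_defaults=true&flat_settings=true. It makes two assumptions: that the response has the flat "transient" / "persistent" / "defaults" sections, and that pattern lists can be handled in simplified form. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

SETTING = "action.auto_create_index"


def effective_setting(cluster_settings):
    """Pick the setting value; transient overrides persistent, then defaults."""
    for section in ("transient", "persistent", "defaults"):
        value = cluster_settings.get(section, {}).get(SETTING)
        if value is not None:
            return str(value)
    return "true"  # Elasticsearch's documented default


def auto_create_allowed(cluster_settings, index):
    """Return True if posting to a missing index would create it."""
    value = effective_setting(cluster_settings).strip()
    if value.lower() in ("true", "false"):
        return value.lower() == "true"
    # Otherwise a comma-separated pattern list such as "+griffin*,-*".
    # Simplification: the first matching pattern decides, and an index
    # that matches no pattern is treated as not auto-created.
    for pattern in value.split(","):
        pattern = pattern.strip()
        allow = not pattern.startswith("-")
        if fnmatchcase(index, pattern.lstrip("+-")):
            return allow
    return False
```

If auto_create_allowed(settings, "griffin") is False, the index has to be created by hand, as in the PUT request with mappings.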
Thanks,
Lionel
On Fri, May 4, 2018 at 4:46 PM, Karan Gupta
<[email protected]> wrote:
Hi Lionel,
The metrics are being persisted in HDFS. This is good progress for us. Thank you
for all your valuable help.
We created an index for Griffin, but we were not sure which mappings we
should use.
Until we created it ourselves, this index was never auto-created in ES.
And now that we have created the index, there are errors that suggest
missing “mappings”.
Is there an auto index create property that we need to enable somewhere in ES?
I could not find anything in the config yml file, though.
Thank you,
Karan Gupta
From: Lionel Liu <[email protected]>
Sent: Friday, May 4, 2018 2:11 PM
To: Karan Gupta <[email protected]>
Cc: [email protected]
Subject: Re: No Index Formation in Elastic Search
Hi Karan,
For HTTP persistence, are the metrics persisted directly from “Spark”, or does
the Griffin service write them?
[Answer] The metrics are persisted directly from the Spark application.
Our URL is like this: http://localhost:9200/griffin/accuracy (if it is called
from the Griffin service, it will work, but from Spark executors it won't, as
localhost resolves to the executor host)
[Answer] I think you can change "localhost" to the IP address of ES.
But we have not created any index in “ES” called “griffin” or “accuracy”.
What should we be doing here?
[Answer] You don't need to create the indices in ES; ES will create them when
metrics are posted to it.
For the "email" and "sms" parameters, they are not enabled in this version, so
you can just ignore them in env.json.
BTW, have the metrics been persisted on HDFS?
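For the localhost point above, a small Python sketch of rewriting the "api" URL so that it points at the ES host instead of the loopback address. The function name point_api_at and the set LOCAL_NAMES are hypothetical, and the replacement host is whatever address your ES node actually has.

```python
from urllib.parse import urlsplit, urlunsplit

# Host names that resolve to the machine running the code (the executor).
LOCAL_NAMES = {"localhost", "127.0.0.1", "::1"}


def point_api_at(api_url, es_host):
    """Replace a loopback host in api_url with es_host, keeping port and path."""
    parts = urlsplit(api_url)
    if parts.hostname not in LOCAL_NAMES:
        return api_url  # already points at a non-local host
    netloc = es_host if parts.port is None else "%s:%d" % (es_host, parts.port)
    return urlunsplit(parts._replace(netloc=netloc))
```

Calling point_api_at("http://localhost:9200/griffin/accuracy", es_host) keeps the port and the /griffin/accuracy path and changes only the host.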
Thanks,
Lionel
On Fri, May 4, 2018 at 2:24 PM, Karan Gupta
<[email protected]> wrote:
Hi,
Thank you for the detail.
In env.json, we have specified both HDFS and HTTP.
For HTTP persistence, are the metrics persisted directly from “Spark”, or does
the Griffin service write them?
Our URL is like this: http://localhost:9200/griffin/accuracy (if it is called
from the Griffin service, it will work, but from Spark executors it won't, as
localhost resolves to the executor host)
But we have not created any index in “ES” called “griffin” or “accuracy”.
What should we be doing here?
One more thing:
Yesterday we found that the “email” and “sms” parts of env.json are not
configured properly.
They appear as an “array” in the JSON, but “EmailParam” and “SmsParam” do not
expect a List.
This was preventing the Spark jobs from launching.
We edited env.json accordingly. We hope we did the right thing.
Can you confirm this?
Thank you,
Karan Gupta
From: Lionel Liu <[email protected]>
Sent: Friday, May 4, 2018 11:46 AM
To: Karan Gupta <[email protected]>
Cc: [email protected]
Subject: Re: No Index Formation in Elastic Search
Hi Karan,
First, we need to check whether griffin has finished successfully. Which persist
types did you configure in env.json? "log", "hdfs", "http"?
- "log": print the metrics in application log.
- "hdfs": the metrics will be persisted in hdfs path you've set.
- "http": post the metrics to the "api" you've set, which should be the
elasticsearch endpoint by default.
You can choose multiple of them.
If "http" is not configured correctly, posting metrics to ES fails.
If "hdfs" is configured but you cannot find any metrics persisted in the
"path", griffin may not have finished the calculation correctly.
If "log" is configured, you can get the application log from yarn:
yarn logs -applicationId <appId> > applog
Then read the applog and check whether any metric output was calculated.
If no metric is persisted by any of your configured persist types, you
need to read the applog and find the error message. Then you can show it to
me, and I'll help you track it down.
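To see at a glance which persist types are configured, env.json can be summarised with a short Python sketch like the one below. The layout is an assumption: a top-level "persist" list whose entries have a "type" and a "config" holding "path" or "api". The sample values and the function name persist_targets are hypothetical, so adjust them to your actual env.json.

```python
# Hypothetical example of the persist section of env.json (layout assumed).
SAMPLE_ENV = {
    "persist": [
        {"type": "log", "config": {}},
        {"type": "hdfs", "config": {"path": "hdfs:///griffin/persist"}},
        {"type": "http", "config": {"api": "http://localhost:9200/griffin/accuracy"}},
    ]
}


def persist_targets(env):
    """Map each configured persist type to its target ("path", "api" or None)."""
    targets = {}
    for entry in env.get("persist", []):
        config = entry.get("config", {})
        targets[entry.get("type")] = config.get("path") or config.get("api")
    return targets


for persist_type, target in persist_targets(SAMPLE_ENV).items():
    print(persist_type, "->", target)
```

For a real file, load it with json.load and pass the resulting dict in place of SAMPLE_ENV.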
Thanks,
Lionel
On Fri, May 4, 2018 at 2:00 PM, Karan Gupta
<[email protected]> wrote:
Hi Lionel,
Although the Spark application finishes, I do not see any index being
created in Elasticsearch, so the data quality metrics are not being
populated.
Could you help me out with a possible solution?
Thank you,
Karan Gupta