Hi Karan,
Sorry, I left out the field "__tmst", which is the timestamp attached to each
output value record.
The mappings schema should be:
{
  "mappings": {
    "accuracy": {
      "properties": {
        "name" : {"type": "keyword"},
        "tmst" : {"type": "long"},
        "value" : {
          "properties": {
            "__tmst": {"type": "long"},
            "total": {"type": "long"},
            "miss": {"type": "long"},
            "matched": {"type": "long"}
          }
        }
      }
    }
  }
}
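
If you would rather script this step than paste it into curl, here is a
minimal Python sketch using only the standard library. It builds the PUT
request for the mappings above without sending it, and checks whether a
metric record contains fields the mappings do not declare. The function
names, the 127.0.0.1 address and the sample record (including its "__tmst"
value) are assumptions for illustration, not part of Griffin.

```python
import json
import urllib.request

# Mappings copied from this message, including the "__tmst" field.
MAPPINGS = {
    "mappings": {
        "accuracy": {
            "properties": {
                "name": {"type": "keyword"},
                "tmst": {"type": "long"},
                "value": {
                    "properties": {
                        "__tmst": {"type": "long"},
                        "total": {"type": "long"},
                        "miss": {"type": "long"},
                        "matched": {"type": "long"},
                    }
                },
            }
        }
    }
}


def build_create_index_request(es_base_url, index="griffin"):
    """Build (but do not send) the PUT request that creates the index.

    Hypothetical helper; es_base_url is whatever address your ES listens on.
    """
    body = json.dumps(MAPPINGS).encode("utf-8")
    return urllib.request.Request(
        url=f"{es_base_url.rstrip('/')}/{index}?pretty=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def unmapped_fields(record, properties, prefix=""):
    """Return dotted names of record fields that the mappings do not declare."""
    missing = []
    for key, val in record.items():
        path = prefix + key
        spec = properties.get(key)
        if spec is None:
            missing.append(path)
        elif isinstance(val, dict) and "properties" in spec:
            # Recurse into nested objects such as "value".
            missing.extend(unmapped_fields(val, spec["properties"], path + "."))
    return missing


if __name__ == "__main__":
    # Placeholder address; replace with your ES host and port.
    req = build_create_index_request("http://127.0.0.1:9200")
    print(req.get_method(), req.full_url)
    # To actually create the index, send it: urllib.request.urlopen(req)

    # Assumed sample record: the earlier example plus a "__tmst" value.
    sample = {
        "name": "accuracy",
        "tmst": 1525320600000,
        "value": {"__tmst": 1525320600000, "total": 100000,
                  "miss": 200, "matched": 99800},
    }
    props = MAPPINGS["mappings"]["accuracy"]["properties"]
    print(unmapped_fields(sample, props))
```

Nothing is sent to ES unless you call urllib.request.urlopen(req) yourself,
so you can inspect the request first.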
Thanks,
Lionel
On Wed, May 9, 2018 at 1:56 PM, Karan Gupta <[email protected]> wrote:
> Hi Lionel,
>
> I tried the curl command below, which you sent me:
>
> curl -X PUT 'http://<E.S IP>/griffin?pretty=true' -H 'Content-Type:
> application/json' -d '{"mappings": {"accuracy": {"properties": {"name" :
> {"type": "keyword"},"tmst" : {"type": "long"},"value" : {"properties":
> {"total": {"type": "long"},"miss": {"type": "long"},"matched": {"type":
> "long"}}}}}}}'
>
> When I GET the indices, I can see that the griffin index has been
> created in Elasticsearch. I then ran the service jar again, but the
> DQ metrics still were not populated.
>
> Am I missing something here?
>
> Thank you,
> Karan Gupta
>
>
>
> From: Lionel Liu <[email protected]>
> Sent: Friday, May 4, 2018 6:29 PM
> To: Karan Gupta <[email protected]>
> Cc: [email protected]
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> For accuracy, you can try mappings like this:
>
> curl -XPUT 'http://<ES ip address>:9200/griffin?pretty=true' -d '{
>   "mappings": {
>     "accuracy": {
>       "properties": {
>         "name" : {"type": "keyword"},
>         "tmst" : {"type": "long"},
>         "value" : {
>           "properties": {
>             "total": {"type": "long"},
>             "miss": {"type": "long"},
>             "matched": {"type": "long"}
>           }
>         }
>       }
>     }
>   }
> }'
>
> The metric schema is like this:
>
> {
>   "name": "accuracy",
>   "tmst": 1525320600000,
>   "value": {
>     "total": 100000,
>     "miss": 200,
>     "matched": 99800
>   }
> }
>
>
>
> For profiling, you may need a different mapping.
>
> In our wiki, you can get the metric schema here:
> https://cwiki.apache.org/confluence/display/GRIFFIN/4.+Metric+schema
>
> As far as I know, you don't need to create the indices in ES manually; ES
> will create the index and its mappings from the first value posted. That's
> what we rely on in our docker image, and it works.
>
>
> Thanks,
> Lionel
>
>
> On Fri, May 4, 2018 at 7:33 PM, Karan Gupta <[email protected]> wrote:
> Hi Lionel,
>
> We are not using the Docker image, so we want to set it up manually.
> Could you provide the “CREATE” statement for the griffin indices, along
> with the “mappings”?
>
>
> Thank you,
> Karan Gupta
>
> From: Lionel Liu <[email protected]>
> Sent: Friday, May 4, 2018 2:56 PM
> To: Karan Gupta <[email protected]>
> Cc: [email protected]
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> In our docker image, we only set 'http.cors.enabled: true' and
> 'http.cors.allow-origin: "*"' in elasticsearch.yml, as shown in the Dockerfile:
> https://github.com/bhlx3lyx7/griffin-docker/blob/master/elasticsearch/Dockerfile
> That is all we did for the ES configuration, with no other
> initialization. When the Spark application posts metrics to ES directly,
> it succeeds.
>
> ES will create the index from the first value you post to it.
>
> Thanks,
> Lionel
>
> On Fri, May 4, 2018 at 4:46 PM, Karan Gupta <[email protected]> wrote:
> Hi Lionel,
>
> The metrics are being persisted in HDFS. This is good progress for us.
> Thank you for all your valuable help.
>
> We created an index for Griffin, but we were not sure which mappings
> we should use.
> Until we created it ourselves, this index was never auto-created in ES.
> Now that we have created the index, there are errors that
> suggest missing “mappings”.
>
> Is there an auto index create property that we need to enable somewhere in
> ES?
> I could not find anything in the config yml file, though.
>
> Thank you,
> Karan Gupta
> From: Lionel Liu <[email protected]>
> Sent: Friday, May 4, 2018 2:11 PM
> To: Karan Gupta <[email protected]>
> Cc: [email protected]
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> For HTTP persistence, are the metrics persisted directly from “Spark”,
> or does the Griffin service write them?
> [Answer] The metrics are persisted directly from the Spark application.
>
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> called from the Griffin service, it will work, but from Spark executors it
> won't, because localhost resolves to the executor host)
> [Answer] I think you can change "localhost" to the IP address of ES.
>
> But we have not created any index in “ES” called “griffin” or
> “accuracy”. What should we be doing here?
> [Answer] You don't need to create the indices in ES; ES will create them
> when you post metrics to it.
>
> The "email" and "sms" parameters are not enabled in this
> version, so you can just ignore them in env.json.
>
> BTW, have the metrics been persisted on HDFS?
>
> Thanks,
> Lionel
>
>
>
> On Fri, May 4, 2018 at 2:24 PM, Karan Gupta <[email protected]> wrote:
> Hi,
>
> Thank you for the detail.
>
> In env.json, we have specified both HDFS and HTTP.
> For HTTP persistence, are the metrics persisted directly from “Spark”,
> or does the Griffin service write them?
> Our URL is like this: http://localhost:9200/griffin/accuracy (if it is
> called from the Griffin service, it will work, but from Spark executors it
> won't, because localhost resolves to the executor host)
> But we have not created any index in “ES” called “griffin” or
> “accuracy”. What should we be doing here?
>
> One more:
>
> Yesterday we found that the “email” and “sms” parts of env.json are not
> configured properly.
> They appear as an “array” in the JSON, but “EmailParam” and “SmsParam” do
> not expect a List.
> This was preventing the Spark jobs from launching.
> We edited env.json accordingly. We hope we did the right thing.
> Can you confirm this?
>
> Thank you,
> Karan Gupta
>
> From: Lionel Liu <[email protected]>
> Sent: Friday, May 4, 2018 11:46 AM
> To: Karan Gupta <[email protected]>
> Cc: [email protected]
> Subject: Re: No Index Formation in Elastic Search
>
> Hi Karan,
>
> First, we need to check whether Griffin finished successfully. Which persist
> types did you configure in env.json? "log", "hdfs", "http"?
> - "log": prints the metrics in the application log.
> - "hdfs": persists the metrics in the hdfs path you've set.
> - "http": posts the metrics to the "api" you've set, which should be the
> elasticsearch endpoint by default.
>
> You can choose more than one of them.
> If "http" is not configured correctly, posting metrics to ES fails.
> If "hdfs" is configured but no metric is persisted in the
> "path", Griffin may not have finished the calculation correctly.
> If "log" is configured, you can get the application log from yarn:
> yarn logs -applicationId <appId> > applog
> Then read the applog and check whether any output metric was calculated.
> If no metric is persisted by any of your configured persist types,
> you need to read the applog and find the error message. Then you can show
> it to me, and I'll help you track it down.
>
> Thanks,
> Lionel
>
>
> On Fri, May 4, 2018 at 2:00 PM, Karan Gupta <[email protected]> wrote:
> Hi Lionel,
>
> Although the Spark application finishes, I do not see any index being
> created in Elasticsearch, so the data quality metrics are not
> getting populated.
> Could you help me out with a possible solution?
>
>
> Thank you,
> Karan Gupta