Update API documents and README.md
Author: He Wang <[email protected]> Closes #221 from whhe/master. Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin/commit/b2f334a3 Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin/tree/b2f334a3 Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin/diff/b2f334a3 Branch: refs/heads/master Commit: b2f334a357c0847a494a05fd74fe8009ca4f41db Parents: 711acce Author: He Wang <[email protected]> Authored: Mon Feb 12 10:55:42 2018 +0800 Committer: Lionel Liu <[email protected]> Committed: Mon Feb 12 10:55:42 2018 +0800 ---------------------------------------------------------------------- README.md | 225 +- griffin-doc/service/api-guide.md | 858 ++++--- griffin-doc/service/postman/griffin.json | 2333 ++++++++++-------- .../griffin/core/metric/MetricStoreImpl.java | 6 +- 4 files changed, 1983 insertions(+), 1439 deletions(-) ---------------------------------------------------------------------- http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/b2f334a3/README.md ---------------------------------------------------------------------- diff --git a/README.md b/README.md index 13a52eb..780e3ac 100644 --- a/README.md +++ b/README.md @@ -1,3 +1,5 @@ + + <!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file @@ -18,121 +20,168 @@ under the License. --> -## Apache Griffin +# Apache Griffin [](https://travis-ci.org/apache/incubator-griffin) [](https://www.apache.org/licenses/LICENSE-2.0.html) -Apache Griffin is a model driven data quality solution for modern data systems. -It provides a standard process to define data quality measures, execute, report, as well as an unified dashboard across multiple data systems. -You can access our home page [here](http://griffin.incubator.apache.org/). -You can access our wiki page [here](https://cwiki.apache.org/confluence/display/GRIFFIN/Apache+Griffin). 
-You can access our issues jira page [here](https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12330914). +Apache Griffin is a model driven data quality solution for modern data systems. It provides a standard process to define data quality measures, execute, report, as well as an unified dashboard across multiple data systems. -### Contact us -Email: <a href="mailto:[email protected]">[email protected]</a> +## Getting Started -### How to run in docker -1. Install [docker](https://docs.docker.com/engine/installation/) and [docker compose](https://docs.docker.com/compose/install/). -2. Pull our pre-built docker image and elasticsearch image. - ``` - docker pull bhlx3lyx7/svc_msr:0.1.6 - docker pull bhlx3lyx7/elasticsearch - ``` - You can pull the images faster through mirror acceleration if you are in China. - ``` - docker pull registry.docker-cn.com/bhlx3lyx7/svc_msr:0.1.6 - docker pull registry.docker-cn.com/bhlx3lyx7/elasticsearch - ``` -3. Increase vm.max_map_count of your local machine, to use elasticsearch. - ``` - sysctl -w vm.max_map_count=262144 - ``` -4. Copy [docker-compose-batch.yml](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/svc_msr/docker-compose-batch.yml) to your work path. -5. In your work path, start docker containers by using docker compose, wait for about one minutes, then griffin service is ready. - ``` - docker-compose -f docker-compose-batch.yml up -d - ``` -6. Now you can try griffin APIs by using postman after importing the [json files](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service/postman). - In which you need to modify the environment `BASE_PATH` value into `<your local IP address>:38080`. - -More details about griffin docker [here](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md). - -### How to deploy and run at local -1. Install jdk (1.8 or later versions). -2. Install mysql. -3. Install npm (version 6.0.0+). 
-4. Install [Hadoop](http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html). -5. Install [Spark](http://spark.apache.org/downloads.html) (version 1.6.x, griffin does not support 2.0.x at current), if you want to install Pseudo Distributed/Single Node Cluster, you can get some help [here](http://why-not-learn-something.blogspot.com/2015/06/spark-installation-pseudo.html). -6. Install [Hive](http://apache.claz.org/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz) (version 1.2.1 or later), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive). + +You can try Griffin in docker following the [docker guide](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/docker/griffin-docker-guide.md). + +To run Griffin at local, you can follow instructions below. + +### Prerequisites +You need to install following items +- jdk (1.8 or later versions). +- mysql. +- npm (version 6.0.0+). +- [Hadoop](http://apache.claz.org/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz) (2.6.0 or later), you can get some help [here](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html). +- [Spark](http://spark.apache.org/downloads.html) (version 1.6.x, griffin does not support 2.0.x at current), if you want to install Pseudo Distributed/Single Node Cluster, you can get some help [here](http://why-not-learn-something.blogspot.com/2015/06/spark-installation-pseudo.html). +- [Hive](http://apache.claz.org/hive/hive-1.2.1/apache-hive-1.2.1-bin.tar.gz) (version 1.2.1 or later), you can get some help [here](https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-RunningHive). You need to make sure that your spark cluster could access your HiveContext. -7. 
Install [Livy](http://archive.cloudera.com/beta/livy/livy-server-0.3.0.zip), you can get some help [here](http://livy.io/quickstart.html). +- [Livy](http://archive.cloudera.com/beta/livy/livy-server-0.3.0.zip), you can get some help [here](http://livy.io/quickstart.html). Griffin need to schedule spark jobs by server, we use livy to submit our jobs. - For some issues of Livy for HiveContext, we need to download 3 files, and put them into Hdfs. + For some issues of Livy for HiveContext, we need to download 3 files, and put them into HDFS. ``` datanucleus-api-jdo-3.2.6.jar datanucleus-core-3.2.10.jar datanucleus-rdbms-3.2.9.jar ``` -8. Install [ElasticSearch](). - ElasticSearch works as a metrics collector, Griffin produces metrics to it, and our default UI get metrics from it, you can use your own way as well. -9. Modify configuration for your environment. - You need to modify the configuration part of code, to make Griffin works well in you environment. - service/src/main/resources/application.properties +- ElasticSearch. + ElasticSearch works as a metrics collector, Griffin produces metrics to it, and our default UI get metrics from it, you can use your own way as well. + +### Configuration + +Create a griffin working directory in HDFS +``` +hdfs dfs -mkdir -p <griffin working dir> +``` +Init quartz tables in mysql by service/src/main/resources/Init_quartz.sql +``` +mysql -u username -p quartz < service/src/main/resources/Init_quartz.sql +``` + + +You should also modify some configurations of Griffin for your environment. 
+ +- <b>service/src/main/resources/application.properties</b> + ``` + # mysql spring.datasource.url = jdbc:mysql://<your IP>:3306/quartz?autoReconnect=true&useSSL=false spring.datasource.username = <user name> spring.datasource.password = <password> - + + # hive hive.metastore.uris = thrift://<your IP>:9083 hive.metastore.dbname = <hive database name> # default is "default" + + # external properties directory location, ignore it if not required + external.config.location = + + # login strategy, default is "default" + login.strategy = <default or ldap> + + # ldap properties, ignore them if ldap is not enabled + ldap.url = ldap://hostname:port + ldap.email = @example.com + ldap.searchBase = DC=org,DC=example + ldap.searchPattern = (sAMAccountName={0}) + + # hdfs, ignore it if you do not need predicate job + fs.defaultFS = hdfs://<hdfs-default-name> + + # elasticsearch + elasticsearch.host = <your IP> + elasticsearch.port = <your elasticsearch rest port> + # authentication properties, uncomment if basic authentication is enabled + # elasticsearch.user = user + # elasticsearch.password = password ``` - service/src/main/resources/sparkJob.properties +- <b>service/src/main/resources/sparkJob.properties</b> ``` sparkJob.file = hdfs://<griffin measure path>/griffin-measure.jar sparkJob.args_1 = hdfs://<griffin env path>/env.json - sparkJob.jars_1 = hdfs://<datanucleus path>/datanucleus-api-jdo-3.2.6.jar - sparkJob.jars_2 = hdfs://<datanucleus path>/datanucleus-core-3.2.10.jar - sparkJob.jars_3 = hdfs://<datanucleus path>/datanucleus-rdbms-3.2.9.jar - sparkJob.uri = http://<your IP>:8998/batches - ``` - ui/js/services/services.js - ``` - ES_SERVER = "http://<your IP>:9200" - ``` - Configure measure/measure-batch/src/main/resources/env.json for your environment, and put it into Hdfs <griffin env path>/ -10. 
Build the whole project and deploy.(NPM should be installed , on mac you can try 'brew install node') - ``` - mvn install - ``` - Create a directory in Hdfs, and put our measure package into it. - ``` - cp /measure/target/measure-0.1.3-incubating-SNAPSHOT.jar /measure/target/griffin-measure.jar - hdfs dfs -put /measure/target/griffin-measure.jar <griffin measure path>/ - ``` - After all our environment services startup, we can start our server. - ``` - java -jar service/target/service.jar - ``` - After a few seconds, we can visit our default UI of Griffin (by default the port of spring boot is 8080). - ``` - http://<your IP>:8080 - ``` -11. Follow the steps using UI [here](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/ui/dockerUIguide.md#webui-test-case-guide). - + + sparkJob.jars = hdfs://<datanucleus path>/spark-avro_2.11-2.0.1.jar\ + hdfs://<datanucleus path>/datanucleus-api-jdo-3.2.6.jar\ + hdfs://<datanucleus path>/datanucleus-core-3.2.10.jar\ + hdfs://<datanucleus path>/datanucleus-rdbms-3.2.9.jar + + spark.yarn.dist.files = hdfs:///<spark conf path>/hive-site.xml + + livy.uri = http://<your IP>:8998/batches + spark.uri = http://<your IP>:8088 + ``` + You should put these files into the same path as you set above in HDFS + +- <b>measure/src/main/resources/env.json</b> + ``` + "persist": [ + ... + { + "type": "http", + "config": { + "method": "post", + "api": "http://<your ES IP>:<port>/griffin/accuracy" + } + } + ] + ``` + Put this env.json file of measure module into \<griffin env path> in HDFS. + +### Build and Run + +Build the whole project and deploy. (NPM should be installed) + + ``` + mvn install + ``` + +Put jar file of measure module into \<griffin measure path> in HDFS +``` +cp measure/target/measure-<version>-incubating-SNAPSHOT.jar /measure/target/griffin-measure.jar +hdfs dfs -put /measure/target/griffin-measure.jar <griffin measure path>/ + ``` + +After all environment services startup, we can start our server. 
+ + ``` + java -jar service/target/service.jar + ``` + +After a few seconds, we can visit our default UI of Griffin (by default the port of spring boot is 8080). + + ``` + http://<your IP>:8080 + ``` + +You can use UI following the steps [here](https://github.com/apache/incubator-griffin/blob/master/griffin-doc/ui/user-guide.md). **Note**: The front-end UI is still under development, you can only access some basic features currently. -### Document List +## Community -- [Wiki](https://cwiki.apache.org/confluence/display/GRIFFIN/Apache+Griffin) -- [Measure](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/measure) -- [Service](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service) -- [UI](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/ui) -- [Docker usage](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/docker) -- [Postman API](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service/postman) -### Contributing +You can contact us via email: <a href="mailto:[email protected]">[email protected]</a> + +You can also subscribe this mail by sending a email to [here](mailto:[email protected]). -See [CONTRIBUTING.md](CONTRIBUTING.md) for details on how to contribute code, documentation, etc. +You can access our issues jira page [here](https://issues.apache.org/jira/browse/GRIFFIN) +## Contributing +See [Contributing Guide](./CONTRIBUTING.md) for details on how to contribute code, documentation, etc. 
+ +## References +- [Home Page](http://griffin.incubator.apache.org/) +- [Wiki](https://cwiki.apache.org/confluence/display/GRIFFIN/Apache+Griffin) +- Documents: + - [Measure](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/measure) + - [Service](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service) + - [UI](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/ui) + - [Docker usage](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/docker) + - [Postman API](https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service/postman) http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/b2f334a3/griffin-doc/service/api-guide.md ---------------------------------------------------------------------- diff --git a/griffin-doc/service/api-guide.md b/griffin-doc/service/api-guide.md index 9668145..773465b 100644 --- a/griffin-doc/service/api-guide.md +++ b/griffin-doc/service/api-guide.md @@ -23,6 +23,8 @@ This page lists the major RESTful APIs provided by Griffin. Apache Griffin default `BASE_PATH` is `http://<your ip>:8080`. +- [HTTP Response Design](#0) + - [Griffin Basic](#1) - [Measures](#2) @@ -36,6 +38,55 @@ Apache Griffin default `BASE_PATH` is `http://<your ip>:8080`. - [Auth](#6) +<h2 id = "0"></h2> +## HTTP Response Desigin +### Normal Response +The normal HTTP response is designed as follow: + +| Action | HTTP Status | Response Body | +| ---- | ------------------ | ------ | +| POST | 201, "Created" | created item | +| GET | 200, "OK" | requested items | +| PUT | 204, "No Content" | no content | +| DELETE | 204, "No Content" | no content | + +Note that metric module is implemented with elasticsearch bulk api, so the responses do not follow rules above. 
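The normal-response rules in the table above can be captured in a small client-side helper. This is an illustrative sketch only, not part of Griffin itself; as noted, the metric endpoints (backed by the Elasticsearch bulk API) do not follow these rules.

```python
# Expected HTTP status per method, per the normal-response table above.
EXPECTED_STATUS = {
    "POST": 201,    # "Created", body holds the created item
    "GET": 200,     # "OK", body holds the requested items
    "PUT": 204,     # "No Content"
    "DELETE": 204,  # "No Content"
}

def check_response(method, status):
    """Return True if `status` matches the documented normal response."""
    return EXPECTED_STATUS.get(method.upper()) == status
```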
+ +### Exception Response +The response for exception is designed as follow : + +``` +{ + "timestamp": 1517208444322, + "status": 400, + "error": "Bad Request", + "code": 40009, + "message": "Property 'measure.id' is invalid", + "path": "/api/v1/jobs" +} +``` +``` +{ + "timestamp": 1517209428969, + "status": 500, + "error": "Internal Server Error", + "message": "Failed to add metric values", + "exception": "java.net.ConnectException", + "path": "/api/v1/metrics/values" +} +``` +Description: + +- timestamp: the timestamp of response created +- status : the HTTP status code +- error : reason phrase of the HTTP status +- code: customized error code +- message : customized error message +- exception: fully qualified name of cause exception +- path: the requested api + +Note that 'exception' field may not exist if it is caused by client error, and 'code' field may not exist for server error. + <h2 id = "1"></h2> ## Griffin Basic @@ -61,44 +112,89 @@ Apache Griffin default `BASE_PATH` is `http://<your ip>:8080`. | ------- | -------------- | ------- | | measure | measure entity | Measure | -There are two different measures that are griffin measure and external measure. -If you want to create an external measure,you can use following example json in request body. -``` -{ - "type": "external", - "name": "external_name", - "description": " test measure", - "organization": "orgName", - "owner": "test", - "metricName": "metricName" -} -``` -Here gives a griffin measure example in request body and response body. #### Request Body example + +There are two kind of different measures, griffin measure and external measure. And for each type of measure, the 'dq.type' can be 'accuracy' or 'profiling'. 
+ +Here is a request body example to create a griffin measure of profiling: ``` { - "name":"measure_name", - "type":"griffin", - "description":"create a measure", + "name":"profiling_measure", + "measure.type":"griffin", + "dq.type":"profiling", + "rule.description":{ + "details":[ + { + "name":"age", + "infos":"Total Count,Average" + } + ] + }, + "process.type":"batch", + "owner":"test", + "description":"measure description", + "data.sources":[ + { + "name":"source", + "connectors":[ + { + "name":"connector_name", + "type":"hive", + "version":"1.2", + "data.unit":"1hour", + "data.time.zone":"UTC(WET,GMT)", + "config":{ + "database":"default", + "table.name":"demo_src", + "where":"dt=#YYYYMMdd# AND hour=#HH#" + }, + "predicates":[ + { + "type":"file.exist", + "config":{ + "root.path":"hdfs:///griffin/demo_src", + "path":"/dt=#YYYYMMdd#/hour=#HH#/_DONE" + } + } + ] + } + ] + } + ], "evaluate.rule":{ "rules":[ { - "rule":"source.desc=target.desc", "dsl.type":"griffin-dsl", - "dq.type":"accuracy", - "details":{} + "dq.type":"profiling", + "rule":"count(source.`age`) AS `age-count`,avg(source.`age`) AS `age-average`", + "name":"profiling", + "details":{ + + } } ] - }, + } +} +``` +And for griffin measure of accuracy: +``` +{ + "name":"accuracy_measure", + "measure.type":"griffin", + "dq.type":"accuracy", + "process.type":"batch", + "owner":"test", + "description":"measure description", "data.sources":[ { "name":"source", "connectors":[ { - "name":"connector_name_source", + "name":"connector_name_source", "type":"HIVE", "version":"1.2", - "data.unit":"1h", + "data.unit":"1hour", + "data.time.zone":"UTC(WET,GMT)", "config":{ "database":"default", "table.name":"demo_src", @@ -120,13 +216,14 @@ Here gives a griffin measure example in request body and response body. 
"name":"target", "connectors":[ { - "name":"connector_name_target", + "name":"connector_name_target", "type":"HIVE", "version":"1.2", - "data.unit":"1h", + "data.unit":"1hour", + "data.time.zone":"UTC(WET,GMT)", "config":{ "database":"default", - "table.name":"demo_src", + "table.name":"demo_tgt", "where":"dt=#YYYYMMdd# AND hour=#HH#" }, "predicates":[ @@ -141,26 +238,119 @@ Here gives a griffin measure example in request body and response body. } ] } - ] + ], + "evaluate.rule":{ + "rules":[ + { + "dsl.type":"griffin-dsl", + "dq.type":"accuracy", + "name":"accuracy", + "rule":"source.desc=target.desc" + } + ] + } } ``` -#### Response Body Sample +Example of request body to create external measure: ``` { - "code": 201, - "description": "Create Measure Succeed" + "name": "external_name", + "measure.type": "external", + "dq.type": "accuracy", + "description": "measure description", + "organization": "orgName", + "owner": "test", + "metricName": "metricName" } ``` -It may return failed messages.Such as, +#### Response Body Sample + +The response body should be the created measure if success. 
For example: ``` { - "code": 410, - "description": "Create Measure Failed, duplicate records" + "measure.type": "griffin", + "id": 1, + "name": "measureName", + "description": "measure description", + "organization": "orgName", + "owner": "test", + "deleted": false, + "dq.type": "accuracy", + "process.type": "batch", + "data.sources": [ + { + "id": 1, + "name": "source", + "connectors": [ + { + "id": 1, + "name": "connector_name_source", + "type": "HIVE", + "version": "1.2", + "predicates": [ + { + "id": 1, + "type": "file.exist", + "config": { + "root.path": "hdfs:///griffin/demo_src", + "path": "/dt=#YYYYMMdd#/hour=#HH#/_DONE" + } + } + ], + "data.unit": "1h", + "config": { + "database": "default", + "table.name": "demo_src", + "where": "dt=#YYYYMMdd# AND hour=#HH#" + } + } + ] + }, + { + "id": 2, + "name": "target", + "connectors": [ + { + "id": 2, + "name": "connector_name_target", + "type": "HIVE", + "version": "1.2", + "predicates": [ + { + "id": 2, + "type": "file.exist", + "config": { + "root.path": "hdfs:///griffin/demo_src", + "path": "/dt=#YYYYMMdd#/hour=#HH#/_DONE" + } + } + ], + "data.unit": "1h", + "config": { + "database": "default", + "table.name": "demo_src", + "where": "dt=#YYYYMMdd# AND hour=#HH#" + } + } + ] + } + ], + "evaluate.rule": { + "id": 1, + "rules": [ + { + "id": 1, + "rule": "source.desc=target.desc", + "name": "rule_name", + "description": "Total count", + "dsl.type": "griffin-dsl", + "dq.type": "accuracy", + "details": {} + } + ] + } } - ``` -The reason for failure may be that connector names already exist or connector names are empty. 
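A minimal client-side sketch of creating a measure, using only the Python standard library. The `BASE_PATH` value and the helper names are assumptions for illustration; the payload fields mirror the external-measure example above.

```python
import json
import urllib.request

BASE_PATH = "http://localhost:8080"  # assumption: point this at your Griffin service

def external_measure(name, organization, owner, metric_name):
    """Build an external-measure payload matching the example above."""
    return {
        "name": name,
        "measure.type": "external",
        "dq.type": "accuracy",
        "description": "measure description",
        "organization": organization,
        "owner": owner,
        "metricName": metric_name,
    }

def create_measure(payload):
    """POST the measure; a 201 response carries the created measure."""
    req = urllib.request.Request(
        BASE_PATH + "/api/v1/measures",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status, json.loads(resp.read())
```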
- ### Get measures `GET /api/v1/measures` @@ -168,55 +358,69 @@ The reason for failure may be that connector names already exist or connector na ``` [ { - "id": 1, - "name": "measurename", - "description": "This is measure test.", + "measure.type": "griffin", + "id": 4, + "name": "measure_no_predicate_day", "owner": "test", + "description": null, + "organization": null, "deleted": false, + "dq.type": "accuracy", "process.type": "batch", - "evaluateRule": { - "id": 1, - "rules": [ - { - "id": 1, - "rule": "source.id=target.id AND source.age=target.age", - "dsl.type": "griffin-dsl", - "dq.type": "accuracy" - } - ] - }, "data.sources": [ { - "id": 1, + "id": 6, "name": "source", "connectors": [ { - "id": 1, + "id": 6, + "name": "source1517994133405", "type": "HIVE", "version": "1.2", + "predicates": [], + "data.unit": "1day", + "data.time.zone": "UTC(WET,GMT)", "config": { "database": "default", - "table.name": "demo_src" + "table.name": "demo_src", + "where": "dt=#YYYYMMdd# AND hour=#HH#" } } ] }, { - "id": 2, + "id": 7, "name": "target", "connectors": [ { - "id": 2, + "id": 7, + "name": "target1517994142573", "type": "HIVE", "version": "1.2", + "predicates": [], + "data.unit": "1day", + "data.time.zone": "UTC(WET,GMT)", "config": { "database": "default", - "table.name": "demo_tgt" + "table.name": "demo_tgt", + "where": "dt=#YYYYMMdd# AND hour=#HH#" } } ] } - ] + ], + "evaluate.rule": { + "id": 4, + "rules": [ + { + "id": 4, + "rule": "source.age=target.age AND source.desc=target.desc", + "name": "accuracy", + "dsl.type": "griffin-dsl", + "dq.type": "accuracy" + } + ] + } } ] ``` @@ -233,101 +437,111 @@ The reason for failure may be that connector names already exist or connector na | name | description | type | | ------- | -------------- | ------- | | measure | measure entity | Measure | -There are two different measures that are griffin measure and external measure. -If you want to update an external measure,you can use following example json in request body. 
+ +#### Request Body example +There are two kind of different measures, griffin measure and external measure. And for each type of measure, the 'dq.type' can be 'accuracy' or 'profiling'. + +Here is a request body example to update a griffin measure of accuracy: ``` { - "id":1, - "type": "external", - "name": "external_name", - "description": " update test measure", + "id": 1, + "name": "measureName_edit", + "description": "measure description", "organization": "orgName", "owner": "test", - "metricName": "metricName" -} -``` -Here gives a griffin measure example in request body and response body. -#### Request Body example -``` -{ - "id": 1, - "name": "measure_official_update", - "description": "create a measure", - "owner": "test", - "deleted": false, - "type": "griffin", - "process.type": "batch", - "data.sources": [ - { - "id": 1, - "name": "source", - "connectors": [ - { - "id": 1, - "name": "connector_name_source", - "type": "HIVE", - "version": "1.2", - "predicates": [], - "data.unit": "1h", - "config": { - "database": "default", - "table.name": "demo_src", - "where": "dt=#YYYYMMdd# AND hour=#HH#" + "deleted": false, + "dq.type": "accuracy", + "process.type": "batch", + "data.sources": [ + { + "id": 1, + "name": "source", + "connectors": [ + { + "id": 1, + "name": "connector_name_source", + "type": "HIVE", + "version": "1.2", + "predicates": [ + { + "id": 1, + "type": "file.exist", + "config": { + "root.path": "hdfs:///griffin/demo_src", + "path": "/dt=#YYYYMMdd#/hour=#HH#/_DONE" + } } + ], + "data.unit": "1h", + "config": { + "database": "default", + "table.name": "demo_src", + "where": "dt=#YYYYMMdd# AND hour=#HH#" } - ] - }, - { - "id": 2, - "name": "target", - "connectors": [ - { - "id": 2, - "name": "connector_name_target", - "type": "HIVE", - "version": "1.2", - "predicates": [], - "data.unit": "1h", - "config": { - "database": "default", - "table.name": "demo_src", - "where": "dt=#YYYYMMdd# AND hour=#HH#" + } + ] + }, + { + "id": 2, + "name": "target", 
+ "connectors": [ + { + "id": 2, + "name": "connector_name_target", + "type": "HIVE", + "version": "1.2", + "predicates": [ + { + "id": 2, + "type": "file.exist", + "config": { + "root.path": "hdfs:///griffin/demo_src", + "path": "/dt=#YYYYMMdd#/hour=#HH#/_DONE" + } } + ], + "data.unit": "1h", + "config": { + "database": "default", + "table.name": "demo_src", + "where": "dt=#YYYYMMdd# AND hour=#HH#" } - ] - } - ], - "evaluate.rule": { - "id": 1, - "rules": [ - { - "id": 1, - "rule": "source.desc=target.desc", - "dsl.type": "griffin-dsl", - "dq.type": "accuracy", - "details": {} } ] } - } -``` -#### Response Body Sample -``` -{ - "code": 204, - "description": "Update Measure Succeed" + ], + "evaluate.rule": { + "id": 1, + "rules": [ + { + "id": 1, + "rule": "source.desc=target.desc", + "name": "rule_name", + "description": "Total count", + "dsl.type": "griffin-dsl", + "dq.type": "accuracy", + "details": {} + } + ] + }, + "measure.type": "griffin" } ``` -It may return failed messages.Such as, +If you want to update an external measure, you can use following example json in request body. ``` { - "code": 400, - "description": "Resource Not Found" + "id":1, + "measure.type": "external", + "dq.type": "accuracy", + "name": "external_name", + "description": " update test measure", + "organization": "orgName", + "owner": "test", + "metricName": "metricName" } - ``` - -The reason for failure may be that measure id doesn't exist. +#### Response Body Sample +The response body should be empty if no error happens, and the HTTP status is (204, "No Content"). ### Delete measure `DELETE /api/v1/measures/{id}` @@ -340,25 +554,8 @@ When deleting a measure,api will also delete related jobs. 
`/api/v1/measures/1` #### Response Body Sample -``` -{ - "code": 202, - "description": "Delete Measures By Id Succeed" -} -``` - -It may return failed messages.Such as, - -``` -{ - "code": 400, - "description": "Resource Not Found" -} - -``` - -The reason for failure may be that measure id doesn't exist. +The response body should be empty if no error happens, and the HTTP status is (204, "No Content"). ### Get measure by id `GET /api/v1/measures/{id}` @@ -372,59 +569,71 @@ The reason for failure may be that measure id doesn't exist. #### Response Body Sample ``` { - "id": 1, - "name": "measureName", - "description": "This is a test measure", - "organization": "orgName", - "evaluateRule": { - "id": 1, - "rules": [ - { - "id": 1, - "rule": "source.id = target.id and source.age = target.age and source.desc = target.desc", - "dsl.type": "griffin-dsl", - "dq.type": "accuracy" - } - ] - }, + "measure.type": "griffin", + "id": 4, + "name": "measure_no_predicate_day", "owner": "test", + "description": null, + "organization": null, "deleted": false, + "dq.type": "accuracy", "process.type": "batch", "data.sources": [ { - "id": 39, + "id": 6, "name": "source", "connectors": [ { - "id": 1, + "id": 6, + "name": "source1517994133405", "type": "HIVE", "version": "1.2", + "predicates": [], + "data.unit": "1day", + "data.time.zone": "UTC(WET,GMT)", "config": { "database": "default", - "table.name": "demo_src" + "table.name": "demo_src", + "where": "dt=#YYYYMMdd# AND hour=#HH#" } } ] }, { - "id": 2, + "id": 7, "name": "target", "connectors": [ { - "id": 2, + "id": 7, + "name": "target1517994142573", "type": "HIVE", "version": "1.2", + "predicates": [], + "data.unit": "1day", + "data.time.zone": "UTC(WET,GMT)", "config": { "database": "default", - "table.name": "demo_tgt" + "table.name": "demo_tgt", + "where": "dt=#YYYYMMdd# AND hour=#HH#" } } ] } - ] + ], + "evaluate.rule": { + "id": 4, + "rules": [ + { + "id": 4, + "rule": "source.age=target.age AND source.desc=target.desc", + 
"name": "accuracy", + "dsl.type": "griffin-dsl", + "dq.type": "accuracy" + } + ] + } } ``` -It may return no content.That's because your measure id doesn't exist. <h2 id = "3"></h2> ## Jobs @@ -443,19 +652,19 @@ It may return no content.That's because your measure id doesn't exist. #### Request Body Sample ``` { - "measure.id": 1, + "measure.id": 5, "job.name":"job_name", "cron.expression": "0 0/4 * * * ?", "cron.time.zone": "GMT+8:00", "predicate.config": { "checkdonefile.schedule":{ - "interval": "5m", - "repeat": 12 + "interval": "1m", + "repeat": 2 } }, "data.segments": [ { - "data.connector.name": "connector_name_source_test", + "data.connector.name": "connector_name_source", "as.baseline":true, "segment.range": { "begin": "-1h", @@ -463,7 +672,7 @@ It may return no content.That's because your measure id doesn't exist. } }, { - "data.connector.name": "connector_name_target_test", + "data.connector.name": "connector_name_target", "segment.range": { "begin": "-1h", "length": "1h" @@ -473,29 +682,45 @@ It may return no content.That's because your measure id doesn't exist. } ``` #### Response Body Sample +The response body should be the created job schedule if success. 
For example: ``` { - "code": 205, - "description": "Create Job Succeed" -} -``` -It may return failed messages.Such as, - -``` -{ - "code": 405, - "description": "Create Job Failed" + "id": 3, + "measure.id": 5, + "job.name": "job_name", + "cron.expression": "0 0/4 * * * ?", + "cron.time.zone": "GMT+8:00", + "predicate.config": { + "checkdonefile.schedule": { + "interval": "1m", + "repeat": 2 + } + }, + "data.segments": [ + { + "id": 5, + "data.connector.name": "connector_name_source", + "as.baseline": true, + "segment.range": { + "id": 5, + "begin": "-1h", + "length": "1h" + } + }, + { + "id": 6, + "data.connector.name": "connector_name_target", + "as.baseline": false, + "segment.range": { + "id": 6, + "begin": "-1h", + "length": "1h" + } + } + ] } ``` -There are several reasons to create job failure. - -- Measure id does not exist. -- Job name already exits. -- Param as.baselines aren't set or are all false. -- Connector name doesn't exist in your measure. -- The trigger key already exists. - ### Get jobs `GET /api/v1/jobs` @@ -516,54 +741,82 @@ There are several reasons to create job failure. ``` ### Delete job by id -#### `DELETE /api/v1/jobs/{id}` +`DELETE /api/v1/jobs/{id}` #### Path Variable - id -`required` `Long` job id #### Response Body Sample -``` -{ - "code": 206, - "description": "Delete Job Succeed" -} -``` -It may return failed messages.Such as, +The response body should be empty if no error happens, and the HTTP status is (204, "No Content"). 
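The `segment.range` fields in a job schedule (for example `"begin": "-1h", "length": "1h"`) are relative time offsets against the trigger time. The sketch below shows one way a client could interpret such offsets; the unit grammar here (`m`/`h`/`d` suffixes) is an assumption for illustration, not Griffin's own parsing.

```python
import re

# Assumed unit table for offsets like "-1h" or "2m"; Griffin's own
# interpretation may differ, this is only for illustration.
_UNITS = {"m": 60, "h": 3600, "d": 86400}

def offset_seconds(text):
    """Convert an offset string such as '-1h' into seconds."""
    match = re.fullmatch(r"(-?\d+)([mhd])", text.strip())
    if not match:
        raise ValueError("unsupported offset: %r" % text)
    value, unit = match.groups()
    return int(value) * _UNITS[unit]

def segment_window(trigger_ts, begin, length):
    """Return the (start, end) of a data segment, in epoch seconds."""
    start = trigger_ts + offset_seconds(begin)
    return start, start + offset_seconds(length)
```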
+ +### Get job schedule by job name +`GET /api/v1/jobs/config/{jobName}` + +#### Path Variable +- jobName -`required` `String` job name + +#### Request Sample + +`/api/v1/jobs/config/job_no_predicate_day` + +#### Response Sample ``` { - "code": 406, - "description": "Delete Job Failed" + "id": 2, + "measure.id": 4, + "job.name": "job_no_predicate_day", + "cron.expression": "0 0/4 * * * ?", + "cron.time.zone": "GMT-8:00", + "predicate.config": { + "checkdonefile.schedule": { + "repeat": "12", + "interval": "5m" + } + }, + "data.segments": [ + { + "id": 3, + "data.connector.name": "source1517994133405", + "as.baseline": true, + "segment.range": { + "id": 3, + "begin": "-2", + "length": "2" + } + }, + { + "id": 4, + "data.connector.name": "target1517994142573", + "as.baseline": false, + "segment.range": { + "id": 4, + "begin": "-5", + "length": "2" + } + } + ] } ``` -The reason for failure may be that job id does not exist. ### Delete job by name -#### `DELETE /api/v1/jobs` +`DELETE /api/v1/jobs` + +#### Request Parameter + | name | description | type | example value | | ------- | ----------- | ------ | ------------- | | jobName | job name | String | job_name | #### Response Body Sample -``` -{ - "code": 206, - "description": "Delete Job Succeed" -} -``` -It may return failed messages.Such as, -``` -{ - "code": 406, - "description": "Delete Job Failed" -} -``` -The reason for failure may that job name does not exist. +The response body should be empty if no error happens, and the HTTP status is (204, "No Content"). ### Get job instances `GET /api/v1/jobs/instances` +#### Request Parameter + | name | description | type | example value | | ----- | ----------------------------------- | ---- | ------------- | | jobId | job id | Long | 1 | @@ -616,27 +869,45 @@ The reason for failure may that job name does not exist. ### Get metrics `GET /api/v1/metrics` #### Response Example +The response is a map of metrics group by measure name. 
 For example:
 ```
-[
-  {
-    "name": "external_name",
-    "description": " test measure",
-    "organization": "orgName",
-    "owner": "test",
-    "metricValues": [
-      {
-        "name": "metricName",
-        "tmst": 1509599811123,
-        "value": {
-          "__tmst": 1509599811123,
-          "miss": 11,
-          "total": 125000,
-          "matched": 124989
-        }
-      }
-    ]
-  }
-]
+{
+  "measure_no_predicate_day": [
+    {
+      "name": "job_no_predicate_day",
+      "type": "accuracy",
+      "owner": "test",
+      "metricValues": [
+        {
+          "name": "job_no_predicate_day",
+          "tmst": 1517994480000,
+          "value": {
+            "total": 125000,
+            "miss": 0,
+            "matched": 125000
+          }
+        },
+        {
+          "name": "job_no_predicate_day",
+          "tmst": 1517994240000,
+          "value": {
+            "total": 125000,
+            "miss": 0,
+            "matched": 125000
+          }
+        }
+      ]
+    }
+  ],
+  "measre_predicate_hour": [
+    {
+      "name": "job_predicate_hour",
+      "type": "accuracy",
+      "owner": "test",
+      "metricValues": []
+    }
+  ]
+}
 ```

 ### Add metric values

@@ -665,44 +936,73 @@ The reason for failure may that job name does not exist.
 ]
 ```
 #### Response Body Sample
-```
-{
-  "code": 210,
-  "description": "Add Metric Values Success"
-}
-```
-
-It may return failed message
+The response body should have the `errors` field set to `false` on success, for example:
 ```
 {
-  "code": 412,
-  "description": "Add Metric Values Failed"
+  "took": 32,
+  "errors": false,
+  "items": [
+    {
+      "index": {
+        "_index": "griffin",
+        "_type": "accuracy",
+        "_id": "AWFAs5pOJwYEbKWP7mhq",
+        "_version": 1,
+        "result": "created",
+        "_shards": {
+          "total": 2,
+          "successful": 1,
+          "failed": 0
+        },
+        "created": true,
+        "status": 201
+      }
+    }
+  ]
+}
 ```
-The returned HTTP status code identifies the reason for failure.
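The metrics response above is a map keyed by measure name, each entry holding jobs with their metric values. A small sketch that flattens this shape and derives a matched rate from `matched`/`total` (the field names come from the samples; the derived rate is our illustration, not a field the API returns):

```python
# Trimmed version of the sample response from the guide.
metrics = {
    "measure_no_predicate_day": [
        {
            "name": "job_no_predicate_day",
            "type": "accuracy",
            "owner": "test",
            "metricValues": [
                {"name": "job_no_predicate_day", "tmst": 1517994480000,
                 "value": {"total": 125000, "miss": 0, "matched": 125000}},
            ],
        }
    ]
}

def matched_rates(metrics_by_measure: dict) -> list:
    """Flatten the measure-name map into (measure, tmst, matched-rate) tuples."""
    rates = []
    for measure, jobs in metrics_by_measure.items():
        for job in jobs:
            for mv in job["metricValues"]:
                v = mv["value"]
                rate = v["matched"] / v["total"] if v["total"] else 0.0
                rates.append((measure, mv["tmst"], rate))
    return rates

print(matched_rates(metrics))  # → [('measure_no_predicate_day', 1517994480000, 1.0)]
```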
+
 ### Get metric values by name
 `GET /api/v1/metrics/values`

 #### Request Parameter
-| name | description | type | example value |
-| ---------- | ---------------------------------------- | ------ | ------------- |
-| metricName | name of the metric values | String | metricName |
-| size | max amount of return values | int | 5 |
-| offset | the amount of records to skip by timestamp in descending order | int | 0 |
-
-Parameter offset is optional, it has default value as 0.
+name | description | type | example value
+--- | --- | --- | ---
+metricName | name of the metric values | String | job_no_predicate_day
+size | max amount of return records | int | 5
+offset | the amount of records to skip by timestamp in descending order | int | 0
+tmst | the start timestamp of records you want to get | long | 0
+
+Parameters offset and tmst are optional.

 #### Response Body Sample
 ```
 [
   {
-    "name": "metricName",
-    "tmst": 1509599811123,
+    "name": "job_no_predicate_day",
+    "tmst": 1517994720000,
     "value": {
-      "__tmst": 1509599811123,
-      "miss": 11,
       "total": 125000,
-      "matched": 124989
+      "miss": 0,
+      "matched": 125000
+    }
+  },
+  {
+    "name": "job_no_predicate_day",
+    "tmst": 1517994480000,
+    "value": {
+      "total": 125000,
+      "miss": 0,
+      "matched": 125000
+    }
+  },
+  {
+    "name": "job_no_predicate_day",
+    "tmst": 1517994240000,
+    "value": {
+      "total": 125000,
+      "miss": 0,
+      "matched": 125000
     }
   }
 ]
 ```

@@ -715,20 +1015,26 @@ Parameter offset is optional, it has default value as 0.
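The `size`, `offset`, and `tmst` parameters documented above can be emulated client-side to check expectations against the stated semantics. This sketch assumes results are ordered by `tmst` descending and that `tmst` is an inclusive lower bound, which is our reading of the parameter table, not a verified server contract:

```python
# Sample records with the tmst values from the response sample, unordered.
records = [
    {"name": "job_no_predicate_day", "tmst": 1517994240000},
    {"name": "job_no_predicate_day", "tmst": 1517994720000},
    {"name": "job_no_predicate_day", "tmst": 1517994480000},
]

def page_metric_values(records, size, offset=0, tmst=0):
    """Emulate the query parameters: drop records older than `tmst`, sort by
    timestamp descending, skip `offset` records, return at most `size`."""
    kept = sorted((r for r in records if r["tmst"] >= tmst),
                  key=lambda r: r["tmst"], reverse=True)
    return kept[offset:offset + size]

print([r["tmst"] for r in page_metric_values(records, size=2, offset=1)])
# → [1517994480000, 1517994240000]
```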
 | ---------- | ------------------------- | ------ | ------------- |
 | metricName | name of the metric values | String | metricName |

 #### Response Body Sample
+The response body should have an empty `failures` field on success, for example:
 ```
 {
-  "code": 211,
-  "description": "Delete Metric Values Success"
-}
-```
-It may return failed messages
-```
-{
-  "code": 413,
-  "description": "Delete Metric Values Failed"
+  "took": 363,
+  "timed_out": false,
+  "total": 5,
+  "deleted": 5,
+  "batches": 1,
+  "version_conflicts": 0,
+  "noops": 0,
+  "retries": {
+    "bulk": 0,
+    "search": 0
+  },
+  "throttled_millis": 0,
+  "requests_per_second": -1,
+  "throttled_until_millis": 0,
+  "failures": []
 }
 ```
-The returned HTTP status code identifies the reason for failure.

 <h2 id = "5"></h2>

 ### Hive MetaStore
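The delete response above follows Elasticsearch's delete-by-query format. A sketch of a success check based only on the fields visible in the sample (`failures`, `timed_out`, `deleted`, `total`); treating these fields as sufficient is an assumption, not an official contract:

```python
def delete_succeeded(resp: dict) -> bool:
    """True when the delete-by-query response reports no failures, did not
    time out, and every matched document was actually deleted."""
    return (not resp.get("timed_out", False)
            and not resp.get("failures")
            and resp.get("deleted") == resp.get("total"))

# Trimmed version of the sample response from the guide.
resp = {"took": 363, "timed_out": False, "total": 5, "deleted": 5, "failures": []}
print(delete_succeeded(resp))  # → True
```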
