Repository: incubator-griffin
Updated Branches:
  refs/heads/master 69b666e58 -> cb74c3490


Fix doc issue.

dsl-guide describes two kinds of rules, Uniqueness and Distinctness. Actually 
they are similar rules, so we keep only the Distinctness rule.

Author: Eugene <liu...@apache.org>

Closes #420 from toyboxman/doc/dsl-guide.


Project: http://git-wip-us.apache.org/repos/asf/incubator-griffin/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-griffin/commit/cb74c349
Tree: http://git-wip-us.apache.org/repos/asf/incubator-griffin/tree/cb74c349
Diff: http://git-wip-us.apache.org/repos/asf/incubator-griffin/diff/cb74c349

Branch: refs/heads/master
Commit: cb74c34909055540a260b6f2630274fabfadc0b8
Parents: 69b666e
Author: Eugene <liu...@apache.org>
Authored: Mon Oct 8 09:43:52 2018 +0800
Committer: Lionel Liu <bhlx3l...@163.com>
Committed: Mon Oct 8 09:43:52 2018 +0800

----------------------------------------------------------------------
 griffin-doc/measure/dsl-guide.md | 31 ++++++-------------------------
 1 file changed, 6 insertions(+), 25 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-griffin/blob/cb74c349/griffin-doc/measure/dsl-guide.md
----------------------------------------------------------------------
diff --git a/griffin-doc/measure/dsl-guide.md b/griffin-doc/measure/dsl-guide.md
index e3b26ab..779ea6a 100644
--- a/griffin-doc/measure/dsl-guide.md
+++ b/griffin-doc/measure/dsl-guide.md
@@ -18,15 +18,15 @@ under the License.
 -->
 
 # Apache Griffin DSL Guide
-Griffin DSL is designed for DQ measurement, as a SQL-like language, trying to 
describe the DQ domain request.
+Griffin DSL is designed for DQ measurement, as a SQL-like language, which 
describes the DQ domain request.
 
 ## Griffin DSL Syntax Description
-Griffin DSL is SQL-like, case insensitive, and easy to learn.
+Griffin DSL syntax is easy to learn as it's SQL-like, case insensitive.
 
 ### Supporting process
-- logical operation: not, and, or, in, between, like, is null, is nan, =, !=, 
<>, <=, >=, <, >
-- mathematical operation: +, -, *, /, %
-- sql statement: as, where, group by, having, order by, limit
+- logical operation: `not, and, or, in, between, like, is null, is nan, =, !=, 
<>, <=, >=, <, >`
+- mathematical operation: `+, -, *, /, %`
+- sql statement: `as, where, group by, having, order by, limit`
 
 ### Keywords
 - `null, nan, true, false`
@@ -57,7 +57,7 @@ Griffin DSL is SQL-like, case insensitive, and easy to learn.
        e.g. `*`, `source.*`, `target.*`
 - **field selection**: field name or with data source name ahead.  
        e.g. `source.age`, `target.name`, `user_id`
-- **index selection**: interget between square brackets "[]" with field name 
ahead.  
+- **index selection**: integer between square brackets "[]" with field name 
ahead.  
        e.g. `source.attributes[3]`
 - **function selection**: function name with brackets "()", with field name 
ahead or not.  
        e.g. `count(*)`, `*.count()`, `source.user_id.count()`, 
`max(source.age)`
@@ -121,10 +121,6 @@ Accuracy rule expression in Griffin DSL is a logical 
expression, telling the map
 Profiling rule expression in Griffin DSL is a sql-like expression, with select 
clause ahead, following optional from clause, where clause, group-by clause, 
order-by clause, limit clause in order.  
        e.g. `source.gender, source.id.count() where source.age > 20 group by 
source.gender`, `select country, max(age), min(age), count(*) as cnt from 
source group by country order by cnt desc limit 5`
 
-### Uniqueness Rule
-Uniqueness rule expression in Griffin DSL is a list of selection expressions 
separated by comma, indicates the columns to check if is unique.  
-       e.g. `name, age`, `name, (age + 1) as next_age`
-
 ### Distinctness Rule
 Distinctness rule expression in Griffin DSL is a list of selection expressions 
separated by comma, indicates the columns to check if is distinct.
     e.g. `name, age`, `name, (age + 1) as next_age`
@@ -155,21 +151,6 @@ For example, the dsl rule is `source.cntry, 
source.id.count(), source.age.max()
 
 After the translation, the metrics will be persisted in table `profiling`.
 
-### Uniqueness
-For uniqueness, or called duplicate, is to find out the duplicate items of 
data, and rollup the items count group by duplicate times.
-For example, the dsl rule is `name, age`, which represents the duplicate 
requests, in this case, source and target are the same data set. After the 
translation, the sql rule is as below:
-- **get distinct items from source**: `SELECT name, age FROM source`, save as 
table `src`.
-- **get all items from target**: `SELECT name, age FROM target`, save as table 
`tgt`.
-- **join two tables**: `SELECT src.name, src.age FROM tgt RIGHT JOIN src ON 
coalesce(src.name, '') = coalesce(tgt.name, '') AND coalesce(src.age, '') = 
coalesce(tgt.age, '')`, save as table `joined`.
-- **get items duplication**: `SELECT name, age, (count(*) - 1) AS dup FROM 
joined GROUP BY name, age`, save as table `grouped`.
-- **get total metric**: `SELECT count(*) FROM source`, save as table 
`total_metric`.
-- **get unique record**: `SELECT * FROM grouped WHERE dup = 0`, save as table 
`unique_record`.
-- **get unique metric**: `SELECT count(*) FROM unique_record`, save as table 
`unique_metric`.
-- **get duplicate record**: `SELECT * FROM grouped WHERE dup > 0`, save as 
table `dup_record`.
-- **get duplicate metric**: `SELECT dup, count(*) AS num FROM dup_records 
GROUP BY dup`, save as table `dup_metric`.
-
-After the translation, the metrics will be persisted in table `dup_metric`.
-
 ### Distinctness
 For distinctness, is to find out the duplicate items of data, the same as 
uniqueness in batch mode, but with some differences in streaming mode.
 In most time, you need distinctness other than uniqueness.
