Azhar created GRIFFIN-335:
-----------------------------
Summary: Hive Connector: Ability to Use "group by" clause
Key: GRIFFIN-335
URL: https://issues.apache.org/jira/browse/GRIFFIN-335
Project: Griffin
Issue Type: Improvement
Components: accuracy-batch
Affects Versions: 0.6.0
Reporter: Azhar
Refer to [https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-332].
If we can select specific columns, that opens the door to SQL-based
aggregation, further reducing the volume of data read from JDBC sources.
So I propose a feature that lets the JDBC connector perform SQL-based
aggregations through a `groupby` clause.
Let's say we have source and target tables that have data like below.
src:
{code:java}
-------------------------
| employee_id | country |
-------------------------
| 1           | NZ      |
| 2           | DE      |
| 3           | DE      |
| 4           | NZ      |
| 5           | DE      |
....
....
-------------------------
{code}
tgt:
{code:java}
----------------------------
| total_employee | country |
----------------------------
| 10             | NZ      |
| 11             | DE      |
----------------------------
{code}
Then we can perform an `accuracy` check directly, using the `columns` and
`groupby` clauses for the source table as shown below:
{code:java}
{
  "name": "src",
  "connector": {
    "type": "jdbc",
    "config": {
      "database": "mydatabase",
      "tablename": "mytable",
      "columns": "count(*) total_employee, country",
      "groupby": "country",
      "url": "jdbc:sqlserver://myhost:1433;databaseName=mydatabase",
      "user": "user",
      "password": "password",
      "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
      "where": ""
    }
  }
}
{code}
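To make the intent concrete, here is a minimal Python sketch of how a connector might turn the `columns`, `where`, and `groupby` settings into a single query. It is an illustration under assumptions, not Griffin's actual implementation: `build_query` is a hypothetical helper, an in-memory SQLite database stands in for the SQL Server source, and only the five rows visible in the `src` sample are loaded, so the counts will not match the `tgt` sample.

```python
import sqlite3


def build_query(config):
    """Assemble a SELECT from connector-style config keys (hypothetical helper).

    The config values are raw SQL fragments by design, so they must come
    from a trusted configuration file, never from untrusted input.
    """
    columns = (config.get("columns") or "").strip() or "*"
    sql = f"SELECT {columns} FROM {config['tablename']}"
    where = (config.get("where") or "").strip()
    if where:  # an empty "where" adds no filter
        sql += f" WHERE {where}"
    groupby = (config.get("groupby") or "").strip()
    if groupby:  # an empty or missing "groupby" leaves the query unaggregated
        sql += f" GROUP BY {groupby}"
    return sql


# Only the query-shaping keys from the proposed config are needed here.
config = {
    "tablename": "mytable",
    "columns": "count(*) total_employee, country",
    "groupby": "country",
    "where": "",
}
query = build_query(config)

# Assumption: SQLite stands in for the real JDBC source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (employee_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "NZ"), (2, "DE"), (3, "DE"), (4, "NZ"), (5, "DE")],
)
# Sort in Python because GROUP BY alone does not guarantee row order.
rows = sorted(conn.execute(query).fetchall())
conn.close()

print(query)
print(rows)
```

With this approach the aggregation runs inside the database, so only one row per country crosses the connection instead of one row per employee.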