[ 
https://issues.apache.org/jira/browse/GRIFFIN-333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158096#comment-17158096
 ] 

Tushar edited comment on GRIFFIN-333 at 7/15/20, 12:55 PM:
-----------------------------------------------------------

Hi [~obaid],

If you want to reduce the number of rows, you can put a "where" clause in 
the source definition. 

Adding an aggregate function to filter the number of rows is not a generic 
enough use case to add to Griffin. However, you can still define your use case 
by using Griffin's profiling measure.


was (Author: tushar.patil):
Hi [~obaid],

If you want to filter the number of rows then there is provision for putting  
"where" clause in source definition. 

Adding aggregate function to filter the number of rows is not very generic use 
case to add those changes. However, you can still define your use case by using 
profiling measure of griffin.

> JDBC Connector: Ability to Use "group by" clause
> ------------------------------------------------
>
>                 Key: GRIFFIN-333
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-333
>             Project: Griffin
>          Issue Type: Improvement
>          Components: accuracy-batch
>    Affects Versions: 0.6.0
>            Reporter: Obaidul Karim
>            Priority: Major
>              Labels: column, groupby, jdbc
>
> *Background:*
> Refer to [https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-332].
> If we have the ability to select specific columns, it will open the door to 
> using SQL-based aggregation, further reducing the volume of data from JDBC 
> sources.
>  
> *Proposed Improvement:*
> So, I propose a feature that allows the JDBC connector to use SQL-based 
> aggregations through a `groupby` clause.
> *Example:*
> Let's say we have source and target tables that have data like below.
> src:
> {code:java}
> ------------------------
> |employee_id   |country|
> ------------------------
> |1             | NZ    |
> |2             | DE    |
> |3             | DE    |
> |4             | NZ    |
> |5             | DE    |
> ....
> ....
> ------------------------
> {code}
> tgt:
> {code:java}
> ------------------------
> |total_employee|country|
> ------------------------
> |10            | NZ    |
> |11            | DE    |
> ------------------------
> {code}
> Then we can perform the `accuracy` check [ `"rule":"src.total_employee = 
> tgt.total_employee and src.country = tgt.country "` ] directly, as shown below, 
> using the `columns` and `groupby` clauses for the source table:
> {code:java}
> {   
>    "name":"src",
>    "connector":{      
>       "type":"jdbc",
>       "config":{         
>          "database":"mydatabase",
>          "tablename":"mytable",
>          "columns":"count(*) total_employee, country",
>          "groupby":"country",
>          "url":"jdbc:sqlserver://myhost:1433;databaseName=mydatabase",
>          "user":"user",
>          "password":"password",
>          "driver":"com.microsoft.sqlserver.jdbc.SQLServerDriver",
>          "where":""
>       }
>    }
> }
> {code}
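
For illustration only, here is a rough sketch of how the proposed `columns`, 
`groupby` and existing `where` keys might be combined into one SQL statement. 
This is not Griffin's implementation: `build_query` is a hypothetical helper, 
the database qualifier is omitted, and an in-memory SQLite table holding just 
the five sample rows shown above stands in for the real JDBC source.

```python
import sqlite3


def build_query(config):
    """Assemble a SELECT from connector-style config keys (hypothetical helper)."""
    columns = config.get("columns") or "*"
    query = f"SELECT {columns} FROM {config['tablename']}"
    if config.get("where"):
        query += f" WHERE {config['where']}"
    if config.get("groupby"):
        query += f" GROUP BY {config['groupby']}"
    return query


# Keys mirror the proposed connector config; an empty "where" is skipped.
config = {
    "tablename": "mytable",
    "columns": "count(*) total_employee, country",
    "groupby": "country",
    "where": "",
}

# Stand-in for the JDBC source: only the five sample rows from the description.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (employee_id INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "NZ"), (2, "DE"), (3, "DE"), (4, "NZ"), (5, "DE")],
)

query = build_query(config)
rows = sorted(conn.execute(query).fetchall())
print(query)
print(rows)
```

With aggregation pushed into the query, the source returns one row per country 
instead of one row per employee, which is the shape the `tgt` table already 
has. The sketch concatenates strings and does no escaping, so it only suits 
trusted configuration values.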



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
