[jira] [Commented] (GRIFFIN-335) Hive Connector: Ability to Use "group by" clause

Tushar (Jira) Sun, 19 Jul 2020 02:56:02 -0700


    [ 
https://issues.apache.org/jira/browse/GRIFFIN-335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160629#comment-17160629
 ]


Tushar commented on GRIFFIN-335:
--------------------------------

Hi [~obaid],

This is not about how much effort is required to support this change. We 
already have a Profiling dimension where you can model your use case, so why do 
you want to add functionality that Griffin already supports?

Can you please also explain what additional steps you had to take to 
support your use case?

> Hive Connector: Ability to Use "group by" clause
> ------------------------------------------------
>
>                 Key: GRIFFIN-335
>                 URL: https://issues.apache.org/jira/browse/GRIFFIN-335
>             Project: Griffin
>          Issue Type: Improvement
>          Components: accuracy-batch
>    Affects Versions: 0.6.0
>            Reporter: Azhar
>            Priority: Major
>              Labels: columns, groupby, hive
>
> *Background:*
> Refer to [https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-334 
> |https://issues.apache.org/jira/projects/GRIFFIN/issues/GRIFFIN-332] and 
> https://issues.apache.org/jira/browse/GRIFFIN-333 .
>  If we have the ability to select specific columns, it opens the door to 
> using SQL-based aggregation, further reducing the volume of data read from 
> Hive sources.
> *Proposed Improvement:*
>  So I propose a feature that allows the Hive connector to use SQL-based 
> aggregations.
>  
> Let's say we have source and target tables that have data like below.
> src:
> {code:java}
> ------------------------
> |employee_id   |country|
> ------------------------
> |1             | NZ    |
> |2             | DE    |
> |3             | DE    |
> |4             | NZ    |
> |5             | DE    |
> ....
> ....
> ------------------------
> {code}
> tgt:
> {code:java}
> ------------------------
> |total_employee|country|
> ------------------------
> |10            | NZ    |
> |11            | DE    |
> ------------------------
> {code}
> Then we can perform an `accuracy` check [ `"rule":"src.total_employee = 
> tgt.total_employee and src.country = tgt.country "` ] directly, as shown 
> below, using the `columns` and `groupby` clauses for the source table:
> {code:java}
>       {
>          "name":"src",
>          "connector":{
>             "type":"hive",
>             "config":{
>                "database":"mydatabase",
>                "table.name":"mytable",
>                "columns": "count(*) total_employee, country",
>                "groupby": "country",
>                "where":""
>             }
>          }
>       }
> {code}
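
To make the proposal in the quoted description concrete, here is a minimal sketch of how the proposed `columns` and `groupby` keys might be turned into a Hive query. It is an illustration only, not Griffin's actual connector code: the helper name `build_hive_query` is hypothetical, and the fallback to `*` when `columns` is absent is an assumption.

```python
def build_hive_query(config):
    """Hypothetical helper: build a SELECT statement from a connector config dict.

    Assumption: missing or empty "columns" falls back to "*", and empty
    "where" / "groupby" values are simply skipped.
    """
    columns = (config.get("columns") or "").strip() or "*"
    query = f"SELECT {columns} FROM {config['database']}.{config['table.name']}"

    where = (config.get("where") or "").strip()
    if where:
        query += f" WHERE {where}"

    groupby = (config.get("groupby") or "").strip()
    if groupby:
        query += f" GROUP BY {groupby}"

    return query


# Connector config taken from the quoted issue description.
src_config = {
    "database": "mydatabase",
    "table.name": "mytable",
    "columns": "count(*) total_employee, country",
    "groupby": "country",
    "where": "",
}

print(build_hive_query(src_config))
# SELECT count(*) total_employee, country FROM mydatabase.mytable GROUP BY country
```

Under these assumptions the source would be reduced to one row per country before the accuracy rule compares it with the target table.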



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
