Jaehwa Jung created TAJO-1010:
---------------------------------

             Summary: Improve multiple DISTINCT aggregation.
                 Key: TAJO-1010
                 URL: https://issues.apache.org/jira/browse/TAJO-1010
             Project: Tajo
          Issue Type: Improvement
          Components: planner/optimizer
            Reporter: Jaehwa Jung
            Assignee: Jaehwa Jung


Currently, Tajo provides three-stage aggregation to optimize queries with DISTINCT aggregates. 
However, it only supports DISTINCT aggregation on a single column, as in the following query:
{code:title=Query1|borderStyle=solid}
select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
from table1 a
group by a.flag
{code}

If DISTINCT aggregation is applied to two or more columns, the optimized distinct 
aggregation cannot be used, as in the following query:
{code:title=Query2|borderStyle=solid}
select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
, count(distinct a.name) as cnt2, count(distinct a.code) as cnt3
from table1 a
group by a.flag
{code}

In this case, the query may perform poorly. Thus, we need to improve multiple 
DISTINCT aggregation; specifically, three-stage aggregation should also be supported 
when DISTINCT is applied to multiple columns.
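To make the idea concrete, here is a minimal in-memory sketch of how three-stage aggregation could be extended to multiple DISTINCT columns, using Query2 as the target. It is not Tajo's planner or executor: the function names (stage1_dedup, stage2_partial, stage3_merge, multi_distinct_agg) and the DISTINCT_AGGS mapping are hypothetical, and the "stages" are plain Python functions standing in for distributed execution blocks. The assumption is that each distinct value is tagged with its column name so all DISTINCT columns can be deduplicated together, then aggregated per (flag, column), and finally merged into one row per flag.

```python
from collections import defaultdict

# Hypothetical mapping: distinct column -> {output name: aggregate over its distinct values}.
# Mirrors Query2: count/sum over id, count over name, count over code.
DISTINCT_AGGS = {
    "id": {"cnt": len, "total": sum},
    "name": {"cnt2": len},
    "code": {"cnt3": len},
}


def stage1_dedup(rows):
    """Stage 1: emit one (flag, column, value) key per distinct value.

    Tagging each key with its column name lets every DISTINCT column be
    deduplicated in the same pass instead of needing a separate plan per column.
    """
    keys = set()
    for row in rows:
        for col in DISTINCT_AGGS:
            value = row[col]
            if value is not None:  # SQL DISTINCT aggregates ignore NULLs
                keys.add((row["flag"], col, value))
    return keys


def stage2_partial(keys):
    """Stage 2: group deduplicated values by (flag, column) and aggregate them."""
    values = defaultdict(list)
    for flag, col, value in keys:
        values[(flag, col)].append(value)
    partial = {}
    for (flag, col), vals in values.items():
        for out_name, fn in DISTINCT_AGGS[col].items():
            partial[(flag, out_name)] = fn(vals)
    return partial


def stage3_merge(partial, flags):
    """Stage 3: merge the per-column partial results into one row per flag."""
    result = {}
    for flag in flags:
        row = {}
        for aggs in DISTINCT_AGGS.values():
            for out_name, fn in aggs.items():
                # A flag with only NULLs in a column gets COUNT = 0 and SUM = NULL.
                default = 0 if fn is len else None
                row[out_name] = partial.get((flag, out_name), default)
        result[flag] = row
    return result


def multi_distinct_agg(rows):
    """Run all three stages; rows are dicts with flag, id, name and code keys."""
    flags = {row["flag"] for row in rows}
    return stage3_merge(stage2_partial(stage1_dedup(rows)), flags)
```

For example, with rows for flag "A" containing ids {1, 1, 2}, names {"x", "y", "x"} and codes {10, 10, NULL}, the sketch yields cnt = 2, total = 3, cnt2 = 2 and cnt3 = 1 for "A". Because all distinct columns share one deduplication pass, the cost does not multiply with the number of DISTINCT columns the way separate per-column plans would.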



--
This message was sent by Atlassian JIRA
(v6.2#6252)
