Jaehwa Jung created TAJO-1010:
---------------------------------
Summary: Improve multiple DISTINCT aggregation.
Key: TAJO-1010
URL: https://issues.apache.org/jira/browse/TAJO-1010
Project: Tajo
Issue Type: Improvement
Components: planner/optimizer
Reporter: Jaehwa Jung
Assignee: Jaehwa Jung
Currently, Tajo provides a three-stage plan for optimizing DISTINCT aggregation queries.
However, it supports only a single DISTINCT column, as in the following query:
{code:title=Query1|borderStyle=solid}
select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
from table1 a
group by a.flag
{code}
If you use two or more columns in DISTINCT aggregations, the optimized
distinct aggregation cannot be applied, as in the following query:
{code:title=Query2|borderStyle=solid}
select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
, count(distinct a.name) as cnt2, count(distinct a.code) as cnt3
from table1 a
group by a.flag
{code}
In this case, the query may perform poorly. Thus, we need to improve
multiple DISTINCT aggregation. Specifically, the three-stage optimization
should also support aggregations over multiple DISTINCT columns.
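
For illustration only, here is a minimal Python sketch of one common way to evaluate several DISTINCT columns in three stages. It is not Tajo's planner code, and the stage boundaries are an assumption about how such a plan could look. All function names (stage1_expand, stage2_dedupe, stage3_aggregate, multi_distinct) and the sample rows are hypothetical. NULL handling is not modelled.

```python
from collections import defaultdict


def stage1_expand(partition, group_col, distinct_cols):
    """Stage 1 (per partition): emit one locally deduplicated
    (group, column, value) key for every DISTINCT column of every row."""
    keys = set()
    for row in partition:
        for col in distinct_cols:
            keys.add((row[group_col], col, row[col]))
    return keys


def stage2_dedupe(partial_key_sets):
    """Stage 2: merge the per-partition key sets so that each
    (group, column, value) key survives exactly once globally."""
    merged = set()
    for keys in partial_key_sets:
        merged |= keys
    return merged


def stage3_aggregate(keys, aggregates):
    """Stage 3: collect the distinct values per (group, column) and
    apply every requested aggregate to the matching value list."""
    values = defaultdict(list)
    for group, col, value in keys:
        values[(group, col)].append(value)

    result = defaultdict(dict)
    for alias, (func, col) in aggregates.items():
        for (group, value_col), vals in values.items():
            if value_col == col:
                result[group][alias] = func(vals)
    return dict(result)


def multi_distinct(partitions, group_col, aggregates):
    """Run the three stages over a list of partitions (lists of row dicts)."""
    distinct_cols = sorted({col for _, col in aggregates.values()})
    partials = [stage1_expand(p, group_col, distinct_cols) for p in partitions]
    return stage3_aggregate(stage2_dedupe(partials), aggregates)


# Hypothetical sample data split over two partitions, shaped like Query2.
partitions = [
    [
        {"flag": "A", "id": 1, "name": "x", "code": "c1"},
        {"flag": "A", "id": 1, "name": "y", "code": "c1"},
    ],
    [
        {"flag": "A", "id": 2, "name": "x", "code": "c1"},
        {"flag": "B", "id": 3, "name": "z", "code": "c2"},
    ],
]

# alias -> (aggregate over the distinct values, DISTINCT column)
aggregates = {
    "cnt": (len, "id"),
    "total": (sum, "id"),
    "cnt2": (len, "name"),
    "cnt3": (len, "code"),
}

print(multi_distinct(partitions, "flag", aggregates))
```

Tagging each key with its column name is what lets a single pass deduplicate a.id, a.name and a.code together, instead of running one distinct aggregation per column.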