[
https://issues.apache.org/jira/browse/TAJO-1010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14161453#comment-14161453
]
ASF GitHub Bot commented on TAJO-1010:
--------------------------------------
Github user hyunsik commented on the pull request:
https://github.com/apache/tajo/pull/136#issuecomment-58134795
Although I wanted to give some advice on the comments, I can't spend time on
it right now.
However, this issue is scheduled for 0.9.0, and I think this improvement is
important for 0.9.0. So I think it would be hard to delay committing this
issue to the master branch.
This patch already looks good and is ready to be committed to master. So I
propose that we commit it now and revise the comments later.
Could you rebase it against the latest patch? If so, I'll finish the review
on this patch.
> Improve multiple DISTINCT aggregation.
> --------------------------------------
>
> Key: TAJO-1010
> URL: https://issues.apache.org/jira/browse/TAJO-1010
> Project: Tajo
> Issue Type: Improvement
> Components: planner/optimizer
> Reporter: Jaehwa Jung
> Assignee: Jaehwa Jung
> Fix For: 0.9.0
>
>
> Currently, Tajo provides a three-stage plan for optimizing distinct
> aggregation queries, but it supports only one column for distinct
> aggregation, as follows:
> {code:title=Query1|borderStyle=solid}
> select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
> from table1 a
> group by a.flag
> {code}
> If you use two or more columns for distinct aggregation, the optimized
> distinct aggregation cannot be applied, as in the following query:
> {code:title=Query2|borderStyle=solid}
> select a.flag, count(distinct a.id) as cnt, sum(distinct a.id) as total
> , count(distinct a.name) as cnt2, count(distinct a.code) as cnt3
> from table1 a
> group by a.flag
> {code}
> In this case, the query may perform poorly. Thus, we need to improve
> multiple DISTINCT aggregation. Specifically, we should support the
> three-stage plan for multiple DISTINCT aggregation.
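
For illustration only, here is a minimal Python sketch of one generic way to evaluate several DISTINCT aggregates per group in stages. It is not taken from the patch and may differ from Tajo's actual plan. The sample rows, the `DISTINCT_COLUMNS` mapping, and the `multi_distinct` function are all hypothetical names made up for this example; the output keys mirror the aliases in Query2.

```python
from collections import defaultdict

# Hypothetical rows of table1: (flag, id, name, code).
rows = [
    ("A", 1, "x", "c1"),
    ("A", 1, "y", "c1"),
    ("A", 2, "x", "c2"),
    ("B", 3, "z", "c1"),
    ("B", 3, "z", "c1"),
]

# Hypothetical mapping from each distinct column to its position in a row.
DISTINCT_COLUMNS = {"id": 1, "name": 2, "code": 3}


def multi_distinct(rows):
    # Stage 1: expand every row into (flag, column, value) triples and
    # deduplicate them, so each distinct value survives once per group.
    seen = set()
    for row in rows:
        for col, idx in DISTINCT_COLUMNS.items():
            seen.add((row[0], col, row[idx]))

    # Stage 2: collect the deduplicated values per (flag, column).
    values = defaultdict(list)
    for flag, col, value in seen:
        values[(flag, col)].append(value)

    # Stage 3: compute the final aggregates per flag.
    result = {}
    for flag in sorted({f for f, _ in values}):
        ids = values[(flag, "id")]
        result[flag] = {
            "cnt": len(ids),
            "total": sum(ids),
            "cnt2": len(values[(flag, "name")]),
            "cnt3": len(values[(flag, "code")]),
        }
    return result
```

Because deduplication happens once on the tagged triples, the same pass serves every distinct column, which is the general idea behind handling more than one DISTINCT column in a single plan.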
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)