GitHub user blrunner opened a pull request:

    https://github.com/apache/tajo/pull/136

    TAJO-1010: Improve multiple DISTINCT aggregation. (hyoungjun, jaehwa)

    Tajo supports several algorithms for count distinct. The current one 
executes a count distinct query with two execution blocks; that plan is built by 
DistinctGroupbyBuilder::buildPlan. This patch adds a new option that executes the query 
with three execution blocks. To use it, set 
SessionVars.COUNT_DISTINCT_ALGORITHM to three_stages.
    
    * In the first stage, the Tajo operator expands each input row into 
multiple rows according to the grouping columns. The operator must also emit a row 
for the aggregation of non-distinct columns.
    * In the second stage, the Tajo operator aggregates the output of the first stage. 
Its input is shuffled by the grouping columns and the aggregation columns.
    * In the third stage, the Tajo operator merges the output of the second stage. 
Its input is shuffled by the grouping columns only.
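
The three stages above can be sketched in plain Python. This is only an illustration of the general technique, not the code in this patch: it assumes a query of the form SELECT g, COUNT(DISTINCT a), COUNT(DISTINCT b), SUM(c) ... GROUP BY g, assumes no NULL values, and every function and tag name in it is hypothetical.

```python
from collections import defaultdict

def stage1_expand(rows):
    """Stage 1 (sketch): turn each (g, a, b, c) row into one tagged row per
    distinct column, plus one row for the non-distinct aggregate SUM(c)."""
    for g, a, b, c in rows:
        yield (g, "a", a, None)      # feeds COUNT(DISTINCT a)
        yield (g, "b", b, None)      # feeds COUNT(DISTINCT b)
        yield (g, "sum_c", None, c)  # feeds SUM(c)

def stage2_aggregate(expanded):
    """Stage 2 (sketch): key by grouping column + tag + distinct value.
    Duplicate distinct values collapse to one entry; SUM(c) is pre-summed."""
    partial = defaultdict(int)
    for g, tag, value, c in expanded:
        key = (g, tag, value)
        if tag == "sum_c":
            partial[key] += c
        else:
            partial[key] = 1  # one entry per distinct value
    return partial

def stage3_merge(partial):
    """Stage 3 (sketch): key by the grouping column only and merge."""
    result = defaultdict(lambda: {"a": 0, "b": 0, "sum_c": 0})
    for (g, tag, _value), n in partial.items():
        result[g][tag] += n
    return dict(result)

def three_stage_count_distinct(rows):
    return stage3_merge(stage2_aggregate(stage1_expand(rows)))

# Hypothetical sample data: columns are (g, a, b, c).
rows = [("x", 1, 10, 5), ("x", 1, 20, 7), ("x", 2, 20, 1), ("y", 3, 30, 4)]
print(three_stage_count_distinct(rows))
```

In a distributed plan, the dictionary keys in stage 2 and stage 3 would correspond to the shuffle keys of each stage; here a single process stands in for all workers.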


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/blrunner/tajo TAJO-1010

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/136.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #136
    
----
commit 615d84f13e8dd496c9c096cf2eeb6f7e3e16dfa2
Author: Jaehwa Jung <[email protected]>
Date:   2014-09-11T06:30:31Z

    TAJO-1010: Improve multiple DISTINCT aggregation. (hyoungjun, jaehwa)

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
