GitHub user blrunner opened a pull request:
https://github.com/apache/tajo/pull/136
TAJO-1010: Improve multiple DISTINCT aggregation. (hyoungjun, jaehwa)
Tajo supports several algorithms for count distinct. The current algorithm
executes a count distinct query with two execution blocks, which are built by
DistinctGroupbyBuilder::buildPlan. This patch adds a new algorithm that
executes the query with three execution blocks. To use it, set
SessionVars.COUNT_DISTINCT_ALGORITHM to three_stages.
* In the first stage, the Tajo operator expands each input row into multiple
rows keyed by the grouping columns. In addition, the operator must create a
separate row for the aggregation of non-distinct columns.
* In the second stage, the Tajo operator aggregates the output of the first
stage. Its input is shuffled by the grouping columns and the aggregation
columns.
* In the third stage, the Tajo operator merges the output of the second stage.
Its input is shuffled by the grouping columns only.
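The three stages above can be sketched in a few lines of Python. This is only
an illustration of the idea, not the code in this patch: it runs in a single
process instead of across shuffled partitions, it assumes the first stage emits
one row per DISTINCT column, and every name in it (NON_DISTINCT,
stage1_expand, stage2_aggregate, stage3_merge, three_stage_count_distinct) is
hypothetical. It models a query of the form
SELECT g, COUNT(DISTINCT a), COUNT(DISTINCT b), SUM(c) ... GROUP BY g.

```python
from collections import defaultdict

# Hypothetical tag for the extra row that carries the non-distinct aggregate.
NON_DISTINCT = -1


def stage1_expand(rows, group_col, distinct_cols, sum_col):
    """Stage 1: expand each input row into one row per DISTINCT column,
    plus one row for the non-distinct aggregation (assumed layout)."""
    for row in rows:
        g = row[group_col]
        for idx, col in enumerate(distinct_cols):
            yield (g, idx, row[col])
        yield (g, NON_DISTINCT, row[sum_col])


def stage2_aggregate(expanded):
    """Stage 2: keyed by grouping columns and aggregation columns.
    Duplicate distinct values collapse; non-distinct values are pre-summed."""
    seen = set()
    partial_sums = defaultdict(int)
    for g, tag, value in expanded:
        if tag == NON_DISTINCT:
            partial_sums[g] += value
        else:
            seen.add((g, tag, value))
    out = list(seen)
    out.extend((g, NON_DISTINCT, s) for g, s in partial_sums.items())
    return out


def stage3_merge(aggregated, n_distinct):
    """Stage 3: keyed by grouping columns only. Count the surviving distinct
    values per column and merge the partial sums."""
    result = {}
    for g, tag, value in aggregated:
        entry = result.setdefault(g, {"distinct": [0] * n_distinct, "sum": 0})
        if tag == NON_DISTINCT:
            entry["sum"] += value
        else:
            entry["distinct"][tag] += 1
    return result


def three_stage_count_distinct(rows, group_col, distinct_cols, sum_col):
    expanded = stage1_expand(rows, group_col, distinct_cols, sum_col)
    aggregated = stage2_aggregate(expanded)
    return stage3_merge(aggregated, len(distinct_cols))


rows = [
    {"g": "x", "a": 1, "b": 10, "c": 5},
    {"g": "x", "a": 1, "b": 20, "c": 7},
    {"g": "x", "a": 2, "b": 20, "c": 1},
    {"g": "y", "a": 3, "b": 30, "c": 4},
]
result = three_stage_count_distinct(rows, "g", ["a", "b"], "c")
```

Because duplicates are removed in the second stage, the third stage only has
to count rows per group, which is why it can be shuffled by the grouping
columns alone.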
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/blrunner/tajo TAJO-1010
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/tajo/pull/136.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #136
----
commit 615d84f13e8dd496c9c096cf2eeb6f7e3e16dfa2
Author: Jaehwa Jung <[email protected]>
Date: 2014-09-11T06:30:31Z
TAJO-1010: Improve multiple DISTINCT aggregation. (hyoungjun, jaehwa)
----