GitHub user blrunner opened a pull request:

    https://github.com/apache/tajo/pull/126

    TAJO-1010: Improve multiple DISTINCT aggregation.

    Tajo supports several algorithms for count distinct. The current one 
executes a count distinct query with two execution blocks, which are built by 
DistinctGroupbyBuilder::buildPlan. This patch adds a new option that executes 
the query with three execution blocks. To use it, set 
SessionVars.COUNT_DISTINCT_ALGORITHM to three_stages.
     *  In the first stage, the Tajo operator expands each input row into 
multiple rows by grouping columns. In addition, the operator must create a row 
for the aggregation of non-distinct columns.
     *  In the second stage, the Tajo operator aggregates the output of the 
first stage, which is shuffled by the grouping columns and the aggregation 
columns.
     *  In the third stage, the Tajo operator merges the output of the second 
stage, which is shuffled by the grouping columns only.
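
    The three stages above can be illustrated with a minimal in-memory sketch. 
This is not the code in this patch: the function names, the tuple layout, the 
tag value -1 for non-distinct rows, and the use of SUM as the non-distinct 
aggregate are all assumptions made for illustration, and NULL handling is 
ignored.

```python
from collections import defaultdict

NON_DISTINCT = -1  # hypothetical tag marking the row that carries non-distinct input


def stage1_expand(rows, group_cols, distinct_cols, sum_col):
    """Stage 1: emit one tagged row per distinct column, plus one row
    carrying the value for the non-distinct aggregate (SUM here)."""
    for row in rows:
        key = tuple(row[c] for c in group_cols)
        for tag, col in enumerate(distinct_cols):
            yield (key, tag, row[col], None)
        yield (key, NON_DISTINCT, None, row[sum_col])


def stage2_aggregate(expanded):
    """Stage 2: rows are conceptually shuffled by (grouping key, tag, value),
    so duplicate distinct values collapse and partial sums are combined."""
    unique_values = set()
    partial_sums = defaultdict(int)
    for key, tag, value, amount in expanded:
        if tag == NON_DISTINCT:
            partial_sums[key] += amount
        else:
            unique_values.add((key, tag, value))
    return unique_values, partial_sums


def stage3_merge(unique_values, partial_sums, n_distinct):
    """Stage 3: rows are conceptually shuffled by the grouping key only;
    count the surviving unique values per distinct column."""
    result = {
        key: {"counts": [0] * n_distinct, "sum": total}
        for key, total in partial_sums.items()
    }
    for key, tag, _value in unique_values:
        result[key]["counts"][tag] += 1
    return result


def three_stage_count_distinct(rows, group_cols, distinct_cols, sum_col):
    expanded = stage1_expand(rows, group_cols, distinct_cols, sum_col)
    unique_values, partial_sums = stage2_aggregate(expanded)
    return stage3_merge(unique_values, partial_sums, len(distinct_cols))
```

    For a query shaped like SELECT dept, COUNT(DISTINCT user), 
COUNT(DISTINCT item), SUM(amount) ... GROUP BY dept, the sketch would be called 
as three_stage_count_distinct(rows, ["dept"], ["user", "item"], "amount"), 
where rows is a list of dictionaries. The table and column names here are 
hypothetical.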
    
    Note that this patch still needs to implement handling of empty input 
data and union with distinct count. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/blrunner/tajo TAJO-1010

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tajo/pull/126.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #126
    
----
commit 7c98709f0fcb06dfb675acae3d6489a6126f55b5
Author: jinossy <[email protected]>
Date:   2014-08-06T08:43:35Z

    TAJO-995: HiveMetaStoreClient wrapper should retry the connection

commit 415d0867ae4a4543f47360294bead1fc7f41e292
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-10T06:07:24Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 7a7b4fd26f61df89cacdb4fc41faf9c2abe456b2
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-11T02:28:48Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 45f5ed3adba931f4706f26dda1d3c03240ee11d3
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-11T05:40:25Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit aa01e83859ef553ac4eb90c1678e3bc6be20c6c9
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-18T09:56:24Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo

commit 18cc27a64ee081dcd02af389e44db5d7ecfa1017
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-20T03:15:37Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1010

commit 68b593b9462534951e3a8f7f4b0a7d2a3a16ae0d
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-22T06:58:52Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1010

commit 2a97e00b70e4300067ab6758525ee7a541bff14a
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-22T07:16:48Z

    Added DistinctNullDatum

commit b14d25a894cfcd4db566058a21b1a5762e39a525
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-22T07:54:34Z

    Fixed DistinctNullDatum Error.

commit e3ab71ef11e3400c665ba687e0e26d4c7b424888
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-22T08:04:51Z

    Added MetaDataTuple::IsDistinctNull.

commit 7323279a138502b840c912536dd2f9f7658fee4d
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-25T07:05:59Z

    Implemented DistinctGroupbyIntermediateAggregationExec.

commit 27fa0c6a7b17296b7cad03d5d00ed081c54bac26
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-25T07:07:00Z

    Merge branch 'master' of https://git-wip-us.apache.org/repos/asf/tajo into 
TAJO-1010

commit ccd01c5bdfb15c713102c1742c0763e177efc45b
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-25T08:35:46Z

    Remove unused code.

commit 13bc50bde92f4bf309d94e623cea5eda5cda69db
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-25T09:21:59Z

    Implemented DistinctGroupbyInitWriterExec.

commit 82edd24ced63484e7d50a79540ef50e0d1f0b5db
Author: Jaehwa Jung <[email protected]>
Date:   2014-08-26T18:59:51Z

    Implemented operators for count distinct three stages.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
