[ 
https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530424#comment-16530424
 ] 

ASF GitHub Bot commented on PHOENIX-4751:
-----------------------------------------

GitHub user geraldss opened a pull request:

    https://github.com/apache/phoenix/pull/308

    Client-side hash aggregation

    Client-side hash aggregation for use with sort-merge join.
    
    Implements https://issues.apache.org/jira/browse/PHOENIX-4751


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/geraldss/phoenix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #308
    
----
commit c8acc6cb39e222a5206c79566552c5c27cbe27f1
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-14T19:49:30Z

    PHOENIX-4751 Add HASH_AGGREGATE hint

commit a261b3f94f753b4a8d6baaad6168e76f97d76bb6
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-16T04:17:32Z

    PHOENIX-4751 Begin implementation of client hash aggregation

commit 863d24e34a83282f90d5d2db05522b678dfced74
Author: Rajeshbabu Chintaguntla <rajeshbabu@...>
Date:   2018-06-15T22:38:44Z

    PHOENIX-4786 Reduce log level to debug when logging new aggregate row key 
found and added results for scan ordered queries(Rajeshbabu)

commit cfae7ddcfa5b58a367cd0c57c23f394ceb9f1259
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-16T04:55:00Z

    Merge remote-tracking branch 'upstream/master'

commit 1f453308a24be49a8036292671d51eb25137d680
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-20T17:47:34Z

    PHOENIX-4751 Generated aggregated results

commit 66aaacfd989c63e18fb9a5c5b9e133519ab93507
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-24T23:18:14Z

    PHOENIX-4751 Sort results of client hash aggregation

commit a6c2b7ce738710cfdffc1e9e4d1d234d2090a225
Author: James Taylor <jamestaylor@...>
Date:   2018-06-18T13:00:02Z

    PHOENIX-4789 Exception when setting TTL on Tephra transactional table

commit fba4196fcace83d4e42e902d2cb6295bb519ed39
Author: Ankit Singhal <ankitsinghal59@...>
Date:   2018-06-21T23:11:02Z

    PHOENIX-4785 Unable to write to table if index is made active during retry

commit 05de081b386c502b6c90ff24357ed7dbbc6dedd2
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-29T05:01:55Z

    PHOENIX-4751 Add integration test for client hash aggregation

commit b7960d0daedc6ce3c2fbcf0794e4a95639d7ba3c
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-30T00:03:59Z

    PHOENIX-4751 Fix and run integration tests for query results

commit a3629ac64b90c117f5caceddbb45fb9dc14649b8
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-30T06:22:43Z

    PHOENIX-4751 Add integration test for EXPLAIN

commit 3aa85d5c04309f6e0c5167c002e9dcb6091ea757
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-30T17:13:17Z

    PHOENIX-4751 Verify EXPLAIN plan for both salted and unsalted

----


> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4751
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.0, 4.13.1
>            Reporter: Gerald Sangudi
>            Priority: Major
>
> A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash 
> aggregation in some cases, for improved performance.
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash 
> aggregation. It instead performs a CLIENT SORT followed by a CLIENT 
> AGGREGATE. The performance can be improved if (a) the GROUP BY output does 
> not need to be sorted, and (b) the GROUP BY input is large enough and has low 
> cardinality.
> The hash aggregation can initially be a hint. Here is an example from Phoenix 
> 4.13.1 that would benefit from hash aggregation if the GROUP BY input is 
> large with low cardinality.
> CREATE TABLE unsalted (
>  keyA BIGINT NOT NULL,
>  keyB BIGINT NOT NULL,
>  val SMALLINT,
>  CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>  );
> EXPLAIN
>  SELECT /*+ USE_SORT_MERGE_JOIN */ 
>  t1.val v1, t2.val v2, COUNT(\*) c 
>  FROM unsalted t1 JOIN unsalted t2 
>  ON (t1.keyA = t2.keyA) 
>  GROUP BY t1.val, t2.val;
>  
> +-------------------------------------------------------------+----------------++------------------+
> |PLAN|EST_BYTES_READ|EST_ROWS_READ| |
> +-------------------------------------------------------------+----------------++------------------+
> |SORT-MERGE-JOIN (INNER) TABLES|null|null| |
> |    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| |
> |AND|null|null| |
> |    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| |
> |CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]|null|null| |
> |CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]|null|null| |
> +-------------------------------------------------------------+----------------++------------------+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to