[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16530424#comment-16530424 ]
ASF GitHub Bot commented on PHOENIX-4751:
-----------------------------------------

GitHub user geraldss opened a pull request:

    https://github.com/apache/phoenix/pull/308

    Client-side hash aggregation

    Client-side hash aggregation for use with sort-merge join.

    Implements https://issues.apache.org/jira/browse/PHOENIX-4751

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/geraldss/phoenix master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/phoenix/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #308

----
commit c8acc6cb39e222a5206c79566552c5c27cbe27f1
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-14T19:49:30Z

    PHOENIX-4751 Add HASH_AGGREGATE hint

commit a261b3f94f753b4a8d6baaad6168e76f97d76bb6
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-16T04:17:32Z

    PHOENIX-4751 Begin implementation of client hash aggregation

commit 863d24e34a83282f90d5d2db05522b678dfced74
Author: Rajeshbabu Chintaguntla <rajeshbabu@...>
Date:   2018-06-15T22:38:44Z

    PHOENIX-4786 Reduce log level to debug when logging new aggregate row key found and added results for scan ordered queries(Rajeshbabu)

commit cfae7ddcfa5b58a367cd0c57c23f394ceb9f1259
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-16T04:55:00Z

    Merge remote-tracking branch 'upstream/master'

commit 1f453308a24be49a8036292671d51eb25137d680
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-20T17:47:34Z

    PHOENIX-4751 Generated aggregated results

commit 66aaacfd989c63e18fb9a5c5b9e133519ab93507
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-24T23:18:14Z

    PHOENIX-4751 Sort results of client hash aggregation

commit a6c2b7ce738710cfdffc1e9e4d1d234d2090a225
Author: James Taylor <jamestaylor@...>
Date:   2018-06-18T13:00:02Z

    PHOENIX-4789 Exception when setting TTL on Tephra transactional table

commit fba4196fcace83d4e42e902d2cb6295bb519ed39
Author: Ankit Singhal <ankitsinghal59@...>
Date:   2018-06-21T23:11:02Z

    PHOENIX-4785 Unable to write to table if index is made active during retry

commit 05de081b386c502b6c90ff24357ed7dbbc6dedd2
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-29T05:01:55Z

    PHOENIX-4751 Add integration test for client hash aggregation

commit b7960d0daedc6ce3c2fbcf0794e4a95639d7ba3c
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-30T00:03:59Z

    PHOENIX-4751 Fix and run integration tests for query results

commit a3629ac64b90c117f5caceddbb45fb9dc14649b8
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-30T06:22:43Z

    PHOENIX-4751 Add integration test for EXPLAIN

commit 3aa85d5c04309f6e0c5167c002e9dcb6091ea757
Author: Gerald Sangudi <gsangudi@...>
Date:   2018-06-30T17:13:17Z

    PHOENIX-4751 Verify EXPLAIN plan for both salted and unsalted

----

> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4751
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.0, 4.13.1
>            Reporter: Gerald Sangudi
>            Priority: Major
>
> A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash
> aggregation in some cases, for improved performance.
>
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash
> aggregation. It instead performs a CLIENT SORT followed by a CLIENT
> AGGREGATE. The performance can be improved if (a) the GROUP BY output does
> not need to be sorted, and (b) the GROUP BY input is large enough and has low
> cardinality.
>
> The hash aggregation can initially be a hint. Here is an example from Phoenix
> 4.13.1 that would benefit from hash aggregation if the GROUP BY input is
> large with low cardinality.
>
> CREATE TABLE unsalted (
>     keyA BIGINT NOT NULL,
>     keyB BIGINT NOT NULL,
>     val SMALLINT,
>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
> );
>
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */
>     t1.val v1, t2.val v2, COUNT(*) c
> FROM unsalted t1 JOIN unsalted t2
> ON (t1.keyA = t2.keyA)
> GROUP BY t1.val, t2.val;
>
> +-----------------------------------------------------------+----------------+---------------+--+
> | PLAN                                                      | EST_BYTES_READ | EST_ROWS_READ |  |
> +-----------------------------------------------------------+----------------+---------------+--+
> | SORT-MERGE-JOIN (INNER) TABLES                            | null           | null          |  |
> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |  |
> | AND                                                       | null           | null          |  |
> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |  |
> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]             | null           | null          |  |
> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]   | null           | null          |  |
> +-----------------------------------------------------------+----------------+---------------+--+
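
For illustration only, and not taken from the pull request: a minimal Java
sketch of the two aggregation strategies the issue contrasts. The class and
method names (GroupByCountSketch, sortThenAggregate, hashAggregate) are
hypothetical and are not Phoenix APIs, and group keys are modeled as non-null
(v1, v2) integer pairs. The sort-based path mirrors the CLIENT SORTED BY and
CLIENT AGGREGATE steps in the plan; the hash-based path keeps one counter per
distinct key and never sorts the input, which is where a benefit would come
from when the input is large and the number of distinct keys is small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of GROUP BY (v1, v2) with COUNT(*); not Phoenix code.
public class GroupByCountSketch {

    // Sort-based: sort every input row by group key, then count runs of equal keys.
    // Cost is dominated by sorting all n input rows; output is in key order.
    static Map<List<Integer>, Long> sortThenAggregate(List<List<Integer>> keys) {
        List<List<Integer>> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> {
            int c = Integer.compare(a.get(0), b.get(0));
            return c != 0 ? c : Integer.compare(a.get(1), b.get(1));
        });
        Map<List<Integer>, Long> out = new LinkedHashMap<>();
        List<Integer> current = null;
        long count = 0;
        for (List<Integer> key : sorted) {
            if (!key.equals(current)) {
                if (current != null) {
                    out.put(current, count); // close the previous run
                }
                current = key;
                count = 0;
            }
            count++;
        }
        if (current != null) {
            out.put(current, count); // close the final run
        }
        return out;
    }

    // Hash-based: one pass over the input, one counter per distinct key.
    // Memory grows with the number of distinct keys; output order is unspecified.
    static Map<List<Integer>, Long> hashAggregate(List<List<Integer>> keys) {
        Map<List<Integer>, Long> counts = new HashMap<>();
        for (List<Integer> key : keys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<Integer>> joined = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3, 4), Arrays.asList(1, 2));
        System.out.println(sortThenAggregate(joined));
        System.out.println(hashAggregate(joined));
    }
}
```

Both methods return the same counts per group; they differ only in cost and
in whether the output comes back ordered, which matches condition (a) in the
issue description.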