[ https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gerald Sangudi updated PHOENIX-4751:
------------------------------------
    Attachment: 0001-PHOENIX-4751-Implement-client-side-has.4.x-HBase-1.4.patch

> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4751
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.0, 4.13.1
>            Reporter: Gerald Sangudi
>            Assignee: Gerald Sangudi
>            Priority: Major
>         Attachments:
>   0001-PHOENIX-4751-Add-HASH_AGGREGATE-hint.4.x-HBase-1.4.patch,
>   0001-PHOENIX-4751-Implement-client-side-has.4.x-HBase-1.4.patch,
>   0002-PHOENIX-4751-Begin-implementation-of-c.4.x-HBase-1.4.patch,
>   0003-PHOENIX-4751-Generated-aggregated-resu.4.x-HBase-1.4.patch,
>   0004-PHOENIX-4751-Sort-results-of-client-ha.4.x-HBase-1.4.patch,
>   0005-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch,
>   0006-PHOENIX-4751-Fix-and-run-integration-t.4.x-HBase-1.4.patch,
>   0007-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch,
>   0008-PHOENIX-4751-Verify-EXPLAIN-plan-for-b.4.x-HBase-1.4.patch,
>   0009-PHOENIX-4751-Standardize-null-checks-a.4.x-HBase-1.4.patch,
>   0010-PHOENIX-4751-Abort-when-client-aggrega.4.x-HBase-1.4.patch,
>   0011-PHOENIX-4751-Use-Phoenix-memory-mgmt-t.4.x-HBase-1.4.patch,
>   0012-PHOENIX-4751-Remove-extra-memory-limit.4.x-HBase-1.4.patch,
>   0013-PHOENIX-4751-Sort-only-when-necessary.4.x-HBase-1.4.patch,
>   0014-PHOENIX-4751-Sort-only-when-necessary-.4.x-HBase-1.4.patch,
>   0015-PHOENIX-4751-Show-client-hash-aggregat.4.x-HBase-1.4.patch,
>   0016-PHOENIX-4751-Handle-reverse-sort-add-c.4.x-HBase-1.4.patch
>
>
> A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash
> aggregation in some cases, for improved performance.
>
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash
> aggregation. It instead performs a CLIENT SORT followed by a CLIENT
> AGGREGATE. The performance can be improved if (a) the GROUP BY output does
> not need to be sorted, and (b) the GROUP BY input is large enough and has low
> cardinality.
>
> The hash aggregation can initially be a hint. Here is an example from Phoenix
> 4.13.1 that would benefit from hash aggregation if the GROUP BY input is
> large with low cardinality.
>
> CREATE TABLE unsalted (
>     keyA BIGINT NOT NULL,
>     keyB BIGINT NOT NULL,
>     val SMALLINT,
>     CONSTRAINT pk PRIMARY KEY (keyA, keyB)
> );
>
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */
>     t1.val v1, t2.val v2, COUNT(*) c
> FROM unsalted t1 JOIN unsalted t2
> ON (t1.keyA = t2.keyA)
> GROUP BY t1.val, t2.val;
>
> +-----------------------------------------------------------+----------------+---------------+
> | PLAN                                                      | EST_BYTES_READ | EST_ROWS_READ |
> +-----------------------------------------------------------+----------------+---------------+
> | SORT-MERGE-JOIN (INNER) TABLES                            | null           | null          |
> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |
> | AND                                                       | null           | null          |
> |     CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null           | null          |
> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]             | null           | null          |
> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]   | null           | null          |
> +-----------------------------------------------------------+----------------+---------------+
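
For illustration only, here is a sketch of how the proposed hint might be added to the example query. The hint name HASH_AGGREGATE is assumed from the title of the first attached patch ("Add-HASH_AGGREGATE-hint"). This message does not confirm the final hint syntax or the plan it produces, so treat both as assumptions.

```sql
-- Hypothetical usage: the HASH_AGGREGATE hint name is inferred from the
-- attached patch title; the final name and resulting plan may differ.
EXPLAIN
SELECT /*+ USE_SORT_MERGE_JOIN HASH_AGGREGATE */
    t1.val v1, t2.val v2, COUNT(*) c
FROM unsalted t1 JOIN unsalted t2
ON (t1.keyA = t2.keyA)
GROUP BY t1.val, t2.val;
```

The query has no ORDER BY, so its output does not need to be sorted, which is condition (a) in the description. Whether hashing actually helps still depends on condition (b): a large GROUP BY input with few distinct (t1.val, t2.val) groups.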