[
https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Gerald Sangudi updated PHOENIX-4751:
------------------------------------
Summary: Support client-side hash aggregation with SORT_MERGE_JOIN (was:
Support hash aggregation with SORT_MERGE_JOIN)
> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
> Key: PHOENIX-4751
> URL: https://issues.apache.org/jira/browse/PHOENIX-4751
> Project: Phoenix
> Issue Type: Improvement
> Affects Versions: 4.14.0, 4.13.1
> Reporter: Gerald Sangudi
> Priority: Major
>
> A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash
> aggregation in some cases, for improved performance.
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash
> aggregation. It instead performs a CLIENT SORT followed by a CLIENT
> AGGREGATE. The performance can be improved if (a) the GROUP BY output does
> not need to be sorted, and (b) the GROUP BY input is large enough and has low
> cardinality.
> The hash aggregation can initially be a hint. Here is an example from Phoenix
> 4.13.1 that would benefit from hash aggregation if the GROUP BY input is
> large with low cardinality.
> CREATE TABLE unsalted (
> keyA BIGINT NOT NULL,
> keyB BIGINT NOT NULL,
> val SMALLINT,
> CONSTRAINT pk PRIMARY KEY (keyA, keyB)
> );
> EXPLAIN
> SELECT /*+ USE_SORT_MERGE_JOIN */
> t1.val v1, t2.val v2, COUNT(\*) c
> FROM unsalted t1 JOIN unsalted t2
> ON (t1.keyA = t2.keyA)
> GROUP BY t1.val, t2.val;
> +------------------------------------------------------------+-----------------+----------------+--+
> | PLAN | EST_BYTES_READ
> | EST_ROWS_READ | |
> +------------------------------------------------------------+-----------------+----------------+--+
> | SORT-MERGE-JOIN (INNER) TABLES | null
> | null | |
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null
> | null | |
> | AND | null
> | null | |
> | CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED | null
> | null | |
> | CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL] | null
> | null | |
> | CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL] | null
> | null | |
> +------------------------------------------------------------+-----------------+----------------+--+
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)