[jira] [Commented] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN

Hadoop QA (JIRA) Thu, 02 Aug 2018 18:42:28 -0700


    [ 
https://issues.apache.org/jira/browse/PHOENIX-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16567674#comment-16567674
 ]


Hadoop QA commented on PHOENIX-4751:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  
http://issues.apache.org/jira/secure/attachment/12933997/0001-PHOENIX-4751-Implement-client-side-hash-aggre.master.patch
  against master branch at commit 4a9e5be82db18d942f365370abe4a3104780eea8.
  ATTACHMENT ID: 12933997

    {color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
                        Please justify why no new tests are needed for this 
patch.
                        Also please list what manual steps were performed to 
verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
    +    private void verifyExplain(Connection conn, String table, boolean 
swap, boolean sort) throws Exception {
+    private void verifyResults(Connection conn, String table, int c1, int c2, 
boolean swap, boolean sort) throws Exception {
+                aggResultIterator = new 
ClientGroupedAggregatingResultIterator(LookAheadResultIterator.wrap(iterator), 
serverAggregators, keyExpressions);
+                    (QueryServices.SPOOL_THRESHOLD_BYTES_ATTRIB, 
QueryServicesOptions.DEFAULT_SPOOL_THRESHOLD_BYTES);
+                    aggResultIterator = new 
ClientHashAggregatingResultIterator(context, iterator, serverAggregators, 
keyExpressions, orderBy);
+                    iterator = new OrderedResultIterator(iterator, 
keyExpressionOrderBy, thresholdBytes, null, null, 
projector.getEstimatedRowByteSize());
+                    aggResultIterator = new 
ClientGroupedAggregatingResultIterator(LookAheadResultIterator.wrap(iterator), 
serverAggregators, keyExpressions);
+            planSteps.add("CLIENT AGGREGATE INTO DISTINCT ROWS BY " + 
groupBy.getExpressions().toString());
+            planSteps.add("CLIENT HASH AGGREGATE INTO DISTINCT ROWS BY " + 
groupBy.getExpressions().toString());
+            if (orderBy == OrderBy.FWD_ROW_KEY_ORDER_BY || orderBy == 
OrderBy.REV_ROW_KEY_ORDER_BY) {

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
     
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.index.PartialIndexRebuilderIT
./phoenix-core/target/failsafe-reports/TEST-org.apache.phoenix.end2end.ConcurrentMutationsIT

Test results: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1956//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-PHOENIX-Build/1956//console

This message is automatically generated.

> Support client-side hash aggregation with SORT_MERGE_JOIN
> ---------------------------------------------------------
>
>                 Key: PHOENIX-4751
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4751
>             Project: Phoenix
>          Issue Type: Improvement
>    Affects Versions: 4.14.0, 4.13.1
>            Reporter: Gerald Sangudi
>            Assignee: Gerald Sangudi
>            Priority: Major
>         Attachments: 
> 0001-PHOENIX-4751-Add-HASH_AGGREGATE-hint.4.x-HBase-1.4.patch, 
> 0001-PHOENIX-4751-Implement-client-side-has.4.x-HBase-1.4.patch, 
> 0001-PHOENIX-4751-Implement-client-side-hash-aggre.master.patch, 
> 0002-PHOENIX-4751-Begin-implementation-of-c.4.x-HBase-1.4.patch, 
> 0003-PHOENIX-4751-Generated-aggregated-resu.4.x-HBase-1.4.patch, 
> 0004-PHOENIX-4751-Sort-results-of-client-ha.4.x-HBase-1.4.patch, 
> 0005-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch, 
> 0006-PHOENIX-4751-Fix-and-run-integration-t.4.x-HBase-1.4.patch, 
> 0007-PHOENIX-4751-Add-integration-test-for-.4.x-HBase-1.4.patch, 
> 0008-PHOENIX-4751-Verify-EXPLAIN-plan-for-b.4.x-HBase-1.4.patch, 
> 0009-PHOENIX-4751-Standardize-null-checks-a.4.x-HBase-1.4.patch, 
> 0010-PHOENIX-4751-Abort-when-client-aggrega.4.x-HBase-1.4.patch, 
> 0011-PHOENIX-4751-Use-Phoenix-memory-mgmt-t.4.x-HBase-1.4.patch, 
> 0012-PHOENIX-4751-Remove-extra-memory-limit.4.x-HBase-1.4.patch, 
> 0013-PHOENIX-4751-Sort-only-when-necessary.4.x-HBase-1.4.patch, 
> 0014-PHOENIX-4751-Sort-only-when-necessary-.4.x-HBase-1.4.patch, 
> 0015-PHOENIX-4751-Show-client-hash-aggregat.4.x-HBase-1.4.patch, 
> 0016-PHOENIX-4751-Handle-reverse-sort-add-c.4.x-HBase-1.4.patch
>
>
> A GROUP BY that follows a SORT_MERGE_JOIN should be able to use hash 
> aggregation in some cases, for improved performance.
> When a GROUP BY follows a SORT_MERGE_JOIN, the GROUP BY does not use hash 
> aggregation. It instead performs a CLIENT SORT followed by a CLIENT 
> AGGREGATE. The performance can be improved if (a) the GROUP BY output does 
> not need to be sorted, and (b) the GROUP BY input is large enough and has low 
> cardinality.
> The hash aggregation can initially be a hint. Here is an example from Phoenix 
> 4.13.1 that would benefit from hash aggregation if the GROUP BY input is 
> large with low cardinality.
> CREATE TABLE unsalted (
>  keyA BIGINT NOT NULL,
>  keyB BIGINT NOT NULL,
>  val SMALLINT,
>  CONSTRAINT pk PRIMARY KEY (keyA, keyB)
>  );
> EXPLAIN
>  SELECT /*+ USE_SORT_MERGE_JOIN */ 
>  t1.val v1, t2.val v2, COUNT(\*) c 
>  FROM unsalted t1 JOIN unsalted t2 
>  ON (t1.keyA = t2.keyA) 
>  GROUP BY t1.val, t2.val;
>  
> +-------------------------------------------------------------+----------------++------------------+
> |PLAN|EST_BYTES_READ|EST_ROWS_READ| |
> +-------------------------------------------------------------+----------------++------------------+
> |SORT-MERGE-JOIN (INNER) TABLES|null|null| |
> |    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| |
> |AND|null|null| |
> |    CLIENT 1-CHUNK PARALLEL 1-WAY FULL SCAN OVER UNSALTED|null|null| |
> |CLIENT SORTED BY [TO_DECIMAL(T1.VAL), T2.VAL]|null|null| |
> |CLIENT AGGREGATE INTO DISTINCT ROWS BY [T1.VAL, T2.VAL]|null|null| |
> +-------------------------------------------------------------+----------------++------------------+



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

[jira] [Commented] (PHOENIX-4751) Support client-side hash aggregation with SORT_MERGE_JOIN

Reply via email to