[ 
https://issues.apache.org/jira/browse/IMPALA-12548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784597#comment-17784597
 ] 

ASF subversion and git services commented on IMPALA-12548:
----------------------------------------------------------

Commit 0616a8f831d63b3c77d80198ff24b64f07b3b849 in impala's branch 
refs/heads/master from Riza Suminto
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=0616a8f83 ]

IMPALA-12548: Fix behavior of AGG_MEM_CORRELATION_FACTOR

AGG_MEM_CORRELATION_FACTOR accepts values from 0.0 to 1.0. Like the
JOIN_SELECTIVITY_CORRELATION_FACTOR option, the correlation factor here
is meant to reflect the correlation coefficient between grouping
columns. A high value of AGG_MEM_CORRELATION_FACTOR should mean a high
correlation between grouping columns.

However, the implementation of this query option behaves the opposite
way: the code interprets 1.0 as no correlation at all, and values below
1.0 as somewhat correlated.

This patch fixes the behavior so that the planner lowers its memory
estimate as AGG_MEM_CORRELATION_FACTOR goes higher.
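
As a rough illustration of the intended direction (higher factor, lower
estimate), here is a minimal sketch. It is not Impala's actual planner
formula: the function names, the linear interpolation between the two
extremes, and the bytes_per_group constant are all assumptions made for
this example only.

```python
from math import prod


def estimate_num_groups(column_ndvs, correlation_factor):
    """Hypothetical model of the number of distinct groups.

    NOT Impala's real formula. Assumes a linear interpolation between:
      - factor 0.0: columns are independent, so every combination of
        values can occur (product of per-column NDVs);
      - factor 1.0: columns are fully correlated, so the column with
        the largest NDV determines the group count.
    """
    if not 0.0 <= correlation_factor <= 1.0:
        raise ValueError("correlation_factor must be within [0.0, 1.0]")
    if not column_ndvs:
        return 1  # no grouping columns: a single group
    independent = prod(column_ndvs)
    correlated = max(column_ndvs)
    return correlated + (1.0 - correlation_factor) * (independent - correlated)


def estimate_agg_mem_bytes(column_ndvs, correlation_factor, bytes_per_group=64):
    """Hypothetical memory estimate: groups times an assumed per-group size."""
    return estimate_num_groups(column_ndvs, correlation_factor) * bytes_per_group


if __name__ == "__main__":
    ndvs = [10, 100]  # made-up per-column distinct-value counts
    for factor in (0.0, 0.5, 1.0):
        print(factor, estimate_agg_mem_bytes(ndvs, factor))
```

Under these assumptions the estimate shrinks monotonically as the factor
rises, which is the behavior the patch describes.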

Testing:
- Fix and pass PlannerTest#testAggNodeMaxMemEstimate.
- Add testAggNodeLowMemEstimate and testAggNodeHighMemEstimate.

Change-Id: I6f81db32a1818abc257957f6de942b5c9f36211a
Reviewed-on: http://gerrit.cloudera.org:8080/20684
Reviewed-by: Michael Smith <[email protected]>
Reviewed-by: Kurt Deschler <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>


> Clarify the meaning of AGG_MEM_CORRELATION_FACTOR
> -------------------------------------------------
>
>                 Key: IMPALA-12548
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12548
>             Project: IMPALA
>          Issue Type: Bug
>    Affects Versions: Impala 4.3.0
>            Reporter: Riza Suminto
>            Assignee: Riza Suminto
>            Priority: Major
>             Fix For: Impala 4.4.0
>
>
> AGG_MEM_CORRELATION_FACTOR accepts values from 0.0 to 1.0. The name of this 
> query option suggests that a high value means highly correlated columns, but 
> the code implements the opposite: 1.0 is interpreted as no correlation at 
> all, while values below 1.0 mean somewhat correlated. The documentation 
> should clarify this.


