[
https://issues.apache.org/jira/browse/HIVE-14705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15465410#comment-15465410
]
Sahil Takiar commented on HIVE-14705:
-------------------------------------
Details about the fix and the bug are below:
*Background:*
Queries:
Query 1: {{select * from (select a-1 as a from test where a=7) z}}
Query 2: {{select a-1 as a from test where a=7}}
Query 3: {{select * from (select a-1 as a1 from test where a=7) z}}
Constant Propagation:
* Constant Propagation in MySQL:
https://dev.mysql.com/doc/internals/en/optimizer-constant-propagation.html
* Expression Folding in MySQL:
https://dev.mysql.com/doc/internals/en/optimizer-folding-constants.html
* Constants can only be propagated from a parent to a child; if an operator has
no constants inside it and is passed no constants, then its child operator will
see no constants
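To make the two ideas above concrete, here is a minimal sketch of propagation (replacing a column with a known constant) followed by folding (evaluating an expression whose inputs are all constants). The {{Expr}}, {{Col}}, {{Const}} and {{Minus}} types and the {{fold}} helper are hypothetical and exist only for this illustration; they are not Hive's expression classes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class ConstantPropagationSketch {
    // Hypothetical mini expression tree, invented for this example.
    interface Expr {}

    static final class Col implements Expr {
        final String name;
        Col(String name) { this.name = name; }
    }

    static final class Const implements Expr {
        final int value;
        Const(int value) { this.value = value; }
    }

    static final class Minus implements Expr {
        final Expr left;
        final Expr right;
        Minus(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Propagation: a column with a known constant is replaced by that constant.
    // Folding: a subtraction whose operands are both constants is evaluated.
    static Expr fold(Expr e, Map<String, Integer> constantsFromParent) {
        if (e instanceof Col) {
            Integer known = constantsFromParent.get(((Col) e).name);
            return known == null ? e : new Const(known);
        }
        if (e instanceof Minus) {
            Expr l = fold(((Minus) e).left, constantsFromParent);
            Expr r = fold(((Minus) e).right, constantsFromParent);
            if (l instanceof Const && r instanceof Const) {
                return new Const(((Const) l).value - ((Const) r).value);
            }
            return new Minus(l, r);
        }
        return e; // already a constant
    }

    public static void main(String[] args) {
        Expr aMinusOne = new Minus(new Col("a"), new Const(1));

        // The parent passed down a = 7, so a - 1 folds to a constant.
        Map<String, Integer> fromParent = new HashMap<>();
        fromParent.put("a", 7);
        Expr folded = fold(aMinusOne, fromParent);
        System.out.println(((Const) folded).value); // 6

        // No constants from the parent: the expression is left unfolded.
        Expr unfolded = fold(aMinusOne, Collections.<String, Integer>emptyMap());
        System.out.println(unfolded instanceof Minus); // true
    }
}
```

The second call shows the parent-to-child rule: with nothing passed down, the child has nothing to fold.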
Code:
* The bug and its fix are contained within the Constant Propagate Rule in the
RBO (rule-based optimizer), specifically the {{ConstantPropagateProcCtx}} and
{{ConstantPropagateProcFactory}} classes
* The {{ConstantPropagate}} rule walks through the operator tree and invokes
the corresponding method in the {{ConstantPropagateProcFactory}} class for each
operator
** For example, if the walker hits a {{FilterOperator}} it invokes the
{{ConstantPropagateFilterProc.process}} method - this method is responsible for
doing any constant propagation for the given operator
* Each invocation of the {{process}} method is passed in a shared context
called {{ConstantPropagateProcCtx}} which contains a map called
{{opToConstantExprs}}; this map is important because it tracks a column to
constants mapping; it is updated as constants are propagated
* {{ConstantPropagateProcFactory.propagate}} propagates constants inside
assignment operators; only {{=}} and {{is null}} are supported
* {{ConstantPropagateProcFactory.foldExprFull}} folds expressions; this
essentially evaluates any deterministic UDF operator whose parameters are
constants
* {{ConstantPropagateProcFactory.foldOperator}} looks through the list of
propagated constants and tries to replace any columns with their constant
equivalent
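The role of the shared context can be sketched roughly as follows. This is an illustration only: the {{Op}} and {{Ctx}} classes are hypothetical stand-ins, and the shape of the map (operator to column-to-constant entries) is an assumption based on the name {{opToConstantExprs}} and the description above, not a copy of the Hive classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PropagationContextSketch {
    // Hypothetical stand-in for an operator: a name plus links to its parents.
    static final class Op {
        final String name;
        final List<Op> parents = new ArrayList<>();
        Op(String name, Op... parents) {
            this.name = name;
            for (Op p : parents) {
                this.parents.add(p);
            }
        }
    }

    // Hypothetical stand-in for the shared context passed to every process call.
    static final class Ctx {
        // Assumed shape: operator -> (column name -> constant value).
        final Map<Op, Map<String, Integer>> opToConstantExprs = new HashMap<>();

        // Called when an operator discovers that a column is constant.
        void record(Op op, String column, int value) {
            opToConstantExprs
                .computeIfAbsent(op, k -> new LinkedHashMap<>())
                .put(column, value);
        }

        // The constants visible to an operator are those recorded by its parents.
        Map<String, Integer> getPropagatedConstants(Op op) {
            Map<String, Integer> visible = new LinkedHashMap<>();
            for (Op parent : op.parents) {
                Map<String, Integer> fromParent = opToConstantExprs.get(parent);
                if (fromParent != null) {
                    visible.putAll(fromParent);
                }
            }
            return visible;
        }
    }

    public static void main(String[] args) {
        Op scan = new Op("TableScanOperator");
        Op filter = new Op("FilterOperator", scan);
        Op select = new Op("SelectOperator", filter);
        Ctx ctx = new Ctx();

        // The scan recorded nothing, so the filter sees no constants.
        System.out.println(ctx.getPropagatedConstants(filter)); // {}

        // The filter records a = 7; the select then sees it.
        ctx.record(filter, "a", 7);
        System.out.println(ctx.getPropagatedConstants(select)); // {a=7}
    }
}
```

Because every operator reads from the same context, an entry that is not refreshed after folding stays visible to everything downstream.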
*Stepping Through Query 1:*
* The {{TableScanOperator}} is processed first, but no constant propagation
occurs here
* The {{FilterOperator}} will be "folded" via the {{propagate}} method;
basically, this means that the expression {{a = 7}} is added to the map
{{opToConstantExprs}}
* The {{SelectOperator}} from the sub-query is processed next
** {{ConstantPropagateProcCtx.getPropagatedConstants}} is invoked; this method
is responsible for getting all the constants that should be propagated from the
parent operators to the current operator. In this case it fetches all the
constants from the {{FilterOperator}}
** {{foldOperator}} is invoked; this method will take the constants from the
previous step and replace any columns with their new constant values, so in
this case {{a}} is replaced with a value of 7
** {{foldExprFull}} is invoked; this method will take the {{a - 1}} clause in
the select statement and fold it to a value of 6. It can do this because it
knows that {{a}} is now a constant with a value of 7
*The Bug:*
* The bug occurs after expression folding is done in the
{{ConstantPropagateSelectProc.process}} method: the code doesn't update the
{{opToConstantExprs}} map with the new value of {{a}} (it should update it to
6, but it doesn't, so the value remains 7)
* When the walker hits the next {{SelectOperator}} (the one in the outer
query), it propagates the value of {{z.a}} as 7, rather than 6
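The stale entry can be reproduced with a small hand-rolled walk over Query 1. This is a simplified sketch, not the Hive code: plain maps stand in for {{opToConstantExprs}}, and the {{updateMapAfterFolding}} flag is a hypothetical switch that models the missing update; the actual patch may be structured differently.

```java
import java.util.HashMap;
import java.util.Map;

public class StaleConstantSketch {
    // Walks Query 1 by hand: FilterOperator -> inner SelectOperator -> outer
    // SelectOperator. Returns the constant the outer select propagates for z.a.
    static int outerValueOfA(boolean updateMapAfterFolding) {
        // FilterOperator: the predicate a = 7 is recorded as a known constant.
        Map<String, Integer> filterConstants = new HashMap<>();
        filterConstants.put("a", 7);

        // Inner SelectOperator: starts from the constants passed down by the filter.
        Map<String, Integer> innerSelectConstants = new HashMap<>(filterConstants);

        // Column replacement and folding: a - 1 becomes 7 - 1 = 6.
        int folded = innerSelectConstants.get("a") - 1;

        // The output column is also called a, so the entry has to be overwritten
        // with the folded value. Skipping this step models the bug.
        if (updateMapAfterFolding) {
            innerSelectConstants.put("a", folded);
        }

        // Outer SelectOperator: select * takes z.a from the propagated constants.
        return innerSelectConstants.get("a");
    }

    public static void main(String[] args) {
        System.out.println(outerValueOfA(false)); // 7, the stale value
        System.out.println(outerValueOfA(true));  // 6, the folded value
    }
}
```

The collision only matters because the alias reuses the name {{a}}; the folded result and the filter constant compete for the same entry.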
*Why Query 2 Succeeds:*
* The same bug occurs in query 2, but it has no impact because there is only
one select operator
* After {{foldExprFull}} completes, the new column value {{6}} is returned and
the operator is updated so that the schema reflects the update, but the map
{{opToConstantExprs}} is not updated; since this is the last relevant operator
walked by the Constant Propagate Rule, it doesn't matter whether the map is
up to date or not
*Why Query 3 Succeeds:*
* Query 3 works due to an unrelated bug inside the
{{ConstantPropagateProcCtx.resolve}} method
* The bug causes {{ConstantPropagateProcCtx.getPropagatedConstants}} to
return an empty list when processing the {{SelectOperator}} in the sub-query
* Since the list returned is empty, the {{opToConstantExprs}} map has no
entries for the {{SelectOperator}}, so a failure to update the
{{opToConstantExprs}} map doesn't cause any issues
* This bug causes constant propagation to not occur from the inner
{{SelectOperator}} to the outer {{SelectOperator}}
** Hive is evaluating the inner query first and then selecting all of its
results
** It should realize that the {{select *}} clause will always return a value of
6
** This bug will not cause the query to return incorrect results, but it will
have a performance impact
** The bug is fixed by HIVE-13602, but the changes are non-trivial; the
original approach needs to be revised, so for now I am leaving it as is
> Hive outer queries is not picking up the right column from subqueries
> ---------------------------------------------------------------------
>
> Key: HIVE-14705
> URL: https://issues.apache.org/jira/browse/HIVE-14705
> Project: Hive
> Issue Type: Bug
> Affects Versions: 1.1.0
> Reporter: Sahil Takiar
> Fix For: 2.0.1
>
>
> The following queries show the bug:
> *Setup:*
> {code}
> create table test (a int);
> insert into test values (7);
> {code}
> *Produces Wrong Results:*
> {code}
> select * from (select a-1 as a from test where a=7) z;
> +------+--+
> | z.a |
> +------+--+
> | 7 |
> +------+--+
> {code}
> *Produces Correct Results:*
> {code}
> select * from (select a-1 as a1 from test where a=7) z;
> +-------+--+
> | z.a1 |
> +-------+--+
> | 6 |
> +-------+--+
> {code}
> Note this only happens with subqueries, as the following query returns the
> correct value of 6: {{select a-1 as a from test where a=7}}
> This affects version 1.1.0 but has been fixed in version 2.1.0 by HIVE-13602