Paul Rogers created IMPALA-7865:
-----------------------------------

             Summary: Repeated type widening of arithmetic expressions
                 Key: IMPALA-7865
                 URL: https://issues.apache.org/jira/browse/IMPALA-7865
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


An issue related to IMPALA-7855 occurs in {{ExprRewriterTest.TestToSql()}} in 
the CTAS test. (This test will be made into a separate method, 
{{TestCTASToSql()}}). When run with the "integrated rewrite" feature enabled, 
we get into this odd situation:

 * Analyze the {{CreateTableAsSelect}} statement. Create a temporary copy of 
the associated {{SELECT}} statement.
 * Rewrite the {{SELECT}} statement from {{SELECT 1 + 1}} (both operands {{TINYINT}}, 
with {{SMALLINT}} for the {{+}} operation) to {{SELECT 2}} (as type {{TINYINT}}).
 * After constant folding, the rule checks the original type of the expression 
({{SMALLINT}}) and casts the result ({{TINYINT}}) to the original type 
({{SMALLINT}}) using an implicit cast.
 * Perform column substitutions, reset and reanalyze. This process discards 
implicit casts. Because the value is 2, it takes the type {{TINYINT}}.
 * Create the base table expressions using the newly rewritten value 
({{TINYINT}}) though the result expression is still {{SMALLINT}}.
 * Use the base expressions from the above (typed as {{TINYINT}}) to declare the 
target table column.
 * Now, try to map the result expression ({{SMALLINT}}) into the newly created 
table column ({{TINYINT}}). This fails with an overflow error.
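
The final step of the sequence above can be sketched as a simple width check. This is a minimal illustration, not Impala code: the {{IntType}} enum, the {{fitsWithoutOverflow}} helper and the class name are hypothetical, and the types assigned in {{main}} are taken directly from the steps listed above.

```java
public class CtasMismatchSketch {
    // Integer types in order of increasing width.
    enum IntType { TINYINT, SMALLINT, INT, BIGINT }

    // Hypothetical check: an expression maps into a column without overflow
    // only if the column is at least as wide as the expression.
    static boolean fitsWithoutOverflow(IntType expr, IntType column) {
        return expr.ordinal() <= column.ordinal();
    }

    public static void main(String[] args) {
        // Result expression keeps the type of the original 1 + 1.
        IntType resultExpr = IntType.SMALLINT;
        // Base expression after reset and reanalysis: the literal 2 alone.
        IntType baseExpr = IntType.TINYINT;
        // The target column is declared from the base expression.
        IntType column = baseExpr;

        System.out.println(fitsWithoutOverflow(resultExpr, column)); // prints "false"
    }
}
```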

While IMPALA-7855 describes how types are widened unnecessarily due to a single 
expression, the problem here occurs over time, due to repeated analysis of the 
same numeric expression:

* The analyzer implements a set of type propagation rules that generates a 
resulting type for arithmetic expressions that is wider than the types of the 
arguments. For example, for {{tinyint_col + 1}}, {{tinyint_col}} and {{1}} are 
{{TINYINT}}, but the result of the expression is promoted to {{SMALLINT}}.
* The planner then sets the type of the constant (1 here) to {{SMALLINT}}.
* Repeat the process on the next cycle. {{tinyint_col}} is {{TINYINT}}, {{1}} 
is {{SMALLINT}}. Now the result of the expression is {{INT}} and {{1}} is 
retyped to be {{INT}}.
* Repeat again and the expression (and constant) are promoted to {{BIGINT}}.
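
The cycle described in the list above can be modeled in a few lines. This is a simplified sketch under stated assumptions, not the analyzer's actual code: the {{IntType}} enum, {{resultType}} and {{analyzeRepeatedly}} are hypothetical names, and the promotion rule is reduced to "one step wider than the wider operand, capped at {{BIGINT}}", which matches the sequence reported here.

```java
import java.util.ArrayList;
import java.util.List;

public class WideningSketch {
    // Integer types in order of increasing width.
    enum IntType { TINYINT, SMALLINT, INT, BIGINT }

    // Assumed promotion rule: the result is one step wider than the wider
    // operand, capped at BIGINT.
    static IntType resultType(IntType lhs, IntType rhs) {
        IntType wider = lhs.ordinal() >= rhs.ordinal() ? lhs : rhs;
        int next = Math.min(wider.ordinal() + 1, IntType.BIGINT.ordinal());
        return IntType.values()[next];
    }

    // Analyzes `column + literal` repeatedly. After each pass the literal is
    // retyped to the result type, which is what drives the widening.
    static List<IntType> analyzeRepeatedly(IntType column, IntType literal, int passes) {
        List<IntType> results = new ArrayList<>();
        for (int i = 0; i < passes; i++) {
            IntType result = resultType(column, literal);
            results.add(result);
            literal = result; // the literal keeps the widened type
        }
        return results;
    }

    public static void main(String[] args) {
        // prints "[SMALLINT, INT, BIGINT]"
        System.out.println(analyzeRepeatedly(IntType.TINYINT, IntType.TINYINT, 3));
    }
}
```

The column type never changes in this model; only the retyped literal feeds the widening forward from one pass to the next.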
    
Meanwhile, analysis has taken a clone of the expression with the old types. As 
a result, the types of columns in the result list for a SELECT statement can 
differ from the same columns recorded in the SELECT list.

 * After the above, the base table expression for a {{SELECT}} statement has 
one schema ({{TINYINT}}), the result expression has another ({{SMALLINT}}).

While the inconsistency in types may seem a minor issue, it does lead to 
analysis failures and does need to be addressed.

Perhaps two fixes are needed:

 * When rewriting a numeric literal in the constant folding rule, apply the 
rules from {{NumericLiteral}} to override the type guessed by the constant 
evaluation.
 * Modify the {{substituteImpl}} method so that it either a) does not reset 
numeric literals, or, more generally, b) does not reset expressions that did 
not change (and whose children did not change).
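
As a rough illustration of the first fix, the sketch below types a folded constant by its value alone instead of inheriting the type of the expression it replaced. It assumes that the {{NumericLiteral}} rules pick the narrowest integer type that holds the value, which is consistent with the literal 2 being typed as {{TINYINT}} above. The {{IntType}} enum, {{smallestTypeFor}} and the class name are hypothetical.

```java
public class LiteralTypeSketch {
    // Integer types in order of increasing width.
    enum IntType { TINYINT, SMALLINT, INT, BIGINT }

    // Assumed literal typing rule: the narrowest integer type that can hold
    // the value, independent of how the value was computed.
    static IntType smallestTypeFor(long value) {
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) return IntType.TINYINT;
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) return IntType.SMALLINT;
        if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) return IntType.INT;
        return IntType.BIGINT;
    }

    public static void main(String[] args) {
        // Folding 1 + 1 yields 2, which is typed by value.
        System.out.println(smallestTypeFor(1 + 1));     // prints "TINYINT"
        // Folding 100 + 100 yields 200, which no longer fits in one byte.
        System.out.println(smallestTypeFor(100 + 100)); // prints "SMALLINT"
    }
}
```

With this rule, a literal gets the same type whether it is analyzed right after folding or after a later reset, so the base and result expressions would not diverge for this case.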

Longer term, the implicit cast mechanism is overly fragile: we add it then 
discard it, resulting in subtle type inconsistencies.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
