[ 
https://issues.apache.org/jira/browse/IMPALA-7855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16690692#comment-16690692
 ] 

Paul Rogers edited comment on IMPALA-7855 at 11/17/18 9:27 PM:
---------------------------------------------------------------

A related issue occurs in {{ExprRewriterTest.TestToSql()}} in the CTAS test. 
(This test will be made into a separate method, {{TestCTASToSql()}}). When run 
with the "integrated rewrite" feature enabled, we get into this odd situation:

 * Analyze the {{CreateTableAsSelect}} statement. Create a temporary copy of 
the associated {{SELECT}} statement.
 * Rewrite the {{SELECT}} statement from {{SELECT 1 + 1}} (both {{TINYINT}}, 
with {{SMALLINT} for the {{+}} operation) to {{SELECT 2}} (as type {{TINYINT}}.)
* After constant folding, the rule checks the original type of the expression 
({{SMALLINT}}) and casts the result ({{TINYINT}}) to the original type 
({{SMALLINT}}) using an implicit cast.
 * Perform column substitutions, reset and reanalyze. This process discards 
implicit casts. Because the value is 2, it takes the type TINYINT.
 * Create the base table expressions using the newly rewritten value 
({{TINYINT}}) though the result expression is still {{SMALLINT}}.
 * Use the base expressions from the above (type as {{TINY}}) to declare the 
target table column.
 * Now, try to map the result expression {{SMALLINT}} into the newly created 
table column {{TINYINT}}. Fails with a overflow error.

The problem here is in one, or both, of two places:

 * The constant folding rule changes the type of the rewritten expression to 
{{TINYINT}} using a (disposable) implicit cast. But, the binary operator {{+}} 
chose {{SMALLINT}} for the same value.
 * The {{Expr.substituteImpl()}} method forces a re-analyze of numeric 
literals, discarding the iplicit cast and losing the wider type created by the 
constant folding rule.

There is another inconsistency which has not (yet) manifested as a test failure 
(probably because we have no suitable tests):

 * After the above, the base table expression for a {{SELECT}} statement has 
one schema ({{TINYINT}}), the result expression has another ({{SMALLINT}}).

While the inconsistency in types may seem a minor issue, it does lead to 
analysis failures and does need to be addressed.

Perhaps two fixes are needed:

 * When rewriting a numeric literal in the constant folding rule, apply the 
rules from {{NumericLiteral}} to override the type guessed by the constant 
evaluation.
 * Modify the {{substituteImpl}} method to a) don't reset numeric literals, or, 
more generally, b) don't reset expressions that did not change (or their 
children did not change.)

Longer term, the implicit cast mechanism is overly fragile: we add it then 
discard it, resulting in subtle type inconsistencies.


was (Author: paul.rogers):
A related issue occurs in {{ExprRewriterTest.TestToSql()}} in the CTAS test. 
(This test will be made into a separate method, {{TestCTASToSql()}}). When run 
with the "integrated rewrite" feature enabled, we get into this odd situation:

 * Analyze the {{CreateTableAsSelect}} statement. Create a temporary copy of 
the associated {{SELECT}} statement.
 * Rewrite the {{SELECT}} statement from {{SELECT 1 + 1}} (both {{TINYINT}}) to 
{{SELECT 2}} (as type {{SMALLINT}}.)
 * Perform column substitutions, reset and reanalyze. Because the value is 2, 
it takes the type TINYINT.
 * Create the base table expressions using the newly rewritten value 
({{TINYINT}}) though the result expression is still {{SMALLINT}}.
 * Use the base expressions from the above (type as {{TINY}}) to declare the 
target table column.
 * Now, try to map the result expression {{SMALLINT}} into the newly created 
table column {{TINYINT}}. Fails with a overflow error.

The problem here is in one, or both, of two places:

 * The constant folding rule changes the type of the rewritten expression to 
{{SMALLINT}} (a conservative guess), though the 
{{NumericLiteral.analyzeImpl()}} method will choose {{TINYINT}} for the same 
value.
 * The {{Expr.substituteImpl()}} method forces a re-analyze of numeric 
literals, losing the wider type created by the constant folding rule.

There is another inconsistency which has not (yet) manifested as a test failure 
(probably because we have no suitable tests):

 * After the above, the base table expression for a {{SELECT}} statement has 
one schema ({{TINYINT}}), the result expression has another ({{SMALLINT}}).

While the inconsistency in types may seem a minor issue, it does lead to 
analysis failures and does need to be addressed.

Perhaps two fixes are needed:

 * When rewriting a numeric literal in the constant folding rule, apply the 
rules from {{NumericLiteral}} to override the type guessed by the constant 
evaluation.
 * Modify the {{substituteImpl}} method to a) don't reset numeric literals, or, 
more generally, b) don't reset expressions that did not change (or their 
children did not change.)

> Excessive type widening leads to unnecessary casts
> --------------------------------------------------
>
>                 Key: IMPALA-7855
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7855
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 3.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> When writing unit tests, created the following query:
> {code:sql}
> with
>   query1 (a, b) as (
>     select 1 + 1 + id, 2 + 3 + int_col from functional.alltypestiny)
> insert into functional.alltypestiny (id, int_col)
>   partition (month = 5, year = 2018)
>   select * from query1
> {code}
> The above fails with the following error:
> {noformat}
> ERROR: AnalysisException: Possible loss of precision for target table 
> 'functional.alltypestiny'.
> Expression 'query1.a' (type: BIGINT) would need to be cast to INT for column 
> 'id'
> {noformat}
> The following does work (for planning, may not actually execute):
> {code:sql}
> with
>   query1 (a, b) as (
>     select
>       cast(1 + 1 + id as int),
>       cast(2 + 3 + int_col as int)
>     from functional.alltypestiny)
> insert into functional.alltypestiny (id, int_col)
>   partition (month = 5, year = 2018)
>   select * from query1
> {code}
> What this says is the the planner selected type {{BIGINT}} for the 
> (rewritten) expression {{2 + id}} where {{id}} is of type {{INT}}. {{BIGINT}} 
> is a conservative guess: adding 2 to the largest {{INT}} could overflow and 
> require a {{BIGINT}}.
> Yet, for such a simple case, such aggressive type promotion may be overly 
> cautious.
> To verify that this is an issue, let's try something similar with Postgres to 
> see if it is as aggressive.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to