[
https://issues.apache.org/jira/browse/IMPALA-12035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717113#comment-17717113
]
Daniel Becker edited comment on IMPALA-12035 at 4/27/23 2:23 PM:
-----------------------------------------------------------------
The problem seems to be this:
* in the case when an error is returned, the literal (9999999999999999999999)
can fit into a DECIMAL and therefore is interpreted as such (see
https://github.com/apache/impala/blob/c8aa5796d93510723342055cc70cf8d00abae754/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java#L344)
* this DECIMAL can't fit in a BIGINT, so the cast fails (see
https://github.com/apache/impala/blob/0a42185d17164af0c855647de25f1bc87f33ee71/be/src/runtime/decimal-value.inline.h#L124)
* in the second case with the very big number, the literal can't even fit in a
DECIMAL, so it is interpreted as a DOUBLE instead; this is not completely
unreasonable as it fits in the range of DOUBLE, although of course there is
some loss of precision
* the DOUBLE can be cast to BIGINT, although the value is incorrect and this
invokes undefined behaviour (see Floating–integral conversions in
[https://en.cppreference.com/w/cpp/language/implicit_conversion]); the
conversion is done here:
[https://github.com/apache/impala/blob/0a42185d17164af0c855647de25f1bc87f33ee71/be/src/exprs/cast-functions-ir.cc#L61]
This can be seen in the explain statements after {{{}set explain_level=2{}}}:
{code:java}
Analyzed query: SELECT CAST(CAST(9999999999999999999999 AS DECIMAL(22,0)) AS
BIGINT)
VS
Analyzed query: SELECT
CAST(CAST(999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
AS DOUBLE) AS BIGINT)
{code}
Possible solutions:
# Adding checks to integer-floating point conversions in the BE (both
directions).
** Pros:
*** Strictly speaking this would be necessary as these conversions may invoke
undefined behaviour, see the link above. Conversions to and from DecimalValue
do have checks.
** Cons:
*** This could lead to performance regression because of the added checks.
*** Existing use cases could break if we return an error, although in those
cases the results were already incorrect.
# Adding checks only for the integer-floating point conversion of
literals/constant expressions in the FE.
## Performance impact is negligible.
## Undefined behaviour may be invoked in other cases, as before.
was (Author: daniel.becker):
The problem seems to be this:
* in the case when an error is returned, the literal (9999999999999999999999)
can fit into a DECIMAL and therefore is interpreted as such (see
https://github.com/apache/impala/blob/c8aa5796d93510723342055cc70cf8d00abae754/fe/src/main/java/org/apache/impala/analysis/NumericLiteral.java#L344)
* this DECIMAL can't fit in a BIGINT, so the cast fails (see
https://github.com/apache/impala/blob/0a42185d17164af0c855647de25f1bc87f33ee71/be/src/runtime/decimal-value.inline.h#L124)
* in the second case with the very big number, the literal can't even fit in a
DECIMAL, so it is interpreted as a DOUBLE instead; this is not completely
unreasonable as it fits in the range of DOUBLE, although of course there is
some loss of precision
* the DOUBLE can be cast to BIGINT, although the value is incorrect and this
invokes undefined behaviour (see Floating–integral conversions in
[https://en.cppreference.com/w/cpp/language/implicit_conversion);] the
conversion is done here:
[https://github.com/apache/impala/blob/0a42185d17164af0c855647de25f1bc87f33ee71/be/src/exprs/cast-functions-ir.cc#L61]
This can be seen in the explain statements after {{{}set explain_level=2{}}}:
{code:java}
Analyzed query: SELECT CAST(CAST(9999999999999999999999 AS DECIMAL(22,0)) AS
BIGINT)
VS
Analyzed query: SELECT
CAST(CAST(999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
AS DOUBLE) AS BIGINT)
{code}
Possible solutions:
# Adding checks to integer-floating point conversions in the BE (both
directions).
** Pros:
*** Strictly speaking this would be necessary as these conversions may invoke
undefined behaviour, see the link above. Conversions to and from DecimalValue
do have checks.
** Cons:
*** This could lead to performance regression because of the added checks.
*** Existing use cases could break if we return an error, although in those
cases the results were already incorrect.
# Adding checks only for the integer-floating point conversion of
literals/constant expressions in the FE.
## Performance impact is negligible.
## Undefined behaviour may be invoked in other cases, as before.
> impala-shell accepts very big numbers but fails to store them correctly
> -----------------------------------------------------------------------
>
> Key: IMPALA-12035
> URL: https://issues.apache.org/jira/browse/IMPALA-12035
> Project: IMPALA
> Issue Type: Bug
> Reporter: Bakai Ádám
> Assignee: Daniel Becker
> Priority: Major
>
> I tried to insert rows very big bigints, and it worked as expected with big
> integers (error message, no new row stored), but it didn’t work as expected
> with ridiculously big integers( no error message, stored incorrect value).
> Here are the commands used:
> {code:java}
> drop TABLE my_first_table2;
> CREATE TABLE my_first_table2
> (
> id BIGINT,
> name STRING,
> PRIMARY KEY(id)
> );
> INSERT INTO my_first_table2 VALUES (cast(9 as BIGINT), "sarah");
> -- this works just fine as expected
> INSERT INTO my_first_table2 VALUES (cast(9999999999999999999999 as BIGINT),
> "sarah");
> -- ERROR: UDF ERROR: Decimal expression overflowed which is expected since it
> is over bigint max value (source:
> https://impala.apache.org/docs/build/plain-html/topics/impala_bigint.html#:~:text=Range%3A%20%2D9223372036854775808%20..,9223372036854775807.
> )
> INSERT INTO my_first_table2 VALUES
> (cast(999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999
> as BIGINT), "sarah");
> -- this succeeds and doesn't throw the same error as the previous command
> which is concerning
> select * from my_first_table2;
> -- there are two rows in the table, and the id is incorrect in one of them
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]