[
https://issues.apache.org/jira/browse/IMPALA-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17302793#comment-17302793
]
Aman Sinha commented on IMPALA-10564:
-------------------------------------
Capturing some summarized comments from the Gerrit review
(https://gerrit.cloudera.org/c/17168/) and offline discussion with [~wzhou]:
* The ideal long-term solution would be to skip the rows that have a decimal
overflow (or other error), optionally log them in a staging area (similar to
what ETL products do), and return a status such as 'Inserted N rows,
skipped M rows' (we already display the first part of this message on
success). The motivation is that a CTAS or INSERT-SELECT of a billion rows
should not be aborted entirely because of one or a few decimal value errors.
* However, skipping rows during the write to a columnar format such as Parquet
requires more thought and investigation, because it requires rewinding to the
previous row.
* One near-term option is to merge the patch changes but make the behavior
configurable. We could introduce a query option, use_null_for_decimal_errors,
which would be FALSE by default, so the CTAS would fail. Users would then have
to opt in to allow NULLs to be inserted (making it a conscious choice).
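
To make the three behaviors discussed above concrete, here is a minimal
Python sketch. It is not Impala code: OverflowPolicy, fits() and
insert_rows() are hypothetical names invented for illustration. The NULL
policy corresponds roughly to setting the proposed
use_null_for_decimal_errors option to TRUE, FAIL to its proposed default,
and SKIP to the long-term idea of dropping bad rows and reporting a count.

```python
from decimal import Decimal
from enum import Enum


class OverflowPolicy(Enum):
    """Hypothetical policies; names are illustrative, not Impala's."""
    FAIL = "fail"  # abort the whole statement (the proposed default)
    NULL = "null"  # opt-in: store NULL in place of the overflowed value
    SKIP = "skip"  # long-term idea: drop the row and count it


def fits(value: Decimal, precision: int, scale: int) -> bool:
    """Return True if value fits DECIMAL(precision, scale) after rounding."""
    rounded = value.quantize(Decimal(1).scaleb(-scale))
    # DECIMAL(p, s) holds magnitudes strictly below 10 ** (p - s).
    return abs(rounded) < Decimal(10) ** (precision - scale)


def insert_rows(values, precision, scale, policy):
    """Return (written_values, skipped_count) under the given policy."""
    written, skipped = [], 0
    for value in values:
        if fits(value, precision, scale):
            written.append(value)
        elif policy is OverflowPolicy.FAIL:
            raise OverflowError(f"decimal overflow: {value}")
        elif policy is OverflowPolicy.NULL:
            written.append(None)
        else:
            skipped += 1
    return written, skipped


if __name__ == "__main__":
    data = [Decimal("1.25"), Decimal("12345.6"), Decimal("9.99")]
    rows, skipped = insert_rows(data, 3, 2, OverflowPolicy.SKIP)
    print(f"Inserted {len(rows)} rows, skipped {skipped} rows")
```

This sketch works on an in-memory list, so skipping a row is trivial. It
does not model the harder part raised above, namely rewinding a columnar
writer such as Parquet after part of a row has already been written.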
> No error returned when inserting an overflowed value into a decimal column
> --------------------------------------------------------------------------
>
> Key: IMPALA-10564
> URL: https://issues.apache.org/jira/browse/IMPALA-10564
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Frontend
> Affects Versions: Impala 4.0
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
>
> When using CTAS or INSERT-SELECT statements to insert rows into a table
> with decimal columns, Impala inserts NULL for overflowed decimal values
> instead of returning an error. This happens when the data expression for
> the decimal column in the SELECT sub-query contains at least one alias. The
> issue is similar to IMPALA-6340, but IMPALA-6340 only fixed the cases where
> the data expression for the decimal column is a constant, so that the
> overflow can be detected by the frontend during expression analysis. If the
> data expression contains an alias (variable), the frontend cannot evaluate
> it in the expression analysis phase. Only the backend can evaluate it, when
> it executes the fragment instances for the SELECT sub-query. The log
> messages showed that the executor detected the decimal overflow error, but
> somehow it did not propagate the error to the coordinator, hence the error
> was not returned to the client.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)