[
https://issues.apache.org/jira/browse/IMPALA-10564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305798#comment-17305798
]
Wenzhe Zhou commented on IMPALA-10564:
--------------------------------------
The FE evaluates constant decimal expressions, but it does not transfer the
materialized decimal value to the BE, so the BE has to evaluate the decimal
expression again in that case. The FE generates different plan trees for
expressions with and without aliases. The final result is determined by the BE.
I debugged this issue and found the following three calling paths that handle
INSERT, INSERT-SELECT, and CTAS statements.
Case 1: Insertion with constant decimal expression
Sample queries:
# create a table
create table t10 (id int, a decimal(28,10), b decimal(28,10), c decimal(28,10));
# queries with overflowed decimal values:
insert into table t10 values (2, cast(754964565555555555559154.9565 as decimal(28,10)),
  cast(84658554984 as decimal(28,10)), cast(88856788.8877 as decimal(28,10)));
insert into table t10 select 3,
  cast(cast(654964569154.9565 as decimal(28,7)) * 44658554984 * 2.111 as decimal(28,10)),
  cast(84658554984 as decimal(28,10)), cast(88856788.8877 as decimal(28,10));
create table d1 stored as parquet as
  select cast(cast(654964569154.9565 as decimal(28,7)) * 44658554984 * 2.111 as decimal(28,10));
Result for the above queries with current Impala (without my fix): the error
"Decimal expression overflowed" is returned to the client.
Case 2: Insertion by selecting from another source table
Sample queries:
# create source table, then insert some valid data
create table t10 (id int, a decimal(28,10), b decimal(28,10), c decimal(28,10));
insert into table t10 values(1, cast(654964569154.9565 as decimal (28,10)),
cast(44658554984 as decimal (28,10)), cast(56788.8877 as decimal (28,10)));
# queries with overflowed decimal values:
create table t11 as select id, cast(a*b*c as decimal (28,10)) from t10;
insert into table t11 select id, cast(a*b*c as decimal (28,10)) from t10;
Result for the above queries with current Impala: the error "Decimal expression
overflowed" is returned to the client.
Case 3: Inserting one row with aliases
Sample queries:
# queries with overflowed decimal values:
create table t12 as
  select 1 as id, cast(a * 44658554984 * 2.111 as decimal(28,10)) as d_28
  from (select cast(654964569154.9565 as decimal(28,7)) as a) q;
insert into table t3
  select 2, cast(a * 44658554984 * 2.111 as decimal(28,10)) as d_28
  from (select cast(654964569154.9565 as decimal(28,7)) as a) q;
Result for the above queries with current Impala: NULLs are inserted into the
table without any error.
Case 3 is the only case in which we can reproduce the customer issue, but the
first two cases are more common.
I traced the BE code and found the causes of the results for the above three
cases.
impala::DecimalOperators::ScaleDecimalValue() is the function that evaluates a
decimal expression. If it gets an invalid value, such as an overflowed value,
it saves the error in the RuntimeState object; the error is not directly
returned to the caller. Later, if RuntimeState::CheckQueryState() is called, it
returns an error if one has been saved. There are 9 places in the BE that call
RuntimeState::CheckQueryState() and fail the query if it returns an error. In
the first two cases above, RuntimeState::CheckQueryState() is called in
HdfsTableSink::Send(), KuduTableSink::Send(), and ExecNode::QueryMaintenance().
This causes the BE to fail the query and return an error to the client.
The following loop in FragmentInstanceState::ExecInternal() is the main piece
of code that executes a query fragment.
do {
  Status status;
  row_batch_->Reset();
  {
    SCOPED_TIMER(plan_exec_timer);
    RETURN_IF_ERROR(
        exec_tree_->GetNext(runtime_state_, row_batch_.get(), &exec_tree_complete));
  }
  UpdateState(StateEvent::BATCH_PRODUCED);
  if (VLOG_ROW_IS_ON) row_batch_->VLogRows("FragmentInstanceState::ExecInternal()");
  COUNTER_ADD(rows_produced_counter_, row_batch_->num_rows());
  RETURN_IF_ERROR(sink_->Send(runtime_state_, row_batch_.get()));
  UpdateState(StateEvent::BATCH_SENT);
} while (!exec_tree_complete);
For case 1, exec_tree_->GetNext() (UnionNode::GetNext() for a constant
expression) calls down to DecimalOperators::ScaleDecimalValue() to get the
expression value. If there is a decimal overflow, the error is saved in the
RuntimeState object. When sink_->Send() (HdfsTableSink::Send() or
KuduTableSink::Send()) is called, it calls RuntimeState::CheckQueryState() and
returns an error if one has been saved. This causes the BE to return an error.
For case 2, the expression is not evaluated until the column values are about
to be written to the table in HdfsTextTableWriter::AppendRows(). When
RuntimeState::CheckQueryState() is called by sink_->Send() for a row_batch and
no error has been saved in RuntimeState yet, it returns OK, so
HdfsTableSink::Send() continues to call HdfsTableSink::WriteRowsToPartition()
and then HdfsTextTableWriter::AppendRows() to write the rows. NULLs can be
inserted into the table for the invalid decimal values in that row_batch
without an error being returned. But when exec_tree_->GetNext() is called for
the next row_batch, RuntimeState::CheckQueryState() is called in
ExecNode::QueryMaintenance(), which causes the BE to return an error. Likewise,
if sink_->Send() is called for the next row_batch, it returns an error if one
was saved in RuntimeState while handling the previous row_batch.
In my sample queries, exec_tree_->GetNext() returns the first row_batch with
exec_tree_complete set to false, even though there is only one row in the
source table. So exec_tree_->GetNext() is called again after the first
row_batch is written, and hits RuntimeState::CheckQueryState().
For case 3, the calling path is the same as in case 2, but exec_tree_->GetNext()
returns the first row_batch with exec_tree_complete set to true, so
exec_tree_->GetNext() and sink_->Send() are never called again. The BE never
hits RuntimeState::CheckQueryState() and returns OK to the client even though
an error is recorded in the RuntimeState object. This is why the current
behavior for case 3 differs from the first two cases.
This does not look like intended behavior by design. Rather, it looks like a
bug: RuntimeState::CheckQueryState() is not called in this corner case.
Based on the analysis, I have two questions:
Q1: Is it necessary to introduce a new query option
"use_null_for_decimal_errors"?
This would preserve the current behavior of case 3, which should not be used
frequently; the customer asked us to fix it. We should set the default value of
"use_null_for_decimal_errors" to false, since current Impala returns an error
in most cases.
Q2: We have to support the query option in all cases, but my current patch does
not work for case 2 when "use_null_for_decimal_errors" is set to true.
I don't have a good solution for this. To handle the next row_batch after
hitting an invalid decimal value, we would have to clear the error saved in
RuntimeState after inserting NULL into the table. But this change is risky:
since we only save the first error in RuntimeState, it is not safe to clear it,
because we could get more than one error for one row_batch. Another way is to
not save the error in RuntimeState at all when getting an invalid decimal
value, but that would change the behavior of other queries, for example "select
cast(a*44658554984*2.111 as decimal (28,10)) as d_28 from (SELECT
cast(654964569154.9565 as decimal (28,7)) as a) q;".
> No error returned when inserting an overflowed value into a decimal column
> --------------------------------------------------------------------------
>
> Key: IMPALA-10564
> URL: https://issues.apache.org/jira/browse/IMPALA-10564
> Project: IMPALA
> Issue Type: Bug
> Components: Backend, Frontend
> Affects Versions: Impala 4.0
> Reporter: Wenzhe Zhou
> Assignee: Wenzhe Zhou
> Priority: Major
>
> When using CTAS or INSERT-SELECT statements to insert rows into a table with
> decimal columns, Impala inserts NULL for overflowed decimal values instead of
> returning an error. This issue happens when the data expression for the
> decimal column in the SELECT sub-query contains at least one alias. This
> issue is similar to IMPALA-6340, but IMPALA-6340 only fixed the cases where
> the data expression for the decimal column is a constant, so that the
> overflowed decimal value could be detected by the frontend during expression
> analysis. If there is an alias (variable) in the data expression for the
> decimal column, the frontend cannot evaluate the data expression in the
> expression analysis phase; only the backend can evaluate it when executing
> fragment instances for the SELECT sub-query. The log messages showed that the
> executor detected the decimal overflow error, but somehow it did not
> propagate the error to the coordinator, so the error was not returned to the
> client.