[
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601404#comment-17601404
]
Weston Pace commented on ARROW-17601:
-------------------------------------
I think query engines and planners really like to know the output type before
the plan is executed. Determining the output type based on the input is
not something that is traditionally done (though maybe it is time to break with
tradition :).
That being said, we could perhaps model ourselves after Substrait (which
itself, modeled its rules after Calcite which is maybe based on Spark's rules
which is probably modeled after something else...)
In Substrait, if the precision limit is exceeded when determining the output
type, then digits are dropped (instead of the operation being rejected
outright). Digits are dropped first from the decimal places. Once the decimal
places have been reduced to 6 digits, it begins to drop precision from the
integral (non-decimal) digits.
So, by Substrait rules, MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>) yields
DECIMAL<49, 6> which truncates to DECIMAL<38, 6>
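As a rough illustration (not Arrow's actual implementation), that Substrait-style capping for multiplication can be sketched as follows, assuming the usual unconstrained multiply type of precision p1 + p2 + 1 and scale s1 + s2:

```python
def multiply_result_type(p1, s1, p2, s2, max_precision=38, min_scale=6):
    """Substrait-style output type for decimal multiplication (sketch)."""
    # Unconstrained result type for DECIMAL<p1, s1> * DECIMAL<p2, s2>:
    precision = p1 + p2 + 1
    scale = s1 + s2
    if precision > max_precision:
        # Drop digits from the fractional part first, but never reduce the
        # scale below min_scale; any remaining overflow comes out of the
        # integral digits (i.e. the value may overflow at runtime instead).
        overflow = precision - max_precision
        scale -= min(overflow, max(scale - min_scale, 0))
        precision = max_precision
    return precision, scale

# MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>):
# unconstrained DECIMAL<49, 6>, capped to DECIMAL<38, 6>
print(multiply_result_type(33, 4, 15, 2))  # (38, 6)
```

With these rules the example from the ticket resolves to a valid type instead of erroring at plan time.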
Here is how it is explained in TSQL
https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16
{quote}
In addition and subtraction operations, we need max(p1 - s1, p2 - s2) places to
store the integral part of the decimal number. If there isn't enough space to
store them, that is, max(p1 - s1, p2 - s2) < min(38, precision) - scale, the scale is
reduced to provide enough space for integral part. Resulting scale is
MIN(precision, 38) - max(p1 - s1, p2 - s2), so the fractional part might be
rounded to fit into the resulting scale.
In multiplication and division operations, we need precision - scale places to
store the integral part of the result. The scale might be reduced using the
following rules:
The resulting scale is reduced to min(scale, 38 - (precision-scale)) if the
integral part is less than 32, because it can't be greater than 38 -
(precision-scale). Result might be rounded in this case.
The scale won't be changed if it's less than 6 and if the integral part is
greater than 32. In this case, overflow error might be raised if it can't fit
into decimal(38, scale)
The scale will be set to 6 if it's greater than 6 and if the integral part
is greater than 32. In this case, both integral part and scale would be reduced
and resulting type is decimal(38,6). Result might be rounded to 6 decimal
places or the overflow error will be thrown if the integral part can't fit into
32 digits.
{quote}
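A hypothetical sketch of those T-SQL multiplication/division rules (the behavior at an integral part of exactly 32 is my assumption; the quoted text only says "less than" / "greater than"):

```python
def tsql_result_type(precision, scale, max_p=38):
    """Adjust a raw decimal result type per the T-SQL rules quoted above."""
    integral = precision - scale  # digits needed for the integral part
    if precision <= max_p:
        return precision, scale  # fits as-is, no adjustment needed
    if integral <= 32:
        # Reduce the scale so the integral part fits; result may be rounded.
        return max_p, min(scale, max_p - integral)
    if scale < 6:
        # Keep the scale; a runtime overflow error is possible.
        return max_p, scale
    # Integral part > 32 and scale >= 6: clamp to decimal(38, 6).
    return max_p, 6

# MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>): raw (49, 6) -> (38, 6)
print(tsql_result_type(49, 6))  # (38, 6)
```

Applied to the ticket's example, the raw DECIMAL<49, 6> has a 43-digit integral part and a scale of 6, so the T-SQL rules also land on decimal(38, 6), matching the Substrait result.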
> [C++] Error when creating Expression on Decimal128 types: precision out of
> range
> --------------------------------------------------------------------------------
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Neal Richardson
> Assignee: Yibo Cai
> Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413
> call.kernel->signature->out_type().Resolve(&kernel_context, types)
> {code}
> We don't have this problem with integers and floats (see comment below). For
> consistency with the other arithmetic functions, what I would expect is that
> we would expand the precision as much as we could within Decimal128 (in this
> case, to Decimal128(38, 6)) and the compute function would either error _if_
> there is an overflow (in the _checked version) or just overflow silently in the
> non-checked version. But it wouldn't error when determining the output type.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)