[
https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601404#comment-17601404
]
Weston Pace commented on ARROW-17601:
-------------------------------------
I think query engines and planners really like to know the output type before
the plan is executed. Determining the output type based on the input is
not something that is traditionally done (though maybe it is time to break with
tradition :).
That being said, we could perhaps model ourselves after Substrait (which
itself, modeled its rules after Calcite which is maybe based on Spark's rules
which is probably modeled after something else...)
In Substrait, if the precision limit is exceeded when determining the output
type, then digits are dropped (instead of the operation being rejected
outright). Digits are dropped first from the decimal places. Once the decimal
places have been reduced to 6 digits, it begins to drop precision from the
integral (non-decimal) digits.
So, by Substrait rules, MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>) yields
DECIMAL<49, 6> which truncates to DECIMAL<38, 6>
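As a rough illustration (not Arrow's actual implementation), that Substrait-style capping for multiplication can be sketched as follows, assuming the usual unconstrained multiply type of precision p1 + p2 + 1 and scale s1 + s2:

```python
def multiply_result_type(p1, s1, p2, s2, max_precision=38, min_scale=6):
    """Substrait-style output type for decimal multiplication (sketch)."""
    # Unconstrained result type for DECIMAL<p1, s1> * DECIMAL<p2, s2>:
    precision = p1 + p2 + 1
    scale = s1 + s2
    if precision > max_precision:
        # Drop digits from the fractional part first, but never reduce the
        # scale below min_scale; any remaining overflow comes out of the
        # integral digits (i.e. the value may overflow at runtime instead).
        overflow = precision - max_precision
        scale -= min(overflow, max(scale - min_scale, 0))
        precision = max_precision
    return precision, scale

# MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>):
# unconstrained DECIMAL<49, 6>, capped to DECIMAL<38, 6>
print(multiply_result_type(33, 4, 15, 2))  # (38, 6)
```

With these rules the example from the ticket resolves to a valid type instead of erroring at plan time.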
Here is how it is explained in TSQL
https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16
{quote}
In addition and subtraction operations, we need max(p1 - s1, p2 - s2) places to
store the integral part of the decimal number. If there isn't enough space to
store them, that is, max(p1 - s1, p2 - s2) < min(38, precision) - scale, the scale is
reduced to provide enough space for integral part. Resulting scale is
MIN(precision, 38) - max(p1 - s1, p2 - s2), so the fractional part might be
rounded to fit into the resulting scale.
In multiplication and division operations, we need precision - scale places to
store the integral part of the result. The scale might be reduced using the
following rules:
The resulting scale is reduced to min(scale, 38 - (precision-scale)) if the
integral part is less than 32, because it can't be greater than 38 -
(precision-scale). Result might be rounded in this case.
The scale won't be changed if it's less than 6 and if the integral part is
greater than 32. In this case, overflow error might be raised if it can't fit
into decimal(38, scale)
The scale will be set to 6 if it's greater than 6 and if the integral part
is greater than 32. In this case, both integral part and scale would be reduced
and resulting type is decimal(38,6). Result might be rounded to 6 decimal
places or the overflow error will be thrown if the integral part can't fit into
32 digits.
{quote}
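A hypothetical sketch of those T-SQL multiplication/division rules (the behavior at an integral part of exactly 32 is my assumption; the quoted text only says "less than" / "greater than"):

```python
def tsql_result_type(precision, scale, max_p=38):
    """Adjust a raw decimal result type per the T-SQL rules quoted above."""
    integral = precision - scale  # digits needed for the integral part
    if precision <= max_p:
        return precision, scale  # fits as-is, no adjustment needed
    if integral <= 32:
        # Reduce the scale so the integral part fits; result may be rounded.
        return max_p, min(scale, max_p - integral)
    if scale < 6:
        # Keep the scale; a runtime overflow error is possible.
        return max_p, scale
    # Integral part > 32 and scale >= 6: clamp to decimal(38, 6).
    return max_p, 6

# MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>): raw (49, 6) -> (38, 6)
print(tsql_result_type(49, 6))  # (38, 6)
```

Applied to the ticket's example, the raw DECIMAL<49, 6> has a 43-digit integral part and a scale of 6, so the T-SQL rules also land on decimal(38, 6), matching the Substrait result.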
> [C++] Error when creating Expression on Decimal128 types: precision out of
> range
> --------------------------------------------------------------------------------
>
> Key: ARROW-17601
> URL: https://issues.apache.org/jira/browse/ARROW-17601
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++
> Reporter: Neal Richardson
> Assignee: Yibo Cai
> Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078
> DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413
> call.kernel->signature->out_type().Resolve(&kernel_context, types)
> {code}
> We don't have this problem with integers and floats (see comment below). For
> consistency with the other arithmetic functions, what I would expect is that
> we would expand the precision as much as we could within Decimal128 (in this
> case, to Decimal128(38, 6)) and the compute function would either error _if_
> there is an overflow (in the _checked version) or just overflow silently in the
> non-checked version. But it wouldn't error when determining the output type.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)