viirya commented on PR #3690: URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1458771053
> * DataFusion is updated to follow Spark's type inference for arithmetic on decimals (if it doesn't already)
> * In particular, multiplication of decimals with precision and scale (p1, s1) and (p2, s2) yields (p1 + p2 + 1, s1 + s2)

I remember DataFusion already has a type coercion rule for decimal types that copies Spark's behavior (the code link posted above).

> * By default, an error is returned if the output precision exceeds the maximum precision of the decimal type - as per [Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L54)

This is just a constraint on how a `DecimalType` is constructed: obviously it isn't allowed to create a `DecimalType` beyond the largest precision. It doesn't actually affect the decimal arithmetic op.

> * Arithmetic can proceed as normal using the regular unchecked arithmetic kernels, as the precision ensures overflow cannot occur

This actually isn't correct. Spark has an expression that checks the result of a decimal arithmetic op for overflow: it checks whether the value fits into the required precision/scale, and if not, produces a null or an exception, depending on config.

> We can then extend this so that if the output precision exceeds the maximum precision of the decimal type, and an allowDecimalPrecisionLoss config option is true, instead of returning a plan error DataFusion can reduce the precision of the result. Looking at the [Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L184) logic for this, it only does this when it will leave at least 6 digits of fractional precision in the result.

`adjustPrecisionScale` is only used by the type coercion rule to adjust the precision/scale for a decimal arithmetic op. I think DataFusion should already have it, since it copied Spark's decimal type coercion rule. In other words, these items are basically analysis-phase rules that determine the decimal type of the arithmetic op result, but they don't guide how the op actually performs. After the op, another expression checks for overflow. Although Spark changed this process slightly in version 3.4, it is still essentially the same.

> To facilitate this we can add a `mul_fixed_point` and `mul_fixed_point_checked` to arrow-rs, along with potentially scalar variants, that in addition to their regular inputs, also take a scale. I suspect there is a more efficient way to do this, but at least initially this could just perform an unchecked/checked widening multiplication, divide the result, and then perform an unchecked/checked truncation. Initially we could just support i128, but I think supporting all integral types eventually would be cool. Let me figure it out.

This sounds similar to this PR, but with an extra scale parameter: after multiplication, the result is truncated according to the required scale? I put a few sketches of my understanding below.
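To make the analysis-phase part concrete, here is a minimal sketch (in Rust, with names of my own choosing, not DataFusion or arrow-rs API) of the rule that infers the result type of a decimal multiplication and then applies `adjustPrecisionScale` when precision loss is allowed; the constants mirror Spark's `MAX_PRECISION = 38` and `MINIMUM_ADJUSTED_SCALE = 6`:

```rust
// Illustrative sketch of Spark's analysis-phase rule for the result type of
// a decimal multiplication, including `DecimalType.adjustPrecisionScale`.

const MAX_PRECISION: u8 = 38;
const MINIMUM_ADJUSTED_SCALE: i8 = 6;

/// Unadjusted result type of `Decimal(p1, s1) * Decimal(p2, s2)`:
/// precision = p1 + p2 + 1, scale = s1 + s2.
fn multiply_result_type(p1: u8, s1: i8, p2: u8, s2: i8) -> (u8, i8) {
    (p1 + p2 + 1, s1 + s2)
}

/// If the precision exceeds the maximum, give integral digits priority and
/// shrink the scale, but keep at least MINIMUM_ADJUSTED_SCALE fractional
/// digits (Spark's behavior when precision loss is allowed).
fn adjust_precision_scale(precision: u8, scale: i8) -> (u8, i8) {
    if precision <= MAX_PRECISION {
        (precision, scale)
    } else {
        let int_digits = precision as i32 - scale as i32;
        let min_scale = (scale as i32).min(MINIMUM_ADJUSTED_SCALE as i32);
        let adjusted_scale = (MAX_PRECISION as i32 - int_digits).max(min_scale);
        (MAX_PRECISION, adjusted_scale as i8)
    }
}

fn main() {
    // Decimal(38, 10) * Decimal(38, 10): unadjusted (77, 20), adjusted (38, 6).
    let (p, s) = multiply_result_type(38, 10, 38, 10);
    assert_eq!(adjust_precision_scale(p, s), (38, 6));
}
```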
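And a rough sketch of the overflow check that runs after the op (what Spark's `CheckOverflow` expression does): the raw value either fits into the target precision, becomes null, or raises an error, depending on config. Again the Rust names are just for illustration:

```rust
// Rough sketch of the post-arithmetic overflow check; not arrow-rs API.

/// Does `value` (the unscaled representation) fit into `precision` digits?
/// `precision` is at most 38, so 10^precision cannot overflow i128.
fn fits_precision(value: i128, precision: u8) -> bool {
    let bound = 10_i128.pow(precision as u32);
    -bound < value && value < bound
}

/// Pass the value through, turn overflow into NULL (non-ANSI mode), or
/// report an error (ANSI mode).
fn check_overflow(
    value: i128,
    precision: u8,
    null_on_overflow: bool,
) -> Result<Option<i128>, String> {
    if fits_precision(value, precision) {
        Ok(Some(value))
    } else if null_on_overflow {
        Ok(None)
    } else {
        Err(format!("value {value} does not fit into precision {precision}"))
    }
}

fn main() {
    // 1000 needs 4 digits, so it does not fit into Decimal(3, _).
    assert_eq!(check_overflow(1_000, 3, true), Ok(None));
    assert!(check_overflow(999, 3, false).is_ok());
}
```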
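Finally, this is how I read the proposed `mul_fixed_point_checked`, written element-wise so it can stand alone. I widen i64 to i128 here only to keep the sketch self-contained; the real kernel on Decimal128 values would widen i128 to i256. The signature is an assumption for illustration, not the actual arrow-rs API:

```rust
// Element-wise sketch of the proposal: widening multiplication, divide down
// to the required scale (truncating), then a checked narrowing.

fn mul_fixed_point_checked(
    a: i64,            // unscaled value of a decimal with scale `s1`
    b: i64,            // unscaled value of a decimal with scale `s2`
    s1: u32,
    s2: u32,
    result_scale: u32, // required scale of the result; result_scale <= s1 + s2
) -> Option<i64> {
    // Widening multiplication: the product has scale s1 + s2 and cannot
    // overflow the wider type.
    let product = (a as i128) * (b as i128);

    // Divide to rescale from s1 + s2 down to the required result scale.
    let divisor = 10_i128.checked_pow(s1 + s2 - result_scale)?;
    let rescaled = product / divisor;

    // Checked truncation back to the narrow representation; None on overflow.
    // An unchecked variant would narrow without this check.
    i64::try_from(rescaled).ok()
}

fn main() {
    // 1.23 (scale 2) * 4.567 (scale 3) = 5.61741; at result scale 4 -> 5.6174.
    assert_eq!(mul_fixed_point_checked(123, 4567, 2, 3, 4), Some(56174));
}
```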
