[GitHub] [arrow-rs] tustvold commented on pull request #3690: Allow precision loss on multiplying decimal arrays

via GitHub Tue, 07 Mar 2023 02:10:57 -0800


tustvold commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1457899035


   Ok, thank you for your patience with me. Would the following therefore be 
workable and consistent with Spark.
   
   * DataFusion is updated to follow Spark's type inference for arithmetic on 
decimals (if it doesn't already)
   * In particular, multiplication of decimals with precision and scale (p1, 
s1) and (p2, s2) yields (p1 + p2 + 1, s1 + s2)
   * By default, an error is returned if the output precision exceeds the 
maximum precision of the decimal type - as per 
[Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L54)
   * Arithmetic can proceed as normal using the regular unchecked arithmetic 
kernels, as the precision ensures overflow cannot occur
   
   We can then extend this so that if the output precision exceeds the maximum 
precision of the decimal type, and an allowDecimalPrecisionLoss config option 
is true, instead of returning a plan error DataFusion can reduce the precision 
of the result. Looking at the 
[Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L184)
 logic for this, it only does this when it will leave at least 6 digits of 
fractional precision in the result.
   
   To facilitate this we can add a `mul_fixed_point` and 
`mul_fixed_point_checked` to arrow-rs, along with potentially scalar variants, 
that in addition to their regular inputs, also take a scale. I suspect there is 
a more efficient way to do this, but at least initially this could just perform 
an unchecked/checked widening multiplication, divide the result, and then 
perform an unchecked/checked truncation. Initially we could just support i128, 
but I think supporting all integral types eventually would be cool.
   
   I think this would address my concerns with this PR as it would be 
explicitly opt-in, deterministic, and have adequate performance. What do you 
think?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Reply via email to