viirya commented on PR #3690: URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1458771053
> * DataFusion is updated to follow Spark's type inference for arithmetic on decimals (if it doesn't already)
> * In particular, multiplication of decimals with precision and scale (p1, s1) and (p2, s2) yields (p1 + p2 + 1, s1 + s2)

I remember DataFusion already has a type coercion rule for decimal types that copies Spark's behavior (the code link posted above).

> * By default, an error is returned if the output precision exceeds the maximum precision of the decimal type - as per [Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L54)

This is just a constraint on how a `DecimalType` is constructed: obviously it isn't allowed to create a `DecimalType` beyond the largest precision. It doesn't actually affect the decimal arithmetic op.

> * Arithmetic can proceed as normal using the regular unchecked arithmetic kernels, as the precision ensures overflow cannot occur

This actually isn't correct. Spark has an expression that checks the result of a decimal arithmetic op for overflow: it checks whether the value fits into the required precision/scale, and if not, produces a null or an exception, depending on config.

> We can then extend this so that if the output precision exceeds the maximum precision of the decimal type, and an allowDecimalPrecisionLoss config option is true, instead of returning a plan error DataFusion can reduce the precision of the result. Looking at the [Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L184) logic for this, it only does this when it will leave at least 6 digits of fractional precision in the result.

`adjustPrecisionScale` is only used by the type coercion rule to adjust the precision/scale for a decimal arithmetic op. I think DataFusion should already have it, since it copied Spark's decimal type coercion rule. In other words, these items are basically analysis-phase rules that determine the decimal type of the arithmetic op result, but they don't guide how the op actually performs. After the op, another expression checks for overflow. Although Spark changed this process slightly in version 3.4, it is still essentially the same.

> To facilitate this we can add a `mul_fixed_point` and `mul_fixed_point_checked` to arrow-rs, along with potentially scalar variants, that in addition to their regular inputs, also take a scale. I suspect there is a more efficient way to do this, but at least initially this could just perform an unchecked/checked widening multiplication, divide the result, and then perform an unchecked/checked truncation. Initially we could just support i128, but I think supporting all integral types eventually would be cool. Let me figure it out.

This sounds similar to this PR, but with an extra scale parameter: after multiplication, the result is truncated according to the required scale? I put a few sketches of my understanding below.
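To make the analysis-phase part concrete, here is a minimal sketch (in Rust, with names of my own choosing, not DataFusion or arrow-rs API) of the rule that infers the result type of a decimal multiplication and then applies `adjustPrecisionScale` when precision loss is allowed; the constants mirror Spark's `MAX_PRECISION = 38` and `MINIMUM_ADJUSTED_SCALE = 6`:

```rust
// Illustrative sketch of Spark's analysis-phase rule for the result type of
// a decimal multiplication, including `DecimalType.adjustPrecisionScale`.

const MAX_PRECISION: u8 = 38;
const MINIMUM_ADJUSTED_SCALE: i8 = 6;

/// Unadjusted result type of `Decimal(p1, s1) * Decimal(p2, s2)`:
/// precision = p1 + p2 + 1, scale = s1 + s2.
fn multiply_result_type(p1: u8, s1: i8, p2: u8, s2: i8) -> (u8, i8) {
    (p1 + p2 + 1, s1 + s2)
}

/// If the precision exceeds the maximum, give integral digits priority and
/// shrink the scale, but keep at least MINIMUM_ADJUSTED_SCALE fractional
/// digits (Spark's behavior when precision loss is allowed).
fn adjust_precision_scale(precision: u8, scale: i8) -> (u8, i8) {
    if precision <= MAX_PRECISION {
        (precision, scale)
    } else {
        let int_digits = precision as i32 - scale as i32;
        let min_scale = (scale as i32).min(MINIMUM_ADJUSTED_SCALE as i32);
        let adjusted_scale = (MAX_PRECISION as i32 - int_digits).max(min_scale);
        (MAX_PRECISION, adjusted_scale as i8)
    }
}

fn main() {
    // Decimal(38, 10) * Decimal(38, 10): unadjusted (77, 20), adjusted (38, 6).
    let (p, s) = multiply_result_type(38, 10, 38, 10);
    assert_eq!(adjust_precision_scale(p, s), (38, 6));
}
```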
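And a rough sketch of the overflow check that runs after the op (what Spark's `CheckOverflow` expression does): the raw value either fits into the target precision, becomes null, or raises an error, depending on config. Again the Rust names are just for illustration:

```rust
// Rough sketch of the post-arithmetic overflow check; not arrow-rs API.

/// Does `value` (the unscaled representation) fit into `precision` digits?
/// `precision` is at most 38, so 10^precision cannot overflow i128.
fn fits_precision(value: i128, precision: u8) -> bool {
    let bound = 10_i128.pow(precision as u32);
    -bound < value && value < bound
}

/// Pass the value through, turn overflow into NULL (non-ANSI mode), or
/// report an error (ANSI mode).
fn check_overflow(
    value: i128,
    precision: u8,
    null_on_overflow: bool,
) -> Result<Option<i128>, String> {
    if fits_precision(value, precision) {
        Ok(Some(value))
    } else if null_on_overflow {
        Ok(None)
    } else {
        Err(format!("value {value} does not fit into precision {precision}"))
    }
}

fn main() {
    // 1000 needs 4 digits, so it does not fit into Decimal(3, _).
    assert_eq!(check_overflow(1_000, 3, true), Ok(None));
    assert!(check_overflow(999, 3, false).is_ok());
}
```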
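Finally, this is how I read the proposed `mul_fixed_point_checked`, written element-wise so it can stand alone. I widen i64 to i128 here only to keep the sketch self-contained; the real kernel on Decimal128 values would widen i128 to i256. The signature is an assumption for illustration, not the actual arrow-rs API:

```rust
// Element-wise sketch of the proposal: widening multiplication, divide down
// to the required scale (truncating), then a checked narrowing.

fn mul_fixed_point_checked(
    a: i64,            // unscaled value of a decimal with scale `s1`
    b: i64,            // unscaled value of a decimal with scale `s2`
    s1: u32,
    s2: u32,
    result_scale: u32, // required scale of the result; result_scale <= s1 + s2
) -> Option<i64> {
    // Widening multiplication: the product has scale s1 + s2 and cannot
    // overflow the wider type.
    let product = (a as i128) * (b as i128);

    // Divide to rescale from s1 + s2 down to the required result scale.
    let divisor = 10_i128.checked_pow(s1 + s2 - result_scale)?;
    let rescaled = product / divisor;

    // Checked truncation back to the narrow representation; None on overflow.
    // An unchecked variant would narrow without this check.
    i64::try_from(rescaled).ok()
}

fn main() {
    // 1.23 (scale 2) * 4.567 (scale 3) = 5.61741; at result scale 4 -> 5.6174.
    assert_eq!(mul_fixed_point_checked(123, 4567, 2, 3, 4), Some(56174));
}
```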
