[GitHub] [arrow-datafusion] westonpace commented on issue #6828: Incorrect Decimal Division Coercion

via GitHub Thu, 03 Aug 2023 08:23:52 -0700


westonpace commented on issue #6828:
URL: 
https://github.com/apache/arrow-datafusion/issues/6828#issuecomment-1664188613


   I haven't followed the entire discussion but, FWIW, Substrait currently 
agrees with the `max(6, s1 + p2 + 1)` formulation.  [SQL server also uses this 
formula](https://learn.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16).
  Let me also preface this description with the fact that this topic confuses 
me all the time so there is a chance I'm way off base here.
   
   > Postgres appears to do something similar - 
https://github.com/postgres/postgres/blob/29cf61ade3f245aa40f427a1d6345287ef77e622/src/interfaces/ecpg/pgtypeslib/numeric.c#L1047
   
   I'm not sure this bears out empirically.  Using 
https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/9658 I get the following 
result from `CAST(1 AS DECIMAL(3, 0)) / CAST(1000000) AS DECIMAL(7, 0)) = 
0.000001000000000000000000` which has a scale of at least 7 which is greater 
than `0+4`.
   
   > I think using a fixed increment of the dividend's scale makes a whole lot 
more sense than a value computed based on the divisor's precision, which just 
seems to be a recipe for overflow.
   
   I think the idea of involving the divisor's precision is for cases like:
   
   1 / 1000000000
   
   >  which just seems to be a recipe for overflow.
   > This scaling should allow precision loss instead of overflow directly.
   
   Agreed.  In both SQL server and Substrait there is a **second step** to help 
with the overflow problem.  This second step applies to all decimal arithmetic 
operations.  The rule is basically like this:
   
   If, after following the formulas, the resulting precision / scale is out of 
bounds, then sacrifice scale (e.g. throw away the really insignificant digits 
on the far right) but keep at least 6 digits of scale.  If you can't keep at 
least 6 digits of scale (e.g. the number requires 33 digits or more) then 
overflow.  The 6 is absolutely arbitrary but it works.
   
   Note, that the[ SQL server 
rule](https://learn.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16)
 is slightly more complex, but more or less the same thing:
   
   > In addition and subtraction operations, we need max(p1 - s1, p2 - s2) 
places to store the integral part of the decimal number. If > there isn't 
enough space to store them (that is, max(p1 - s1, p2 - s2) < min(38, precision) 
- scale), the scale is reduced to provide > enough space for the integral part. 
The resulting scale is min(precision, 38) - max(p1 - s1, p2 - s2), so the 
fractional part might be > rounded to fit into the resulting scale.
   > 
   > In multiplication and division operations, we need precision - scale 
places to store the integral part of the result. The scale might > be reduced 
using the following rules:
   > 
   >  *  The resulting scale is reduced to min(scale, 38 - (precision-scale)) 
if the integral part is less than 32, because it can't be greater than 38 - 
(precision-scale). The result might be rounded in this case.
   >  *  The scale isn't changed if it's less than 6 and if the integral part 
is greater than 32. In this case, an overflow error might be raised if it can't 
fit into decimal(38, scale).
   >  *  The scale is set to 6 if it's greater than 6 and if the integral part 
is greater than 32. In this case, both the integral part and scale would be 
reduced and resulting type is decimal(38, 6). The result might be rounded to 6 
decimal places, or the overflow error is thrown if the integral part can't fit 
into 32 digits.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] westonpace commented on issue #6828: Incorrect Decimal Division Coercion

Reply via email to