westonpace commented on issue #6828: URL: https://github.com/apache/arrow-datafusion/issues/6828#issuecomment-1664188613
I haven't followed the entire discussion but, FWIW, Substrait currently agrees with the `max(6, s1 + p2 + 1)` formulation. [SQL server also uses this formula](https://learn.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16). Let me also preface this description with the fact that this topic confuses me all the time so there is a chance I'm way off base here. > Postgres appears to do something similar - https://github.com/postgres/postgres/blob/29cf61ade3f245aa40f427a1d6345287ef77e622/src/interfaces/ecpg/pgtypeslib/numeric.c#L1047 I'm not sure this bears out empirically. Using https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/9658 I get the following result from `CAST(1 AS DECIMAL(3, 0)) / CAST(1000000) AS DECIMAL(7, 0)) = 0.000001000000000000000000` which has a scale of at least 7 which is greater than `0+4`. > I think using a fixed increment of the dividend's scale makes a whole lot more sense than a value computed based on the divisor's precision, which just seems to be a recipe for overflow. I think the idea of involving the divisor's precision is for cases like: 1 / 1000000000 > which just seems to be a recipe for overflow. > This scaling should allow precision loss instead of overflow directly. Agreed. In both SQL server and Substrait there is a **second step** to help with the overflow problem. This second step applies to all decimal arithmetic operations. The rule is basically like this: If, after following the formulas, the resulting precision / scale is out of bounds, then sacrifice scale (e.g. throw away the really insignificant digits on the far right) but keep at least 6 digits of scale. If you can't keep at least 6 digits of scale (e.g. the number requires 33 digits or more) then overflow. The 6 is absolutely arbitrary but it works. Note, that the[ SQL server rule](https://learn.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16) is slightly more complex, but more or less the same thing: > In addition and subtraction operations, we need max(p1 - s1, p2 - s2) places to store the integral part of the decimal number. If > there isn't enough space to store them (that is, max(p1 - s1, p2 - s2) < min(38, precision) - scale), the scale is reduced to provide > enough space for the integral part. The resulting scale is min(precision, 38) - max(p1 - s1, p2 - s2), so the fractional part might be > rounded to fit into the resulting scale. > > In multiplication and division operations, we need precision - scale places to store the integral part of the result. The scale might > be reduced using the following rules: > > * The resulting scale is reduced to min(scale, 38 - (precision-scale)) if the integral part is less than 32, because it can't be greater than 38 - (precision-scale). The result might be rounded in this case. > * The scale isn't changed if it's less than 6 and if the integral part is greater than 32. In this case, an overflow error might be raised if it can't fit into decimal(38, scale). > * The scale is set to 6 if it's greater than 6 and if the integral part is greater than 32. In this case, both the integral part and scale would be reduced and resulting type is decimal(38, 6). The result might be rounded to 6 decimal places, or the overflow error is thrown if the integral part can't fit into 32 digits. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
