andygrove commented on PR #4749:
URL: 
https://github.com/apache/datafusion-comet/pull/4749#issuecomment-4849049961

   Thanks for the updates. The divisor guard plus the two new tests address all 
three of my earlier comments.
   
   On the precise-decimal vs double-average question I raised, I dug into it 
further and I am now satisfied there is no real divergence to worry about. I 
built the worst case on purpose: 32 rows in one partition (a power-of-two 
count, so `sum / count` is exact in `double`), thirty-one `0.0000` and one 
`0.0003`, where the average `3 / 32 / 10^4 = 0.000009375` lands exactly on the 
HALF_UP boundary at the result scale. Both Spark and Comet return `0.00000938`.
   
   The reason is Spark's cast from the `double` result back to decimal. It goes 
through `Decimal.apply(BigDecimal.valueOf(d))` in `Cast.scala`, which uses 
`Double.toString`, the shortest round-trip string. That recovers `0.000009375` 
rather than the exact binary value `0.00000937499...`, so it rounds up and 
matches Comet's exact `avg_decimal`. This is exactly why Spark gates the 
rewrite on `prec + 4 <= 15` (`MAX_DOUBLE_DIGITS`): within 15 significant digits 
the `double` round-trips to a unique short decimal equal to the exactly-rounded 
value. To be thorough I also ran a large targeted search over rounding-boundary 
inputs across a range of scales and counts and found zero mismatches.
   
   So the approach of restoring the decimal child is sound within the range the 
rewrite applies to, and the fuzz test is good defensive coverage on top of 
that. No further concerns from me.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to