theirix commented on PR #19369:
URL: https://github.com/apache/datafusion/pull/19369#issuecomment-3706946616

> > > Seeing all this logic introduced, I'm beginning to question whether there is actual benefit to having a native log implementation 🤔
> > > Perhaps we should just revert to casting it to float and accept the accuracy loss
> > > Thoughts @theirix ?
> >
> > Fair enough, the logic becomes more convoluted.
> >
> > The original idea was to introduce common decimal operations. Scale-preserving operations like abs, round, gcd, etc., are easy to implement and support. Some other operations with a natural mapping to decimals (like log10, pow10) adjust scales and do not have a natural analogue in the arrow buffer, leading to more complex logic. These operations are typical for data analytics, and applications could benefit from them. So ten-based operations can be calculated precisely, while for the rest and for more complicated operations, of course, it is fine to lose precision using a native float implementation.
> >
> > First, we should reuse the arrow's foundational primitives as much as possible. If there is an `OP_checked`, it's better to piggyback on it. A few num traits were recently added to decimals in arrow-buffer, making it easier for us.
> >
> > Second, I believe more logic should be isolated in `calculate_binary_decimal_math`, especially for handling different scales, to shift responsibility from UDF implementers (like pow) to middleware. It is in progress, and I'll submit it shortly.
>
> That makes sense. I guess what we could also do to alleviate this complexity (and ensure less performance impact) would be:
>
> * At invoke time of function, only use native decimal operations when we have a scalar exponent
> * Otherwise fall back to casting to float

Sounds like a plan. The routing should be better made based on the type signature, rather than at eval time.

> This can be done in followup PRs of course but at least sets a roadmap for us.
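
To make the "ten-based operations can be calculated precisely, float for the rest" split concrete, here is a minimal sketch. It is not DataFusion or arrow code: the `Dec` struct and the functions `exact_pow10_exponent` and `log10_decimal` are hypothetical names invented for illustration, and a decimal is assumed to be an `i128` unscaled value with an `i8` scale. It takes the exact path only when the unscaled value is a power of ten, and otherwise casts to `f64` and accepts the accuracy loss.

```rust
/// Hypothetical decimal for illustration only: value = unscaled * 10^(-scale).
#[derive(Debug, Clone, Copy)]
struct Dec {
    unscaled: i128,
    scale: i8,
}

/// Returns Some(k) when n == 10^k exactly (n > 0), otherwise None.
fn exact_pow10_exponent(mut n: i128) -> Option<i32> {
    if n <= 0 {
        return None;
    }
    let mut k = 0;
    // Strip trailing decimal zeros; a power of ten reduces to exactly 1.
    while n % 10 == 0 {
        n /= 10;
        k += 1;
    }
    if n == 1 {
        Some(k)
    } else {
        None
    }
}

/// log10 of a decimal. Returns None for non-positive input.
/// Exact path: the unscaled value is a power of ten, so the result is an integer.
/// Fallback path: cast to f64 and use the float log10 (may lose precision).
fn log10_decimal(d: Dec) -> Option<f64> {
    if d.unscaled <= 0 {
        return None;
    }
    match exact_pow10_exponent(d.unscaled) {
        // log10(10^k * 10^(-scale)) = k - scale
        Some(k) => Some((k - d.scale as i32) as f64),
        None => {
            let v = d.unscaled as f64 / 10f64.powi(d.scale as i32);
            Some(v.log10())
        }
    }
}
```

With this sketch, `Dec { unscaled: 100000, scale: 2 }` (that is, 1000.00) takes the exact path, while `Dec { unscaled: 2, scale: 0 }` goes through the float fallback. A real implementation would presumably pick the path from the type signature, as suggested above, rather than per value.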
