zanmato1984 commented on PR #44184:
URL: https://github.com/apache/arrow/pull/44184#issuecomment-2877634929
Hi @khwilson, thanks for the update and the extensive research.
> > Ideally, the scale should be "floating" just as in floating-point
> > arithmetic, depending on the current running sum (the running sum can be
> > very large if all data is positive, or very small if the data is centered
> > around zero). It is then normalized to the original scale at the end. But
> > of course that makes the algorithm more involved.
>
> Yeah, this is quite complicated and essentially means you need to implement
> the floating point addition algorithm.
+1 that this could be unrealistic for us to implement given the complexity.
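For the record, here is a minimal sketch of what that "floating scale"
accumulation could look like, purely hypothetical C++ with no Arrow APIs
involved, mostly to show why it amounts to re-implementing floating-point
addition:

```cpp
#include <cstdint>
#include <limits>
#include <stdexcept>

// Purely illustrative "floating scale" accumulator; not proposed for this PR.
// The running sum carries its own scale and sheds low-order digits when it
// would otherwise overflow, much like floating-point addition aligns and
// truncates its operands. Finish() normalizes back to the original scale.
struct FloatingScaleSum {
  int64_t sum = 0;
  int32_t scale;            // current scale of `sum` (may shrink over time)
  const int32_t out_scale;  // original scale to normalize back to

  explicit FloatingScaleSum(int32_t s) : scale(s), out_scale(s) {}

  void Add(int64_t value) {
    int32_t value_scale = out_scale;  // inputs arrive at the original scale
    // Align the new value with the accumulator's (possibly reduced) scale.
    while (value_scale > scale) {
      value /= 10;
      --value_scale;
    }
    // If the addition would overflow, drop one digit from both operands.
    while (WouldOverflow(sum, value)) {
      sum /= 10;
      value /= 10;
      --scale;
    }
    sum += value;
  }

  int64_t Finish() const {
    int64_t result = sum;
    // Scale back up; overflow here means the exact sum truly does not fit.
    for (int32_t s = scale; s < out_scale; ++s) {
      if (result > std::numeric_limits<int64_t>::max() / 10 ||
          result < std::numeric_limits<int64_t>::min() / 10) {
        throw std::overflow_error("sum does not fit the original scale");
      }
      result *= 10;
    }
    return result;
  }

  // GCC/Clang builtin; a portable check would need more code.
  static bool WouldOverflow(int64_t a, int64_t b) {
    int64_t unused;
    return __builtin_add_overflow(a, b, &unused);
  }
};
```

Every `Add()` potentially shifts both operands, and `Finish()` can still
overflow when normalizing back, which is exactly the kind of bookkeeping we
would have to get right.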
> > Either is fine to me.
>
> Cool. @zanmato1984 do you have an opinion?
I don't have a strong preference on this. (And we don't have to jump to a
conclusion too soon, do we?)
I do have an opinion on the ideal case, though. As I understand it, the Arrow
compute module should be a building block for comprehensive data
systems/applications. Therefore it should remain neutral on
application-specific behaviors, especially when no single behavior is
obviously superior to the others. That is, we should probably expose options
for the desired behavior, something like `enum PrecisionPolicy {
PROMOTE_TO_MAX, DEMOTE_TO_DOUBLE }`, and do the computation accordingly (as
long as it doesn't add too much engineering complexity). Of course this is a
long-term goal and shouldn't be a concern of this PR.
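To make the idea concrete, a rough sketch of how such an option could be
surfaced (the names `DecimalSumOptions` and `precision_policy` are
hypothetical, not existing Arrow APIs):

```cpp
// Hypothetical sketch, in the spirit of the existing
// arrow::compute::FunctionOptions subclasses; none of these names exist in
// Arrow today.
enum class PrecisionPolicy {
  // Widen the result type to the maximum precision (e.g. decimal128(38, s)),
  // postponing overflow as far as the storage type allows.
  PROMOTE_TO_MAX,
  // Accumulate in double instead, trading exactness for unbounded range.
  DEMOTE_TO_DOUBLE,
};

struct DecimalSumOptions {
  PrecisionPolicy precision_policy = PrecisionPolicy::PROMOTE_TO_MAX;
};
```

A caller that prefers exact arithmetic would pass `PROMOTE_TO_MAX`; one that
prefers range over exactness would pass `DEMOTE_TO_DOUBLE`, and the kernel
would dispatch accordingly.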