jhorstmann commented on pull request #9416:
URL: https://github.com/apache/arrow/pull/9416#issuecomment-774331065
Some background from my perspective:
I implemented the NaN sorting behaviour initially in [ARROW-9895][1] to be
consistent with Postgres behaviour and then later implemented the same
behaviour for the aggregation kernels in [ARROW-10216][2]. At that time I did
not know about the total order specification. Total order seems a bit more
"standard" than Postgres in this case, but the only difference is how negative
NaN is handled. I think the current state is that lexicographical sort and SIMD
min/max don't distinguish between negative and positive NaN and consider all NaN
to be bigger than all other values, while single-column sort and scalar min/max
follow total order.
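
To make the difference concrete, here is a minimal sketch in plain Rust (std only, not the actual Arrow kernels). The names `cmp_nan_greatest`, `sort_nan_greatest` and `sort_total_order` are made up for illustration; `f64::total_cmp` is the standard library's implementation of the IEEE 754 total order. The NaN values are built from explicit bit patterns so that their sign is known.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator: every NaN compares equal to every other NaN
/// and greater than any non-NaN value; the sign bit of NaN is ignored.
fn cmp_nan_greatest(a: &f64, b: &f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither side is NaN, so partial_cmp always returns Some.
        (false, false) => a.partial_cmp(b).unwrap(),
    }
}

/// Sorts a copy of `values` with the "all NaN are greatest" rule.
fn sort_nan_greatest(values: &[f64]) -> Vec<f64> {
    let mut v = values.to_vec();
    v.sort_by(|a, b| cmp_nan_greatest(a, b));
    v
}

/// Sorts a copy of `values` with the IEEE 754 total order, where a NaN
/// with the sign bit set sorts before all other values.
fn sort_total_order(values: &[f64]) -> Vec<f64> {
    let mut v = values.to_vec();
    v.sort_by(|a, b| a.total_cmp(b));
    v
}
```

With the input `[1.0, -NaN, -1.0, +NaN]`, the first function should put both NaN values at the end, while the second should put the negative NaN at the front and the positive NaN at the end.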
Since it's not possible to distinguish different NaN values without
transmuting them to their bit pattern, and I'm not sure whether operations
involving NaN are required to preserve the sign, this inconsistency is probably
not urgent to fix.
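
As a small illustration of what "transmuting" means here, the following sketch reads the sign of a NaN from its raw bits. It is plain Rust with a hypothetical helper name, `is_negative_nan`, and makes no claim about how the Arrow kernels do it.

```rust
/// Hypothetical helper: true if `x` is a NaN whose sign bit is set.
/// Ordinary comparisons cannot tell this, because any comparison
/// involving NaN is unordered, so the raw bits are inspected instead.
fn is_negative_nan(x: f64) -> bool {
    // Bit 63 of the IEEE 754 binary64 representation is the sign bit.
    x.is_nan() && (x.to_bits() >> 63) == 1
}
```

For example, `is_negative_nan(f64::from_bits(0xfff8_0000_0000_0000))` should be true, while the same pattern with the top bit cleared should give false.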
Another thing to keep in mind is that NaN and Null are two completely
separate concepts in both Postgres and the Arrow data model, and nulls are
always excluded when evaluating aggregations.
For ignoring NaN values, I think an efficient implementation could be a
separate kernel that maps NaN to null.
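
A rough sketch of that idea, using `Option<f64>` as a stand-in for a nullable Arrow array rather than the real array types. The names `nan_to_null` and `min_skip_nulls` are hypothetical and do not refer to existing kernels.

```rust
/// Hypothetical "kernel": replaces every NaN with null (None) and
/// leaves all other values, including existing nulls, unchanged.
fn nan_to_null(values: &[Option<f64>]) -> Vec<Option<f64>> {
    values
        .iter()
        .copied()
        .map(|v| v.filter(|x| !x.is_nan()))
        .collect()
}

/// Minimum over the non-null values; returns None if there are none.
/// The input is assumed to be NaN-free, e.g. the output of `nan_to_null`.
fn min_skip_nulls(values: &[Option<f64>]) -> Option<f64> {
    values.iter().flatten().copied().reduce(f64::min)
}
```

Composing the two, `min_skip_nulls(&nan_to_null(&input))`, gives a NaN-ignoring minimum without the aggregation itself needing any NaN handling.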
[1]: https://issues.apache.org/jira/browse/ARROW-9895
[2]: https://issues.apache.org/jira/browse/ARROW-10216