jhorstmann commented on pull request #9416:
URL: https://github.com/apache/arrow/pull/9416#issuecomment-774331065


   Some background from my perspective:
   
   I implemented the NaN sorting behaviour initially in [ARROW-9895][1] to be 
consistent with Postgres behaviour and then later implemented the same 
behaviour for the aggregation kernels in [ARROW-10216][2]. At that time I did 
not know about the total order specification. Total order seems a bit more 
"standard" than Postgres in this case, but the only difference is how negative 
NaN is handled. As far as I can tell, the current state is that lexicographical sort and SIMD 
min/max don't distinguish between negative and positive NaN and consider all NaN to 
be bigger than all other values, while single-column sort and scalar min/max 
follow total order.
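   
   To make the difference concrete, here is a minimal sketch in plain Rust 
(standard library only, not the actual kernel code). `nan_greatest_cmp` is a 
hypothetical helper standing in for the "all NaN are bigger than everything" 
behaviour, and `f64::total_cmp` stands in for total order. The two only 
disagree on where a NaN with the sign bit set ends up.

```rust
use std::cmp::Ordering;

/// Hypothetical comparator: every NaN is equal to every other NaN and
/// greater than any non-NaN value, regardless of its sign bit.
fn nan_greatest_cmp(a: f64, b: f64) -> Ordering {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => Ordering::Equal,
        (true, false) => Ordering::Greater,
        (false, true) => Ordering::Less,
        // Neither side is NaN, so partial_cmp always returns Some.
        (false, false) => a.partial_cmp(&b).unwrap(),
    }
}

fn main() {
    // copysign sets the sign bit of a NaN explicitly.
    let pos_nan = f64::NAN.copysign(1.0);
    let neg_nan = f64::NAN.copysign(-1.0);

    // Both orderings put positive NaN above every number.
    assert_eq!(nan_greatest_cmp(pos_nan, f64::INFINITY), Ordering::Greater);
    assert_eq!(pos_nan.total_cmp(&f64::INFINITY), Ordering::Greater);

    // They disagree on negative NaN: total order puts it below everything.
    assert_eq!(nan_greatest_cmp(neg_nan, f64::NEG_INFINITY), Ordering::Greater);
    assert_eq!(neg_nan.total_cmp(&f64::NEG_INFINITY), Ordering::Less);
}
```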
   
   Considering that it's not possible to distinguish different NaNs without 
transmuting, and that I'm not sure whether operations involving NaN are required to 
keep the sign, this inconsistency is probably not urgent to fix.
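   
   As an illustration of the "transmuting" point, the sketch below (plain 
Rust, standard library only) shows that ordinary comparisons cannot tell a 
positive NaN from a negative one, while looking at the raw bits can. 
`nan_sign_bit` is a hypothetical helper name. The example says nothing about 
whether arithmetic preserves the sign of a NaN.

```rust
/// Hypothetical helper: reads the IEEE 754 sign bit from the raw bit pattern.
fn nan_sign_bit(x: f64) -> bool {
    (x.to_bits() >> 63) == 1
}

fn main() {
    let pos_nan = f64::NAN.copysign(1.0);
    let neg_nan = f64::NAN.copysign(-1.0);

    // Every ordered comparison involving NaN is false, so the usual
    // operators carry no information about the sign.
    assert!(!(pos_nan < neg_nan));
    assert!(!(pos_nan > neg_nan));
    assert!(pos_nan != neg_nan);
    assert!(pos_nan.is_nan() && neg_nan.is_nan());

    // Only the reinterpreted bits reveal the difference.
    assert!(!nan_sign_bit(pos_nan));
    assert!(nan_sign_bit(neg_nan));
}
```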
   
   Another thing to keep in mind is that NaN and null are two completely 
separate concepts in both Postgres and the Arrow data model, and nulls are 
always excluded when evaluating aggregations.
   
   For ignoring NaN values, I think an efficient implementation could be a 
separate kernel that maps NaN to null.
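   
   A rough sketch of that idea, using a plain slice of `Option<f64>` as a 
stand-in for a nullable Arrow array (an assumption to keep the example 
self-contained; a real kernel would work on the array's values and validity 
bitmap). `nan_to_null` and `max_ignoring_nulls` are hypothetical names, not 
existing Arrow functions.

```rust
/// Hypothetical kernel: turn every NaN slot into a null, keep the rest.
fn nan_to_null(values: &[Option<f64>]) -> Vec<Option<f64>> {
    values
        .iter()
        .map(|v| v.filter(|x| !x.is_nan()))
        .collect()
}

/// Hypothetical aggregation: max under total order, skipping nulls.
fn max_ignoring_nulls(values: &[Option<f64>]) -> Option<f64> {
    values
        .iter()
        .flatten()
        .copied()
        .max_by(|a, b| a.total_cmp(b))
}

fn main() {
    let input = [Some(1.0), None, Some(f64::NAN.copysign(1.0)), Some(3.0)];

    // Without the mapping, the positive NaN wins the max.
    assert!(max_ignoring_nulls(&input).unwrap().is_nan());

    // After mapping NaN to null, the existing null handling skips it.
    let cleaned = nan_to_null(&input);
    assert_eq!(cleaned, vec![Some(1.0), None, None, Some(3.0)]);
    assert_eq!(max_ignoring_nulls(&cleaned), Some(3.0));
}
```

   The aggregation itself stays unchanged; ignoring NaN becomes a matter of 
composing the two steps.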
   
    [1]: https://issues.apache.org/jira/browse/ARROW-9895
    [2]: https://issues.apache.org/jira/browse/ARROW-10216

