jhorstmann commented on code in PR #9619:
URL: https://github.com/apache/arrow-rs/pull/9619#discussion_r3008791709


##########
parquet/src/column/writer/encoder.rs:
##########
@@ -325,24 +341,35 @@ impl<T: DataType> ColumnValueEncoder for 
ColumnValueEncoderImpl<T> {
     }
 }
 
+// Get min and max values for all values in `iter`.
+//
+// For floating point we need to compare NaN values until we encounter a 
non-NaN
+// value which then becomes the new min/max. After this, only non-NaN values 
are
+// evaluated. If all values are NaN, then the min/max NaNs as determined by
+// IEEE 754 total order are returned.

Review Comment:
   Agree, this method is on the hot path. I had a look at optimizing it, but 
could not get the compiler to generate nice auto-vectorized code for 
nan-handling yet. I think we can try optimizations in a followup, it would be 
more important to get the semantics correct first and make sure there are tests 
for edge cases.
   
   <del>
   In that regard, about this requirement
   
   
   > If all values are NaN, then the min/max NaNs as determined by
   // IEEE 754 total order are returned.
   
   Does the current code correctly distinguish different NaN payloads according 
to their sign and bit patterns?
   
   </del>
   
   (Solved, github was hiding the changes to `compare_greater` in `mod.rs`)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to