crepererum opened a new pull request #256:
URL: https://github.com/apache/arrow-rs/pull/256


   # Which issue does this PR close?
   Closes #225.
   
   # Rationale for this change
   Fixes NaN handling in parquet statistics. This is in line with the C++ stack:
   
   
https://github.com/apache/arrow/blob/b3e43987c47b2f01b204a2d954f882f7161616ef/cpp/src/parquet/statistics_test.cc#L1000-L1043
   
   # What changes are included in this PR?
   Filter out NaN values when computing statistics, and add tests for this.
   
   # Are there any user-facing changes?
   Yes: formerly, NaN values were included in the stats but at "random" (i.e. when the 
data started with a NaN, then the min/max values were NaN, otherwise min/max were 
non-NaN). Now the behavior is: NaN values are always excluded. If only NaN values 
are present, then min/max are unset.
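
A minimal sketch of the behavior described above, for illustration only. It is not the code in this PR: `min_max_ignoring_nan` is a hypothetical helper over a plain `f64` slice, and returning `None` stands in for "min/max are unset".

```rust
/// Hypothetical helper (not part of the arrow-rs or parquet API):
/// computes (min, max) over a slice while skipping NaN values.
/// Returns `None` when there is no non-NaN value, which models
/// "min/max are unset".
fn min_max_ignoring_nan(values: &[f64]) -> Option<(f64, f64)> {
    let mut result: Option<(f64, f64)> = None;
    for &v in values {
        // NaN never takes part in the statistics, wherever it appears.
        if v.is_nan() {
            continue;
        }
        result = Some(match result {
            // The first non-NaN value seeds both bounds.
            None => (v, v),
            // Later values only widen the range.
            Some((min, max)) => (min.min(v), max.max(v)),
        });
    }
    result
}
```

Under these assumptions, `min_max_ignoring_nan(&[f64::NAN, 1.0, -2.0])` yields `Some((-2.0, 1.0))` even though the data starts with a NaN, and an all-NaN or empty input yields `None`.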


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
