gszadovszky commented on PR #196:
URL: https://github.com/apache/parquet-format/pull/196#issuecomment-1614549513

   @mapleFU, as I've written before, this is why we introduced 
[ColumnOrder](https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L863):
to keep the format open to specifying new orderings. I don't know whether the other 
implementations use it yet. In the current parquet-mr (ever since we 
introduced `ColumnOrder`) there is logic that drops any statistics whose 
column order is not known. So we can safely introduce a new one. We could 
say that if the min or max value would contain a NaN, we write the new 
`IEEE_754` column order, and otherwise `TYPE_ORDER`. In that case we can simply skip 
the additional lists for marking all-NaN pages and write the NaN values into 
the statistics instead. The open question is how older readers in the other 
implementations would handle an unknown `ColumnOrder`.
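
   To make the idea concrete, here is a rough sketch of both sides. It is not parquet-mr code: the `Order` enum, `chooseOrder`, and `usableStats` are hypothetical names for illustration, and `IEEE_754` is the proposed order, which does not exist in the format yet.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

public class ColumnOrderSketch {
    // Hypothetical stand-in for the thrift ColumnOrder union.
    // IEEE_754 is the proposed new order, not part of the current spec.
    public enum Order { TYPE_ORDER, IEEE_754 }

    // Writer side: only switch to the new order when NaN would appear in min/max.
    public static Order chooseOrder(double min, double max) {
        return (Double.isNaN(min) || Double.isNaN(max)) ? Order.IEEE_754 : Order.TYPE_ORDER;
    }

    // Reader side: keep the statistics only if the column order is one we know,
    // otherwise drop them (returned here as an empty Optional).
    public static Optional<double[]> usableStats(double min, double max, Order order, Set<Order> known) {
        return known.contains(order) ? Optional.of(new double[] {min, max}) : Optional.empty();
    }

    public static void main(String[] args) {
        Set<Order> oldReader = EnumSet.of(Order.TYPE_ORDER);
        Order order = chooseOrder(1.0, Double.NaN);
        System.out.println(order + " usable by old reader: "
                + usableStats(1.0, Double.NaN, order, oldReader).isPresent());
    }
}
```

   With this scheme, files without NaN in their statistics stay readable by older readers exactly as before; only columns that actually carry NaN in min/max lose their statistics on readers that do not know the new order.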
   It is an implementation detail that NaN handling in Java differs 
from what IEEE 754 specifies. Java's comparisons treat every NaN as a single 
canonical value, so handling this ordering will require additional work. I hope 
it can be implemented in a performant way.
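
   As a possible starting point, and assuming the new order would follow the IEEE 754 `totalOrder` predicate (that is my assumption, nothing has been decided), the comparison can be done on the raw bits with a couple of integer operations. `totalOrderCompare` is a hypothetical helper name; `Double.doubleToRawLongBits` and `Long.compare` are standard JDK methods.

```java
public class TotalOrderSketch {
    // Compares two doubles following IEEE 754 totalOrder:
    // -NaN < -Infinity < ... < -0.0 < +0.0 < ... < +Infinity < +NaN,
    // with NaNs further ordered by their payload bits.
    public static int totalOrderCompare(double a, double b) {
        // Raw bits keep the NaN sign and payload; doubleToLongBits would canonicalize them.
        long x = Double.doubleToRawLongBits(a);
        long y = Double.doubleToRawLongBits(b);
        // For negative values, flip every bit except the sign bit so that
        // a signed integer comparison matches the floating-point total order.
        x ^= (x >> 63) >>> 1;
        y ^= (y >> 63) >>> 1;
        return Long.compare(x, y);
    }

    public static void main(String[] args) {
        double negativeNaN = Double.longBitsToDouble(0xFFF8000000000000L);
        // Double.compare sorts every NaN above +Infinity; totalOrder puts -NaN below -Infinity.
        System.out.println(Double.compare(negativeNaN, Double.NEGATIVE_INFINITY));
        System.out.println(totalOrderCompare(negativeNaN, Double.NEGATIVE_INFINITY));
    }
}
```

   The difference from `Double.compare` shows up only for NaNs: `Double.compare` treats all NaNs as equal and greater than everything else, while the sketch above distinguishes them by sign and payload.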
   


