raulcd opened a new pull request, #47466:
URL: https://github.com/apache/arrow/pull/47466

   ### Rationale for this change
   
   Currently we drop all statistics if `SortOrder` is `UNKNOWN`. This seems too 
broad: some statistics, like `null_count`, do not depend on ordering and could be kept.
   
   
https://github.com/apache/arrow/blob/6f6138b7eedece0841b04f4e235e3bedf5a3ee29/cpp/src/parquet/metadata.cc#L330-L335
   
   Clearing `min`/`max` but keeping `null_count` when `SortOrder` is 
`UNKNOWN` would let users keep using the statistics that do not depend on ordering.
   
   ### What changes are included in this PR?
   
   Keep statistics when reading them if the sort order is `SortOrder::UNKNOWN`, 
but clear min/max.
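
As a rough illustration of the intended behavior, here is a minimal, self-contained sketch. It does not use the real `parquet::` types or the code in `metadata.cc`: `SortOrderSketch`, `StatsSketch`, and `SanitizeStats` are hypothetical names made up for this example, and the real statistics carry more fields than shown.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for the Parquet sort order; not the real enum.
enum class SortOrderSketch { SIGNED, UNSIGNED, UNKNOWN };

// Hypothetical stand-in for decoded column statistics.
struct StatsSketch {
  std::optional<std::string> min;
  std::optional<std::string> max;
  std::optional<int64_t> null_count;
};

// Returns the statistics a reader would expose for the given sort order.
// With an UNKNOWN sort order, min/max cannot be interpreted and are cleared,
// while null_count does not depend on ordering and is kept.
inline StatsSketch SanitizeStats(StatsSketch stats, SortOrderSketch order) {
  if (order == SortOrderSketch::UNKNOWN) {
    stats.min.reset();
    stats.max.reset();
  }
  return stats;
}

// Usage sketch: true when min/max were dropped and null_count survived.
inline bool DemoUnknownSortOrder() {
  StatsSketch on_disk;
  on_disk.min = std::string("a");
  on_disk.max = std::string("z");
  on_disk.null_count = int64_t{3};
  const StatsSketch seen =
      SanitizeStats(std::move(on_disk), SortOrderSketch::UNKNOWN);
  return !seen.min.has_value() && !seen.max.has_value() &&
         seen.null_count.has_value();
}
```

The previous behavior corresponds to dropping the whole struct in the `UNKNOWN` branch; the sketch narrows that to the two ordering-dependent fields.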
   
   ### Are these changes tested?
   
   Yes, there is a file in parquet-testing that allows us to validate this 
exact scenario.
   
   ### Are there any user-facing changes?
   
   No changes to APIs. Users will be able to read statistics such as `null_count` in this case.
   

