tustvold commented on code in PR #6257:
URL: https://github.com/apache/arrow-rs/pull/6257#discussion_r1863813242
##########
parquet/src/file/statistics.rs:
##########
@@ -416,9 +429,20 @@ impl Statistics {
/// Returns number of null values for the column, if known.
/// Note that this includes all nulls when column is part of the complex type.
///
- /// Note this API returns Some(0) even if the null count was not present
- /// in the statistics.
- /// See <https://github.com/apache/arrow-rs/pull/6216/files>
+ /// Note: Versions of this library prior to `53.0.0` returned 0 if the null count was
+ /// not available. This method returns `None` in that case.
+ ///
+ /// Also, versions of this library prior to `53.0.0` did not store the null count in the
+ /// statistics if the null count was `0`.
+ ///
+ /// To preserve the prior behavior and read null counts properly from older files
+ /// you should default to zero:
Review Comment:
Perhaps we should make it clearer that this behaviour is actually incorrect:
it will claim a null count of 0 when the count is actually unknown.
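
To illustrate the concern, here is a minimal sketch that models the optional null count as a plain `Option<u64>` rather than calling the parquet crate. The function names `null_count_legacy` and `definitely_no_nulls` are hypothetical and exist only for this example:

```rust
/// Legacy-compatible reading (hypothetical helper): a missing null count
/// is reported as 0, which is indistinguishable from a real count of 0.
fn null_count_legacy(null_count: Option<u64>) -> u64 {
    null_count.unwrap_or(0)
}

/// Conservative reading (hypothetical helper): only claim "no nulls"
/// when the count is actually present and equal to zero.
fn definitely_no_nulls(null_count: Option<u64>) -> bool {
    matches!(null_count, Some(0))
}
```

With these definitions, `null_count_legacy(None)` yields 0 even though the true count is unknown, while `definitely_no_nulls(None)` yields `false`. Any caller that prunes or optimizes on "no nulls" would probably want the second form, or should at least be aware that defaulting to zero trades correctness for compatibility with older files.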