mdibaiee opened a new issue, #3574:
URL: https://github.com/apache/parquet-java/issues/3574

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Currently in 
[ParquetMetadataConverter.java](https://github.com/apache/parquet-java/blob/7be05b4702df78ae0c0c6b44adc6b7b7af2d931f/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java),
 there is a guard that skips writing all statistics, both min/max AND 
null_count, when the stats are larger than the max allowed size under 
truncation. The rationale makes sense for omitting min/max; however, 
null_count is a fixed-size integer and can be written to the file regardless 
of how large the min/max values are. See the code below:
   
   
https://github.com/apache/parquet-java/blob/7be05b4702df78ae0c0c6b44adc6b7b7af2d931f/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L800-L807
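
To illustrate the change being asked for, here is a minimal, self-contained sketch. It is not the actual parquet-java code: `ColumnStats`, `FooterStats`, `withinLimit`, `guardEverything`, and `guardMinMaxOnly` are hypothetical names, and the size rule is an assumption made only for the example. It contrasts one guard that drops everything with a guard that only covers min/max.

```java
import java.util.Optional;

public class NullCountSketch {
    /** Hypothetical stand-in for the statistics a writer collects per column chunk. */
    static final class ColumnStats {
        final byte[] min;
        final byte[] max;
        final long nullCount;

        ColumnStats(byte[] min, byte[] max, long nullCount) {
            this.min = min;
            this.max = max;
            this.nullCount = nullCount;
        }
    }

    /** Hypothetical stand-in for the statistics serialized into the file metadata. */
    static final class FooterStats {
        Optional<byte[]> min = Optional.empty();
        Optional<byte[]> max = Optional.empty();
        Optional<Long> nullCount = Optional.empty();
    }

    // Assumed size rule for this sketch: min and max together must fit in maxBytes.
    static boolean withinLimit(ColumnStats s, int maxBytes) {
        return s.min.length + s.max.length <= maxBytes;
    }

    // Behavior described in this issue: a single guard drops min/max AND null_count.
    static FooterStats guardEverything(ColumnStats s, int maxBytes) {
        FooterStats out = new FooterStats();
        if (withinLimit(s, maxBytes)) {
            out.nullCount = Optional.of(s.nullCount);
            out.min = Optional.of(s.min);
            out.max = Optional.of(s.max);
        }
        return out;
    }

    // Requested behavior: null_count is always written; only min/max are guarded.
    static FooterStats guardMinMaxOnly(ColumnStats s, int maxBytes) {
        FooterStats out = new FooterStats();
        out.nullCount = Optional.of(s.nullCount);
        if (withinLimit(s, maxBytes)) {
            out.min = Optional.of(s.min);
            out.max = Optional.of(s.max);
        }
        return out;
    }

    public static void main(String[] args) {
        // 128 bytes of min/max against a 16-byte limit, with 3 nulls in the column.
        ColumnStats oversized = new ColumnStats(new byte[64], new byte[64], 3L);
        System.out.println("single guard:  " + guardEverything(oversized, 16).nullCount);
        System.out.println("min/max guard: " + guardMinMaxOnly(oversized, 16).nullCount);
    }
}
```

With oversized min/max values, the single guard leaves `nullCount` empty, while the narrower guard still records it and omits only min/max.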
   
   The missing `null_count` metadata sometimes causes downstream consumers of 
the Parquet files to fail. For example, in Snowflake we are seeing the 
following kind of error:
   
   ```
   non-nullable column without default has null values according to file statistics
   ```
   
   ### Component(s)
   
   Core


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

