Hi devs,

I just stumbled over how statistics are written for string columns with large
values, in toParquetStatistics() in
https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java

It looks like min/max are not written at all once the min/max values of a
string column exceed the 4096-byte limit.
Is there a reason why null_count is omitted in that case as well? Or is it
rather a bug?
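
To make the question concrete, here is a minimal sketch of the behaviour as I
read it next to the behaviour I would have expected. This is not the actual
parquet-mr code: StatsSketch, FooterStats, observed() and expected() are
made-up names, and the exact form of the size check (combined length of min
and max compared against 4096) is an assumption on my side.

```java
// Hypothetical stand-in types for illustration only, NOT the parquet-mr classes.
class StatsSketch {
    // The 4096-byte limit mentioned above.
    static final int MAX_STATS_SIZE = 4096;

    /** What ends up in the footer; a null field means "not written". */
    static final class FooterStats {
        byte[] min;
        byte[] max;
        Long nullCount;
    }

    // Behaviour as I read it: once min/max are too large, nothing is written,
    // including null_count. (The combined-length check is an assumption.)
    static FooterStats observed(byte[] min, byte[] max, long numNulls) {
        FooterStats out = new FooterStats();
        if (min.length + max.length < MAX_STATS_SIZE) {
            out.nullCount = numNulls;
            out.min = min;
            out.max = max;
        }
        return out;
    }

    // Behaviour I would have expected: null_count is always written,
    // only min/max are dropped when they are too large.
    static FooterStats expected(byte[] min, byte[] max, long numNulls) {
        FooterStats out = new FooterStats();
        out.nullCount = numNulls;
        if (min.length + max.length < MAX_STATS_SIZE) {
            out.min = min;
            out.max = max;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] small = new byte[] {'a'};
        byte[] large = new byte[5000]; // larger than the limit

        FooterStats a = observed(small, large, 7L);
        FooterStats b = expected(small, large, 7L);
        System.out.println("observed null_count: " + a.nullCount);
        System.out.println("expected null_count: " + b.nullCount);
    }
}
```

In both variants min/max are dropped for the large value; the only difference
is whether null_count survives.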

best
Johannes


