Hi devs, I just stumbled over how statistics are set for string columns with large values, in toParquetStatistics() in https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
It looks like when min/max for a string column exceeds the 4096-byte limit, min/max are not written at all. Is there a reason why null_count is omitted as well in that case? Or is it rather a bug?

Best,
Johannes
