jonashaag opened a new issue, #5270: URL: https://github.com/apache/arrow-rs/issues/5270
**Describe the bug**

I'm using the https://github.com/pacman82/odbc2parquet library that is based on this crate. I observe that statistics like min/max are not written for string columns:

```
In [4]: pq.ParquetFile("/tmp/o2p").metadata.row_group(0).column(1)
Out[4]:
<pyarrow._parquet.ColumnChunkMetaData object at 0x1033c1080>
  file_offset: 1123
  file_path:
  physical_type: BYTE_ARRAY
  num_values: 100
  path_in_schema: XXX
  is_stats_set: True
  statistics:
    <pyarrow._parquet.Statistics object at 0x103476070>
      has_min_max: False
      min: None
      max: None
      null_count: None
      distinct_count: None
      num_values: 100
      physical_type: BYTE_ARRAY
      logical_type: String
      converted_type (legacy): UTF8
  compression: ZSTD
  encodings: ('PLAIN', 'RLE', 'RLE_DICTIONARY')
  has_dictionary_page: True
  dictionary_page_offset: 394
  data_page_offset: 938
  total_compressed_size: 729
  total_uncompressed_size: 2993
```

Relevant code: https://github.com/pacman82/odbc2parquet/blob/b571cad6fae1b58e1aab8348f14b32f20d6ec165/src/query/parquet_writer.rs#L47

**To Reproduce**

Use odbc2parquet to download any table that contains a string column.

**Expected behavior**

Should have min/max statistics.
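
To check every column chunk of a file for the same symptom rather than inspecting one column by hand, something like the following might help. It is only a sketch: the helper name `missing_min_max` is made up for illustration, and it assumes the metadata object behaves like pyarrow's `FileMetaData` (`num_row_groups`, `row_group(i)`, `num_columns`, `column(j)`, `statistics`, `has_min_max`, `path_in_schema`).

```python
def missing_min_max(metadata):
    """Return (row_group_index, column_path) for chunks lacking min/max.

    `metadata` is expected to look like pyarrow's FileMetaData, e.g.
    pq.ParquetFile(path).metadata. The function name is hypothetical.
    """
    missing = []
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(row_group.num_columns):
            chunk = row_group.column(col_index)
            stats = chunk.statistics  # None when no statistics were written
            if stats is None or not stats.has_min_max:
                missing.append((rg_index, chunk.path_in_schema))
    return missing


if __name__ == "__main__":
    import sys

    import pyarrow.parquet as pq  # imported here so the helper has no hard dependency

    # Usage: python check_stats.py /tmp/o2p
    for rg_index, path in missing_min_max(pq.ParquetFile(sys.argv[1]).metadata):
        print(f"row group {rg_index}: column {path!r} has no min/max")
```

An empty result would mean every column chunk carries min/max; for the file above, column `XXX` in row group 0 would be expected to show up.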
