jonashaag opened a new issue, #5270:
URL: https://github.com/apache/arrow-rs/issues/5270

   **Describe the bug**
   
   I'm using odbc2parquet (https://github.com/pacman82/odbc2parquet), which is based on this crate.
   
   I observe that min/max statistics are not written for string columns. In the output below, `is_stats_set` is True but `has_min_max` is False:
   
   ```
   In [4]: pq.ParquetFile("/tmp/o2p").metadata.row_group(0).column(1)
   Out[4]:
   <pyarrow._parquet.ColumnChunkMetaData object at 0x1033c1080>
     file_offset: 1123
     file_path:
     physical_type: BYTE_ARRAY
     num_values: 100
     path_in_schema: XXX
     is_stats_set: True
     statistics:
       <pyarrow._parquet.Statistics object at 0x103476070>
         has_min_max: False
         min: None
         max: None
         null_count: None
         distinct_count: None
         num_values: 100
         physical_type: BYTE_ARRAY
         logical_type: String
         converted_type (legacy): UTF8
     compression: ZSTD
     encodings: ('PLAIN', 'RLE', 'RLE_DICTIONARY')
     has_dictionary_page: True
     dictionary_page_offset: 394
     data_page_offset: 938
     total_compressed_size: 729
     total_uncompressed_size: 2993
   ```
   
   Relevant code: 
https://github.com/pacman82/odbc2parquet/blob/b571cad6fae1b58e1aab8348f14b32f20d6ec165/src/query/parquet_writer.rs#L47
   
   **To Reproduce**
   
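   Use odbc2parquet to download any table that contains a string column, then inspect the column chunk metadata of the resulting file (for example with pyarrow, as in the session above).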
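
To check every column chunk rather than a single one, a helper along the following lines may be useful. It is only a sketch: the function name `columns_missing_min_max` is made up for illustration, and it assumes it is handed a pyarrow `FileMetaData` object (or anything exposing the same attributes).

```python
def columns_missing_min_max(metadata):
    """List column chunks whose statistics lack min/max.

    `metadata` is assumed to behave like pyarrow's FileMetaData:
    it exposes num_row_groups and row_group(i), and each row group
    exposes num_columns and column(j).
    """
    missing = []
    for rg_index in range(metadata.num_row_groups):
        row_group = metadata.row_group(rg_index)
        for col_index in range(row_group.num_columns):
            column = row_group.column(col_index)
            # statistics can be None when no statistics were written at all
            stats = column.statistics if column.is_stats_set else None
            if stats is None or not stats.has_min_max:
                missing.append(
                    (rg_index, column.path_in_schema, column.physical_type)
                )
    return missing
```

With pyarrow installed, this would be called as `columns_missing_min_max(pq.ParquetFile("/tmp/o2p").metadata)`, where `pq` is `pyarrow.parquet`. For the file above, the string column `XXX` in row group 0 would be expected to show up in the result.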
   
   **Expected behavior**
   
   String columns should have min/max statistics written to the column chunk metadata.
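
As a rough illustration of the expected values, the sketch below computes min/max for a string column under the assumption that the writer follows the Parquet format's ordering for strings, which is unsigned byte-wise comparison of the UTF-8 bytes. The function name `byte_array_min_max` is hypothetical and this is not the crate's implementation.

```python
def byte_array_min_max(values):
    """Return (min, max) of the non-null strings in `values`.

    Strings are compared by their UTF-8 encoding. Python compares
    `bytes` objects lexicographically by unsigned byte value, which
    matches the ordering assumed here for Parquet string columns.
    """
    encoded = [v.encode("utf-8") for v in values if v is not None]
    if not encoded:
        # An all-null (or empty) column has no min/max.
        return None, None
    return min(encoded), max(encoded)
```

Note that this ordering is case-sensitive: for the values `"pear"`, `"apple"` and `"Zebra"` the minimum is `b"Zebra"`, because uppercase ASCII letters sort before lowercase ones.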
   
   **Additional context**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

