pacman82 commented on issue #5270:
URL: https://github.com/apache/arrow-rs/issues/5270#issuecomment-1874386130

   Hello @tustvold ,
   
   I do not know if this helps to clear things up, or causes more confusion, 
but here is my attempt to reproduce the issue directly with `odbc2parquet`.
   
   See: 
<https://github.com/pacman82/odbc2parquet/blob/42295d22b4ac442acdb70fea9c18752c830de6e1/tests/integration.rs#L3973>
   
   ```rust
   #[test]
   fn write_statistics_for_text_columns() {
       // Setup table for test
       let table_name = "WriteStatisticsForTextColumns";
       let mut table = TableMssql::new(table_name, &["VARCHAR(10)"]);
       table.insert_rows_as_text(&[["aaa"], ["zzz"]]);
       let query = format!("SELECT a FROM {table_name}");
   
       let command = Command::cargo_bin("odbc2parquet")
           .unwrap()
           .args([
               "query",
               "--connection-string",
               MSSQL,
               "-", // Use `-` to explicitly write to stdout
               &query,
           ])
           .assert()
           .success();
   
       // Then
       let bytes = Bytes::from(command.get_output().stdout.clone());
       let reader = SerializedFileReader::new(bytes).unwrap();
       let stats = reader.metadata().row_group(0).column(0).statistics().unwrap();
       assert_eq!("aaa", str::from_utf8(stats.min_bytes()).unwrap());
       assert_eq!("zzz", str::from_utf8(stats.max_bytes()).unwrap());
   }
   ```
   
   The above code executes and the test passes. This suggests that the statistics 
are present when the file is read with the Rust parquet crate. Yet they seem to 
be written differently from what the Python stack expects.
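
   For context on the two assertions in the test: `"aaa"` and `"zzz"` are the values one would expect as minimum and maximum if the column is compared byte by byte, with bytes treated as unsigned, which is the ordering the Parquet format defines for UTF-8 strings. The following is only a minimal sketch of that ordering using the standard library. `text_min_max` is a hypothetical helper made up for illustration; it is not part of the `parquet` crate or of `odbc2parquet`, and it says nothing about how either of them computes or stores statistics.

```rust
/// Returns the (min, max) of a text column under unsigned byte-wise
/// lexicographic ordering, or `None` if the column is empty.
///
/// Hypothetical helper for illustration only; not an API of the
/// `parquet` crate or of `odbc2parquet`.
fn text_min_max<'a>(values: &[&'a str]) -> Option<(&'a [u8], &'a [u8])> {
    // View every value as its raw UTF-8 bytes.
    let mut iter = values.iter().copied().map(|s| s.as_bytes());
    let first = iter.next()?;
    // Slices of `u8` compare lexicographically, byte by byte, and `u8`
    // is unsigned, so `<` and `>` give the ordering described above.
    Some(iter.fold((first, first), |(min, max), v| {
        (
            if v < min { v } else { min },
            if v > max { v } else { max },
        )
    }))
}
```

   Calling `text_min_max(&["aaa", "zzz"])` yields byte slices that decode to the same strings the test asserts on. Because the comparison is unsigned, a multi-byte UTF-8 value such as `"é"` sorts after `"z"`.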
   
   Best, Markus


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

