pacman82 commented on issue #5270: URL: https://github.com/apache/arrow-rs/issues/5270#issuecomment-1874386130
Hello @tustvold, I do not know if this helps to clear things up or causes more confusion, but here is my attempt to reproduce the issue directly with `odbc2parquet`. See: <https://github.com/pacman82/odbc2parquet/blob/42295d22b4ac442acdb70fea9c18752c830de6e1/tests/integration.rs#L3973>

```rust
// `TableMssql` and the `MSSQL` connection string are helpers from odbc2parquet's integration tests.
use std::str;

use assert_cmd::Command;
use bytes::Bytes;
use parquet::file::reader::{FileReader, SerializedFileReader};

#[test]
fn write_statistics_for_text_columns() {
    // Given: a table with one VARCHAR column holding "aaa" and "zzz"
    let table_name = "WriteStatisticsForTextColumns";
    let mut table = TableMssql::new(table_name, &["VARCHAR(10)"]);
    table.insert_rows_as_text(&[["aaa"], ["zzz"]]);
    let query = format!("SELECT a FROM {table_name}");

    // When: exporting the query result with odbc2parquet
    let command = Command::cargo_bin("odbc2parquet")
        .unwrap()
        .args([
            "query",
            "--connection-string",
            MSSQL,
            "-", // Use `-` to explicitly write to stdout
            &query,
        ])
        .assert()
        .success();

    // Then: the Rust parquet crate finds min/max statistics for the text column
    let bytes = Bytes::from(command.get_output().stdout.clone());
    let reader = SerializedFileReader::new(bytes).unwrap();
    let stats = reader.metadata().row_group(0).column(0).statistics().unwrap();
    assert_eq!("aaa", str::from_utf8(stats.min_bytes()).unwrap());
    assert_eq!("zzz", str::from_utf8(stats.max_bytes()).unwrap());
}
```

The test above runs and passes. This suggests that the statistics are present when the file is read with the Rust parquet crate. Yet they seem to be written differently from what the Python stack expects.

Best,
Markus
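For background only, and as a sketch rather than a diagnosis: parquet-format's `Statistics` struct can carry min/max in two places. The deprecated `min`/`max` fields are defined in terms of signed comparison, while the newer `min_value`/`max_value` fields follow the column's sort order, which for UTF8 strings is unsigned byte-wise comparison. Readers may treat these fields differently, so "present but written differently" could, for example, mean the values sit in a field a given reader does not trust. The test above does not show which fields odbc2parquet writes, so that is an assumption. The hypothetical helpers `cmp_unsigned`, `cmp_signed` and `min_max` below only illustrate why the two orderings matter for strings: they agree on ASCII values such as `aaa`/`zzz`, but disagree once a UTF-8 byte of 0x80 or above appears.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: lexicographic comparison treating each byte as unsigned (u8).
/// This matches the byte-wise unsigned order Parquet uses for UTF8 strings.
fn cmp_unsigned(a: &[u8], b: &[u8]) -> Ordering {
    // Slices of u8 already compare lexicographically as unsigned bytes.
    a.cmp(b)
}

/// Hypothetical helper: lexicographic comparison treating each byte as signed (i8),
/// the comparison the deprecated `min`/`max` statistics fields are defined with.
fn cmp_signed(a: &[u8], b: &[u8]) -> Ordering {
    let a_signed = a.iter().map(|&x| x as i8);
    let b_signed = b.iter().map(|&x| x as i8);
    a_signed.cmp(b_signed)
}

/// Hypothetical helper: returns (min, max) of `values` under the given byte order.
/// Panics if `values` is empty.
fn min_max<'a>(values: &[&'a str], order: fn(&[u8], &[u8]) -> Ordering) -> (&'a str, &'a str) {
    let min = values
        .iter()
        .copied()
        .min_by(|a, b| order(a.as_bytes(), b.as_bytes()))
        .unwrap();
    let max = values
        .iter()
        .copied()
        .max_by(|a, b| order(a.as_bytes(), b.as_bytes()))
        .unwrap();
    (min, max)
}

fn main() {
    // ASCII-only values, as in the test: both orders agree.
    let ascii = ["aaa", "zzz"];
    println!("ascii unsigned: {:?}", min_max(&ascii, cmp_unsigned));
    println!("ascii signed:   {:?}", min_max(&ascii, cmp_signed));

    // 'é' is encoded as 0xC3 0xA9 in UTF-8; 0xC3 is negative when read as i8,
    // so the signed and unsigned orders pick different min/max values.
    let mixed = ["zzz", "é"];
    println!("mixed unsigned: {:?}", min_max(&mixed, cmp_unsigned));
    println!("mixed signed:   {:?}", min_max(&mixed, cmp_signed));
}
```

With pure ASCII data, as in the reproduction, either comparison yields the same min and max, so the ordering alone would not make `aaa`/`zzz` statistics differ; it only shows why a reader might be cautious about which field it reads string statistics from.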
