alamb commented on PR #5626: URL: https://github.com/apache/arrow-rs/pull/5626#issuecomment-2858975922
Ok, what is happening here is as follows: arrow-rs and arrow-cpp (and potentially polars) add a special file metadata field called "ARROW:schema" that records the desired Arrow schema. This is described in more detail here:

- https://github.com/apache/arrow-rs/pull/7479

In order for the arrow-rs parquet reader to read the data as a duration, it needs to interpret the contents of that metadata, which is what I think the code in this PR does.

It would be really nice if the arrow-rs parquet **writer** also wrote the correct metadata, so that duration data written to parquet could be read back correctly by arrow and parquet.

Therefore I think we should add a "round trip" test that writes a RecordBatch with a Duration column to a parquet file and reads it back, verifying that the data read is the same as the data written. An example of such a test is [here](https://github.com/apache/arrow-rs/blob/11c99a3232761e6d12162f6c09822de821b61c96/parquet/src/arrow/arrow_reader/mod.rs#L1383-L1415)