alamb commented on PR #5626:
URL: https://github.com/apache/arrow-rs/pull/5626#issuecomment-2858975922

   OK, here is what is happening: arrow-rs and arrow-cpp (and potentially 
polars) write a special file metadata field called "ARROW:schema" that 
records the desired Arrow schema. This is described in more detail here:
   -  https://github.com/apache/arrow-rs/pull/7479
   
   In order for the arrow-rs parquet reader to read the data as a duration it 
needs to interpret the contents of that metadata, which is what I think the 
code in this PR does. 
   
   It would be really nice if the arrow-rs parquet **writer** also wrote the 
correct metadata, so that duration data written to parquet could be read 
back correctly by arrow and parquet.
   
   So I think we should add a "round trip" test that writes a RecordBatch 
with a Duration column to a parquet file and reads it back, verifying that 
the data read back is the same as what was written. An example of such a 
test is 
[here](https://github.com/apache/arrow-rs/blob/11c99a3232761e6d12162f6c09822de821b61c96/parquet/src/arrow/arrow_reader/mod.rs#L1383-L1415).
   

