dsgibbons commented on PR #6313: URL: https://github.com/apache/arrow-rs/pull/6313#issuecomment-2363746738
Thank you for taking the time to look at this @etseidl. I'm still new to the project so I have plenty to learn.

From #1938:

> If not coerce_types, write as Int64 and embed logical type in arrow schema only.

I think I interpreted this as the Parquet `LogicalType`. I hadn't seen that [ref](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#date) before.

> I believe the approach called for in https://github.com/apache/arrow-rs/issues/1938 is to write un-annotated INT64, and rely on the encoded arrow schema to know how to interpret the column.

So if we can't embed the fact that the field refers to a date in the Parquet `LogicalType`, do we provide additional type information during/after reading to interpret `INT64` columns as `Date64`? Is this what was meant by "embed logical type in arrow schema only" from #1938? I thought that all type information was inferred from the Parquet file. Hence why I removed the `INT32(DATE)->Date64` code, as I didn't think there would be any way to know whether `INT32(DATE)` was coerced or not. Could you please give an example of how a reader would use an arrow schema to correctly interpret the columns?

On another note, are you OK with the breaking change introduced by: `arrow_to_parquet_schema(schema: &Schema, coerce_types: bool)`?
