sgilmore10 commented on PR #47316: URL: https://github.com/apache/arrow/pull/47316#issuecomment-3188948954
Hi @wgtmac,

Thanks for sharing your thoughts on this. I agree with you that the best-case scenario would be for the Apache Spark community to extend the Spark Parquet reader to support the `Time` type with `isAdjustedToUTC=true`. However, I was wondering if you could elaborate a bit more on why the community doesn't feel that extending the Arrow Parquet writer to support writing Parquet `Time` data with `isAdjustedToUTC` set to `false` is a good idea.

The decision to *default* to `isAdjustedToUTC=true` makes sense in light of the Parquet spec's [guidelines on compatibility](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#deprecated-time-convertedtype) with respect to the deprecation of `TIME_MILLIS`/`TIME_MICROS`. However, at the same time, my impression from reading the [discussion on GH-41476](https://github.com/apache/arrow/issues/41476#issuecomment-2088094499) is that the Arrow community would have ideally chosen to map Arrow's `Time` types to `isAdjustedToUTC=false` if compatibility wasn't a concern (because Arrow's `Time` types are timezone-*unaware*).

Given that the [Parquet specification](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time) allows for writing local `Time` data and that Arrow's time types are timezone-*unaware*, my personal opinion is that adding the ability to *explicitly opt in* to writing `Time` types with `isAdjustedToUTC=false` would unblock some important interoperability workflows (e.g. Spark <-> Arrow).

To be very clear: what I am suggesting is *NOT* to change the current default behavior of Arrow's writer (i.e. we would continue writing `Time(isAdjustedToUTC=true)` by default), and, therefore, this proposed change would have no impact on backwards compatibility. This would be an *explicit, opt-in* feature.

Given the complexity of this issue, does anyone feel that it would be helpful to ask for clarification from the broader Parquet community about this?
It appears [others](https://lists.apache.org/list?d...@parquet.apache.org:lte=7y:UTC) have been confused about the purpose of the `isAdjustedToUTC` parameter in the past.

I really appreciate hearing everyone's thoughts on this. This is definitely a nuanced issue, and I am comfortable with whatever direction the community collectively feels is most appropriate. However, in my personal opinion, this would be a worthwhile change.

Thanks!

Best,
Sarah