sgilmore10 commented on PR #47316:
URL: https://github.com/apache/arrow/pull/47316#issuecomment-3188948954
Hi @wgtmac,
Thanks for sharing your thoughts on this.
I agree with you that the best case scenario would be for the Apache Spark
community to extend the Spark Parquet reader to support the `Time` type with
`isAdjustedToUTC=true`. However, I was wondering if you could elaborate a bit
more on why the community doesn't feel that extending the Arrow Parquet writer
to support writing Parquet `Time` data with `isAdjustedToUTC` set to `false` is
a good idea.
The decision to *default* to `isAdjustedToUTC=true` makes sense in light of
the Parquet spec's [guidelines on
compatibility](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#deprecated-time-convertedtype)
with respect to the deprecation of `TIME_MILLIS`/`TIME_MICROS`. At the same
time, my impression from reading the [discussion on
GH-41476](https://github.com/apache/arrow/issues/41476#issuecomment-2088094499)
is that the Arrow community would ideally have chosen to map Arrow's `Time`
types to `isAdjustedToUTC=false` if compatibility weren't a concern (because
Arrow's `Time` types are timezone-*unaware*).
Given that the [Parquet
specification](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time)
allows for writing local `Time` data and that Arrow's time types are
timezone-*unaware*, my personal opinion is that adding the ability to
*explicitly opt-in* to writing `Time` types with `isAdjustedToUTC=false` would
unblock some important interoperability workflows (e.g. Spark <-> Arrow). To be
very clear - what I am suggesting is *NOT* to change the current default
behavior of Arrow's writer (i.e. we would continue writing
`Time(isAdjustedToUTC=true)` by default), and, therefore, this proposed change
would have no impact on backwards compatibility. This would be an *explicit,
opt-in* feature.
Given the complexity of this issue, does anyone feel that it would be
helpful to ask for clarification from the broader Parquet community about this?
It appears
[others](https://lists.apache.org/[email protected]:lte=7y:UTC) have
been confused about the purpose of the `isAdjustedToUTC` parameter in the past.
I really appreciate hearing everyone's thoughts on this. This is definitely
a nuanced issue, and I am comfortable with whatever direction the community
collectively feels is most appropriate. However, in my personal opinion, this
would be a worthwhile change.
Thanks!
Best,
Sarah