sgilmore10 commented on PR #47316:
URL: https://github.com/apache/arrow/pull/47316#issuecomment-3188948954

   Hi @wgtmac,
    
   Thanks for sharing your thoughts on this.
   
   I agree with you that the best-case scenario would be for the Apache Spark 
community to extend the Spark Parquet reader to support the `Time` type with 
`isAdjustedToUTC=true`. However, I was wondering if you could elaborate a bit 
more on why the community feels that extending the Arrow Parquet writer to 
support writing Parquet `Time` data with `isAdjustedToUTC` set to `false` is 
not a good idea.
    
   The decision to *default* to `isAdjustedToUTC=true` makes sense in light of 
the Parquet spec's [guidelines on 
compatibility](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#deprecated-time-convertedtype)
with respect to the deprecation of `TIME_MILLIS`/`TIME_MICROS`. At the same 
time, my impression from reading the [discussion on 
GH-41476](https://github.com/apache/arrow/issues/41476#issuecomment-2088094499) 
is that the Arrow community would ideally have chosen to map Arrow's `Time` 
types to `isAdjustedToUTC=false` if compatibility were not a concern (because 
Arrow's `Time` types are timezone-*unaware*).
   
   Given that the [Parquet 
specification](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#time)
allows writing local `Time` data and that Arrow's `Time` types are 
timezone-*unaware*, my personal opinion is that adding the ability to 
*explicitly opt in* to writing `Time` types with `isAdjustedToUTC=false` would 
unblock some important interoperability workflows (e.g. Spark <-> Arrow). To be 
very clear: I am *NOT* suggesting that we change the current default behavior 
of Arrow's writer (i.e. we would continue writing `Time(isAdjustedToUTC=true)` 
by default), so this proposed change would have no impact on backwards 
compatibility. This would be an *explicit, opt-in* feature.
    
   Given the complexity of this issue, does anyone feel that it would be 
helpful to ask for clarification from the broader Parquet community about this? 
It appears 
[others](https://lists.apache.org/list?d...@parquet.apache.org:lte=7y:UTC) have 
been confused about the purpose of the `isAdjustedToUTC` parameter in the past.
    
   I really appreciate hearing everyone's thoughts on this. This is definitely 
a nuanced issue, and I am comfortable with whatever direction the community 
collectively feels is most appropriate. However, in my personal opinion, this 
would be a worthwhile change.
   
   Thanks!
   
   Best,
   Sarah


-- 
This is an automated message from the Apache Git Service.