sgilmore10 opened a new pull request, #47316:
URL: https://github.com/apache/arrow/pull/47316

   ### Rationale for this change
   
   As of today, the Arrow C++ Parquet writer cannot write Parquet `TIME` data whose `isAdjustedToUTC` parameter is `false`. Instead, `isAdjustedToUTC` is hard-coded to `true` [here](https://github.com/apache/arrow/blob/2dd3ccda6437f79aa34641bd3197dd7392ae4aec/cpp/src/parquet/arrow/schema.cc#L431).
 
   
   Unfortunately, some Parquet consumers only support `TIME` data if the `isAdjustedToUTC` parameter is `false`, so they cannot read the `TIME` data our Parquet writer produces. For example, the Apache Spark Parquet reader only supports Parquet `TIME` columns if [`isAdjustedToUTC=false` and `units=MICROSECONDS`](https://github.com/apache/spark/blob/554f6b64f1e2b2346499f6d3340a3695244bfc84/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L309).
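To make the compatibility gap concrete, here is a minimal standalone C++ model of the Spark constraint described above. It is a sketch, not Arrow or Spark code: `TimeUnit`, `TimeAnnotation`, and `SparkAcceptsTime` are hypothetical names, and the predicate encodes only the two conditions cited from Spark's schema converter (`isAdjustedToUTC=false` and microsecond units).

```cpp
// Standalone model; TimeUnit, TimeAnnotation and SparkAcceptsTime are
// hypothetical names, not part of Arrow, Parquet or Spark.
enum class TimeUnit { kMillis, kMicros, kNanos };

// Simplified stand-in for a Parquet TIME logical-type annotation.
struct TimeAnnotation {
  bool is_adjusted_to_utc;
  TimeUnit unit;
};

// True only when both conditions cited from Spark's reader hold:
// isAdjustedToUTC=false and the unit is microseconds.
constexpr bool SparkAcceptsTime(const TimeAnnotation& t) {
  return !t.is_adjusted_to_utc && t.unit == TimeUnit::kMicros;
}

// What the writer emits today for a microsecond TIME column
// (isAdjustedToUTC hard-coded to true): rejected by the model.
constexpr TimeAnnotation kCurrentWriterOutput{true, TimeUnit::kMicros};
static_assert(!SparkAcceptsTime(kCurrentWriterOutput),
              "UTC-adjusted TIME is rejected");

// The same column with isAdjustedToUTC=false: accepted by the model.
constexpr TimeAnnotation kSparkCompatible{false, TimeUnit::kMicros};
static_assert(SparkAcceptsTime(kSparkCompatible),
              "non-UTC-adjusted TIME in microseconds is accepted");
```

The checks are compile-time `static_assert`s, so the model verifies itself when the file is compiled.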
 
   
   Adding support for writing `TIME` data with `isAdjustedToUTC` set to `false` would unblock users who need to write Spark-compatible Parquet data.
   
   ### What changes are included in this PR?
   
   1. Added a `write_time_adjusted_to_utc` property to `parquet::ArrowWriterProperties`. If `true`, all `TIME` columns have their `isAdjustedToUTC` parameter set to `true`; otherwise, `isAdjustedToUTC` is set to `false` for all `TIME` columns. This property is `true` by default, which preserves the existing behavior.
   2. Added `enable_write_time_adjusted_to_utc()` and 
`disable_write_time_adjusted_to_utc()` methods to 
`parquet::ArrowWriterProperties::Builder`.
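The two changes above can be sketched as a small standalone C++ model of how the new property flows from the builder into each `TIME` column's annotation. This is not the real `parquet::ArrowWriterProperties`: `WriterPropsSketch`, `WriterPropsBuilderSketch`, `ToParquetTime`, `TimeAnnotation`, and `TimeUnit` are hypothetical stand-ins. Only the method names `enable_write_time_adjusted_to_utc()` / `disable_write_time_adjusted_to_utc()` and the `true` default come from this PR; returning a builder pointer for chaining is an assumption of the sketch.

```cpp
// Standalone model of the new option. All type and function names here are
// hypothetical stand-ins, not the real parquet::ArrowWriterProperties API.
// Only the builder method names and the `true` default come from the PR text.
enum class TimeUnit { kMillis, kMicros, kNanos };

// Simplified stand-in for a Parquet TIME logical-type annotation.
struct TimeAnnotation {
  bool is_adjusted_to_utc;
  TimeUnit unit;
};

// Properties produced by the builder.
struct WriterPropsSketch {
  bool write_time_adjusted_to_utc;
};

class WriterPropsBuilderSketch {
 public:
  // Defaults to true, matching the previously hard-coded behavior.
  constexpr WriterPropsBuilderSketch() : write_time_adjusted_to_utc_(true) {}

  // Returning a pointer allows chained calls (assumed chaining style).
  constexpr WriterPropsBuilderSketch* enable_write_time_adjusted_to_utc() {
    write_time_adjusted_to_utc_ = true;
    return this;
  }
  constexpr WriterPropsBuilderSketch* disable_write_time_adjusted_to_utc() {
    write_time_adjusted_to_utc_ = false;
    return this;
  }
  constexpr WriterPropsSketch build() const {
    return WriterPropsSketch{write_time_adjusted_to_utc_};
  }

 private:
  bool write_time_adjusted_to_utc_;
};

// Every TIME column takes its isAdjustedToUTC flag from the single
// property; the time unit is passed through unchanged.
constexpr TimeAnnotation ToParquetTime(TimeUnit unit,
                                       const WriterPropsSketch& props) {
  return TimeAnnotation{props.write_time_adjusted_to_utc, unit};
}
```

In this model, `WriterPropsBuilderSketch().disable_write_time_adjusted_to_utc()->build()` yields properties under which `ToParquetTime(TimeUnit::kMicros, props)` produces `isAdjustedToUTC=false`, the combination the Spark reader accepts. A single writer-wide flag (rather than a per-column setting) mirrors the PR's description that the property applies to all `TIME` columns.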
   
   ### Are these changes tested?
   
   Yes. I added the test case `ParquetTimeAdjustedToUTC` to the `TestConvertArrowSchema` test suite.
   
   ### Are there any user-facing changes?
   
   Yes. Users can now write Parquet `TIME` columns whose `isAdjustedToUTC` 
parameter is `false`.
   
   ### NOTE
   
   1. I did not update the PyArrow interface because I am not familiar with that code base. I plan to open a separate GitHub issue to track that work.
   2. An open PR (#43268) already addresses this issue. However, it was last active over a year ago and appears to be stale.
   

