Hi all,

I opened a sub-task (SPARK-53368) of SPARK-51162 to track future discussions. Here's a link[1] to the new JIRA issue. I created it as a sub-task of SPARK-51162 rather than SPARK-51342 because the latter is itself already a sub-task.
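To make the mismatch concrete for anyone skimming the thread, here is a rough sketch of the two TIME annotations in question, built with parquet-mr's schema builder API (this assumes parquet-column 1.11+ on the classpath, and it is only an illustration, not Spark's actual reader code):

import org.apache.parquet.schema.{LogicalTypeAnnotation, MessageType, Types}
import org.apache.parquet.schema.LogicalTypeAnnotation.TimeUnit
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64

object TimeAnnotationSketch {
  def main(args: Array[String]): Unit = {
    // Roughly the schema shape Arrow's Parquet writer (and therefore MATLAB's
    // parquetwrite) emits for microsecond time-of-day columns: TIME(MICROS)
    // with isAdjustedToUTC = true, per the Parquet compatibility guidelines
    // for the deprecated TIME_MICROS ConvertedType.
    val arrowStyle = new MessageType("arrow_style",
      Types.optional(INT64)
        .as(LogicalTypeAnnotation.timeType(/* isAdjustedToUTC = */ true, TimeUnit.MICROS))
        .named("time_of_day"))

    // The shape Spark 4.1's Parquet reader currently maps to TimeType: the same
    // physical INT64 column, but annotated with isAdjustedToUTC = false.
    val sparkStyle = new MessageType("spark_style",
      Types.optional(INT64)
        .as(LogicalTypeAnnotation.timeType(/* isAdjustedToUTC = */ false, TimeUnit.MICROS))
        .named("time_of_day"))

    println(arrowStyle)
    println(sparkStyle)
  }
}

As I understand Max's suggestion, a writer-side SQL config would let Spark emit the first shape for backward compatibility; the open question on the reader side is whether it could accept both.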
Thanks for taking the time to consider this enhancement!

Best Regards,
Sarah Gilmore

[1] https://issues.apache.org/jira/browse/SPARK-53368

________________________________________
From: serge rielau.com <se...@rielau.com>
Sent: Saturday, August 23, 2025 4:42 PM
To: Max Gekk <max.g...@gmail.com>
Cc: sgilm...@mathworks.com.invalid <sgilm...@mathworks.com.invalid>; dev@spark.apache.org <dev@spark.apache.org>
Subject: Re: [Spark SQL][Parquet]: Question about support for Parquet TIME data

Wouldn’t isAdjustedToUTC=false imply TIME WITH LOCAL TIMEZONE? That would be a "different" type.

Personally, I’d much rather see Spark support TIME/TIMESTAMP WITH TIMEZONE. TIMESTAMP WITH LOCAL TIMEZONE has been providing a rich set of "interesting" challenges over the years….

On Aug 23, 2025, at 12:24 PM, Max Gekk <max.g...@gmail.com> wrote:

Hello Sarah,

> Does the community have any plans to lift the isAdjustedToUTC=false
> restriction in the future?

So far there are no such plans, but we could introduce a SQL config that switches the Parquet writer to a backward-compatible mode for the TIME data type and stores it as isAdjustedToUTC=true. I do believe it shouldn't be the default because it is semantically incorrect. Could you open a sub-task of SPARK-51342 for future discussions, please?

Yours faithfully,
Max Gekk

On Tue, Aug 19, 2025 at 10:51 PM Sarah Gilmore <sgilm...@mathworks.com.invalid> wrote:

Hi all,

My name is Sarah Gilmore, and I am a software developer at MathWorks[1] as well as a committer for the apache/arrow project. I noticed that the Spark ecosystem is introducing a new data type called TimeType[2] to represent time-of-day values in the upcoming 4.1.0 release, and I'm very excited to see this work come to fruition! However, I also noticed that the accompanying enhancement to Spark's Parquet reader only adds the ability to read Parquet TIME data if isAdjustedToUTC=false[3]. Does the community have any plans to lift the isAdjustedToUTC=false restriction in the future?

My question stems from the fact that some Parquet writers generate TIME data with isAdjustedToUTC=true to adhere to the Parquet compatibility guidelines[4] regarding the deprecation of the ConvertedType TIME_MICROS. For example, Arrow's Parquet writer sets isAdjustedToUTC=true[5] even though Arrow's time types themselves are timezone-agnostic. Consequently, Spark's Parquet reader will still be unable to import Parquet files containing TIME data generated by writers that follow the Parquet compatibility guidelines - such as the Arrow Parquet writer - even after the release of the TimeType Spark data type. For context, the MATLAB parquetwrite function leverages Arrow's Parquet writer[6], and many MATLAB users want to read MATLAB-generated Parquet files that contain TIME data in Spark.

I appreciate the community's time and consideration on this topic. Thanks!
Best,
Sarah Gilmore

[1] https://www.mathworks.com/
[2] https://issues.apache.org/jira/browse/SPARK-51342
[3] https://github.com/apache/spark/blob/77413d443f23dd7a14194e516a12d2c959a357be/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L309
[4] https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#deprecated-time-convertedtype
[5] https://github.com/apache/arrow/blob/066b2162206825f2d628f97f4113b0403da1f4ec/cpp/src/parquet/arrow/schema.cc#L434
[6] https://www.mathworks.com/help/matlab/import_export/datatype-mappings-matlab-parquet.html

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org