Hi all,

I opened a sub-task (SPARK-53368) of SPARK-51162 to track future discussions; 
here's a link[1] to the new JIRA issue. I made it a sub-task of SPARK-51162 
rather than SPARK-51342 because the latter is itself already a sub-task.

Thanks for taking the time to consider this enhancement!

Best Regards,

Sarah Gilmore

[1] https://issues.apache.org/jira/browse/SPARK-53368

________________________________________
From: serge rielau.com <se...@rielau.com>
Sent: Saturday, August 23, 2025 4:42 PM
To: Max Gekk <max.g...@gmail.com>
Cc: sgilm...@mathworks.com.invalid <sgilm...@mathworks.com.invalid>; 
dev@spark.apache.org <dev@spark.apache.org>
Subject: Re: [Spark SQL][Parquet]: Question about support for Parquet TIME data
 
Wouldn’t isAdjustedToUTC=false imply TIME WITH LOCAL TIMEZONE?
That would be a "different" type.
Personally, I’d much rather see Spark support TIME/TIMESTAMP WITH TIMEZONE.
TIMESTAMP WITH LOCAL TIMEZONE has provided a rich set of "interesting" 
challenges over the years…


On Aug 23, 2025, at 12:24 PM, Max Gekk <max.g...@gmail.com> wrote:

Hello Sarah,

> Does the community have any plans to lift the isAdjustedToUTC=false 
> restriction in the future?

So far there are no such plans, but we could introduce a SQL config that 
switches the Parquet writer to a backward-compatible mode for the TIME data 
type and stores it as isAdjustedToUTC=true. I do believe it shouldn't be the 
default because it is semantically incorrect. Could you open a sub-task of 
SPARK-51342 for future discussions, please?
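
To make the idea more concrete, a rough PySpark sketch of how such a mode might 
be used (the config name below is purely hypothetical and not implemented; df 
stands for any DataFrame with a TIME column):

    # Hypothetical config name, shown only to illustrate the proposed
    # backward-compatible writer mode; it does not exist in Spark today.
    spark.conf.set("spark.sql.parquet.time.writeAdjustedToUTC", "true")

    # With the mode enabled, the writer would annotate TIME columns as
    # isAdjustedToUTC=true so that legacy readers can consume the files.
    df.write.parquet("/tmp/time_data")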

Yours faithfully,
Max Gekk


On Tue, Aug 19, 2025 at 10:51 PM Sarah Gilmore <sgilm...@mathworks.com.invalid> 
wrote:
Hi all,

My name is Sarah Gilmore, and I am a software developer at MathWorks[1] as well 
as a committer for the apache/arrow project.

I noticed that the Spark ecosystem is introducing a new data type called 
TimeType[2] to represent time-of-day values in the upcoming 4.1.0 release, and 
I'm very excited to see this work come to fruition!

However, I also noticed that the accompanying enhancement to Spark's Parquet 
reader only adds the ability to read Parquet TIME data if 
isAdjustedToUTC=false[3]. 

Does the community have any plans to lift the isAdjustedToUTC=false restriction 
in the future?

My question stems from the fact that some Parquet writers generate TIME data 
with isAdjustedToUTC=true to adhere to the Parquet format's compatibility 
guidelines[4] regarding the deprecation of the ConvertedType TIME_MICROS. For 
example, Arrow's Parquet writer sets isAdjustedToUTC=true[5] even though 
Arrow's time types themselves are timezone-agnostic. Consequently, even after 
the TimeType data type is released, Spark's Parquet reader will still be unable 
to import Parquet files containing TIME data generated by writers that follow 
these compatibility guidelines, such as the Arrow Parquet writer.
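
As a small illustration (assuming pyarrow is installed; the file name is 
arbitrary), the following Python snippet writes a timezone-agnostic Arrow time 
column, and the resulting Parquet file carries the isAdjustedToUTC=true 
annotation that Spark's reader currently rejects:

    import pyarrow as pa
    import pyarrow.parquet as pq

    # Arrow's time64("us") type has no notion of a time zone, but the Parquet
    # writer annotates it as TIME(isAdjustedToUTC=true, MICROS) to follow the
    # format's compatibility guidelines.
    table = pa.table({"t": pa.array([0, 3_600_000_000], type=pa.time64("us"))})
    pq.write_table(table, "time_example.parquet")

    # Printing the Parquet-level schema shows the logical type annotation.
    print(pq.ParquetFile("time_example.parquet").schema)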

For context, the MATLAB parquetwrite function leverages Arrow's Parquet 
writer[6], and many MATLAB users want to use Spark to read MATLAB-generated 
Parquet files that contain TIME data.
 
I appreciate the community's time and consideration on this topic.

Thanks!

Best,

Sarah Gilmore

[1] https://www.mathworks.com/
[2] https://issues.apache.org/jira/browse/SPARK-51342
[3] 
https://github.com/apache/spark/blob/77413d443f23dd7a14194e516a12d2c959a357be/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L309
[4] 
https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#deprecated-time-convertedtype
[5] 
https://github.com/apache/arrow/blob/066b2162206825f2d628f97f4113b0403da1f4ec/cpp/src/parquet/arrow/schema.cc#L434
[6] 
https://www.mathworks.com/help/matlab/import_export/datatype-mappings-matlab-parquet.html

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
