Hello,

Tahsin and I are trying to use the Apache Parquet file format with Spark SQL, but we are running into errors when reading Parquet files that contain TimeType columns. We're wondering whether this is unsupported in Spark SQL because of an architectural limitation or because of a lack of resources.

Context: When reading some Parquet files with Spark, we get an error message like the following:

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 186.0 failed 4 times, most recent failure: Lost task 0.3 in stage 186.0 (TID 1970, 10.155.249.249, executor 1): java.io.IOException: Could not read or convert schema for file: dbfs:/test/randomdata/sample001.parquet
    ...
    Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIME_MICROS);
        at org.apache.spark.sql.execution.datasources.parquet.ParquetToSparkSchemaConverter.illegalType$1(ParquetSchemaConverter.scala:106)

This only seems to occur with Parquet files that have a column of the "TimeType" type (or of the deprecated "TIME_MILLIS"/"TIME_MICROS" types).

After digging into this a bit, we think that the error message is coming from "ParquetSchemaConverter.scala", here:

<https://github.com/apache/spark/blob/11d3a744e20fe403dd76e18d57963b6090a7c581/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L151>
<https://github.com/apache/spark/blob/11d3a744e20fe403dd76e18d57963b6090a7c581/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L140>

This seems to imply that the Spark SQL engine does not support reading Parquet files with TimeType columns.

We are wondering if anyone on the mailing list could shed some more light on this: are there architectural or datatype limitations in Spark that result in this error, or is TimeType support for Parquet files something that hasn't been implemented yet due to a lack of resources or interest?

Thanks,
Rylan
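
P.S. For anyone unfamiliar with the type in the error above: as we understand the Parquet format, a TIME_MICROS value is an INT64 holding the number of microseconds since midnight. Below is a minimal Python sketch of the conversion we would expect a reader to perform. The helper name time_micros_to_time is our own and purely illustrative; it is not a Spark or Parquet API, and the sketch ignores time zone adjustment.

```python
from datetime import time

MICROS_PER_SECOND = 1_000_000
MICROS_PER_DAY = 24 * 60 * 60 * MICROS_PER_SECOND


def time_micros_to_time(micros: int) -> time:
    """Convert a raw TIME_MICROS value to a datetime.time.

    Hypothetical helper for illustration only. Assumes the value is the
    number of microseconds elapsed since midnight.
    """
    if not 0 <= micros < MICROS_PER_DAY:
        raise ValueError(f"value out of range for a time of day: {micros}")
    # Split off the sub-second part, then break the seconds into h/m/s.
    seconds, microsecond = divmod(micros, MICROS_PER_SECOND)
    minutes, second = divmod(seconds, 60)
    hour, minute = divmod(minutes, 60)
    return time(hour, minute, second, microsecond)
```

For example, time_micros_to_time(45_296_000_123) gives 12:34:56.000123. If the column could be read as a plain INT64, a conversion like this could presumably be applied afterwards, but we have not found a way to make Spark skip the schema check.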