Z0ltrix opened a new issue, #2746: URL: https://github.com/apache/drill/issues/2746
Hi everyone,

I want to raise a discussion about Drill's current behavior regarding Parquet timestamps.

Drill uses `INT64` for timestamps, and you can switch to `INT96` by setting `store.parquet.reader.int96_as_timestamp` to `true`. With that option it is not a big problem to work with both types of Parquet timestamps, but since Spark uses `INT96` by default, you have to change this option in almost all situations, especially when working with newer lakehouse architectures like Delta Lake and Iceberg.

For Spark, the use of `INT96` is clearly documented, both for reading and for writing.

For reading (https://spark.apache.org/docs/latest/sql-data-sources-parquet.html):

> Some Parquet-producing systems, in particular Impala and Hive, store Timestamp into INT96. This flag tells Spark SQL to interpret INT96 data as a timestamp to provide compatibility with these systems.

For writing (https://spark.apache.org/docs/latest/configuration.html):

> Sets which Parquet timestamp type to use when Spark writes data to Parquet files. INT96 is a non-standard but commonly used timestamp type in Parquet. TIMESTAMP_MICROS is a standard timestamp type in Parquet, which stores number of microseconds from the Unix epoch. TIMESTAMP_MILLIS is also standard, but with millisecond precision, which means Spark has to truncate the microsecond portion of its timestamp value.

Of course we could advise every Drill user to write their Spark jobs with `spark.sql.parquet.outputTimestampType` set to `TIMESTAMP_MICROS` or `TIMESTAMP_MILLIS`, or to always toggle this Drill option after startup, but that is still an additional step.

@vvysotskyi mentioned that if we switched this default now, we would have issues with some UDFs, so I think it could be a topic for the upcoming Drill 2.0.0 as a breaking change.

What do you think?
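For reference, here is a minimal Python sketch of how the two timestamp encodings relate. It is not Drill or Spark code. It assumes the commonly used INT96 layout (8 little-endian bytes of nanoseconds within the day, followed by 4 little-endian bytes holding the Julian day number), and the helper names are hypothetical, chosen only for illustration.

```python
import struct

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2_440_588
MICROS_PER_DAY = 86_400 * 1_000_000


def int96_to_epoch_micros(raw: bytes) -> int:
    """Convert a 12-byte INT96 timestamp to microseconds since the Unix epoch.

    Assumed layout: int64 nanoseconds-of-day, then int32 Julian day,
    both little-endian. Sub-microsecond precision is truncated.
    """
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    return days_since_epoch * MICROS_PER_DAY + nanos_of_day // 1_000


def epoch_micros_to_int96(micros: int) -> bytes:
    """Inverse of int96_to_epoch_micros (hypothetical helper for illustration)."""
    # divmod floors toward negative infinity, so pre-1970 values
    # still get a non-negative time-of-day component.
    days_since_epoch, micros_of_day = divmod(micros, MICROS_PER_DAY)
    return struct.pack(
        "<qi",
        micros_of_day * 1_000,
        days_since_epoch + JULIAN_DAY_OF_UNIX_EPOCH,
    )
```

An `INT64` column annotated as `TIMESTAMP_MICROS` stores the integer returned by `int96_to_epoch_micros` directly, which is why a reader has to know which of the two encodings a file uses before it can interpret the values.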