Z0ltrix opened a new issue, #2746:
URL: https://github.com/apache/drill/issues/2746

   Hi everyone,
   
   I want to raise a discussion about Drill's current behavior regarding 
Parquet timestamps. 
   
   Drill uses `INT64` for timestamps, and you can switch to `INT96` by setting 
`store.parquet.reader.int96_as_timestamp` to `true`. With that option it's not 
a big problem to work with both types of Parquet timestamps, but since Spark 
uses `INT96` by default, you have to change this setting in almost all 
situations, especially when working with new lakehouse architectures like 
Delta Lake and Iceberg.
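
   For illustration, the option can be toggled system-wide with an 
`ALTER SYSTEM SET` statement. The sketch below sends that statement through 
Drill's REST endpoint (`/query.json`). The helper names and the 
`http://localhost:8047` base URL are assumptions for a local, unauthenticated 
Drillbit, not something prescribed by this issue.

```python
import json
import urllib.request

# Option name taken from the discussion above.
OPTION = "store.parquet.reader.int96_as_timestamp"


def build_option_query(enabled: bool) -> dict:
    """Build the REST payload for an ALTER SYSTEM statement that sets the option."""
    value = "true" if enabled else "false"
    return {
        "queryType": "SQL",
        "query": f"ALTER SYSTEM SET `{OPTION}` = {value}",
    }


def set_int96_as_timestamp(enabled: bool,
                           base_url: str = "http://localhost:8047") -> dict:
    """POST the statement to a Drillbit.

    base_url is an assumed local default; adjust it (and add authentication)
    for a real deployment.
    """
    body = json.dumps(build_option_query(enabled)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/query.json",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

   Calling `set_int96_as_timestamp(True)` against a running Drillbit would 
apply the setting; `build_option_query` alone is enough to inspect the 
statement that gets sent.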
   
   For Spark, the use of INT96 is clearly documented for both reading and 
writing:
   
   Here for reading -> 
https://spark.apache.org/docs/latest/sql-data-sources-parquet.html 
   
   >  Some Parquet-producing systems, in particular Impala and Hive, store 
Timestamp into INT96. This flag tells Spark SQL to interpret INT96 data as a 
timestamp to provide compatibility with these systems. 
   
   Here for writing -> https://spark.apache.org/docs/latest/configuration.html
   
   > Sets which Parquet timestamp type to use when Spark writes data to Parquet 
files. INT96 is a non-standard but commonly used timestamp type in Parquet. 
TIMESTAMP_MICROS is a standard timestamp type in Parquet, which stores number 
of microseconds from the Unix epoch. TIMESTAMP_MILLIS is also standard, but 
with millisecond precision, which means Spark has to truncate the microsecond 
portion of its timestamp value.
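
   To make the difference between the two encodings concrete, here is a 
minimal sketch that decodes both. It assumes the INT96 layout commonly written 
by Impala, Hive and Spark: 12 little-endian bytes, with nanoseconds within the 
day in the first 8 bytes and the Julian day number in the last 4. The function 
names are hypothetical, and sub-microsecond precision is dropped because 
Python's `datetime` cannot represent it.

```python
import struct
from datetime import datetime, timedelta, timezone

# Julian day number of 1970-01-01, the Unix epoch.
JULIAN_DAY_OF_UNIX_EPOCH = 2440588
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int96_to_datetime(raw: bytes) -> datetime:
    """Decode a 12-byte INT96 timestamp (nanos of day + Julian day)."""
    if len(raw) != 12:
        raise ValueError("an INT96 value must be exactly 12 bytes")
    # "<qi": little-endian 8-byte integer followed by a 4-byte integer.
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days = julian_day - JULIAN_DAY_OF_UNIX_EPOCH
    # Truncate nanoseconds to microseconds.
    return UNIX_EPOCH + timedelta(days=days, microseconds=nanos_of_day // 1000)


def int64_micros_to_datetime(micros: int) -> datetime:
    """Decode an INT64 TIMESTAMP_MICROS value (microseconds since the epoch)."""
    return UNIX_EPOCH + timedelta(microseconds=micros)


# The same instant, 1970-01-02 01:00:00 UTC, in both encodings.
as_int96 = struct.pack("<qi", 3_600_000_000_000, JULIAN_DAY_OF_UNIX_EPOCH + 1)
as_int64_micros = 90_000_000_000
same_instant = int96_to_datetime(as_int96) == int64_micros_to_datetime(as_int64_micros)
```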
   
   Of course we could advise every Drill user to run their Spark jobs with 
`spark.sql.parquet.outputTimestampType` set to `TIMESTAMP_MICROS` or 
`TIMESTAMP_MILLIS`, or to always toggle this Drill option after startup, but 
it's still an additional step. 
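
   As a sketch of the Spark-side workaround, the helper below applies that 
setting to a session builder. The helper name is hypothetical; it relies only 
on the builder exposing a `config(key, value)` method, as PySpark's 
`SparkSession.builder` does.

```python
# Configuration key and values quoted from the Spark documentation above.
OUTPUT_TS_KEY = "spark.sql.parquet.outputTimestampType"
STANDARD_TYPES = ("TIMESTAMP_MICROS", "TIMESTAMP_MILLIS")


def use_standard_parquet_timestamps(builder, ts_type="TIMESTAMP_MICROS"):
    """Return the builder configured to write a standard Parquet timestamp type.

    `builder` is expected to behave like SparkSession.builder, i.e. to offer
    config(key, value) and return a builder from it.
    """
    if ts_type not in STANDARD_TYPES:
        raise ValueError(f"expected one of {STANDARD_TYPES}, got {ts_type!r}")
    return builder.config(OUTPUT_TS_KEY, ts_type)
```

   With PySpark installed, this would be used as 
`use_standard_parquet_timestamps(SparkSession.builder).getOrCreate()` before 
writing any Parquet files.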
   
   @vvysotskyi mentioned that if we switched this default now, we would 
have issues with some UDFs, so I think it could be a topic for the upcoming 
Drill 2.0.0 as a breaking change.
   
   What do you think?

