andrei-ionescu edited a comment on issue #1360: URL: https://github.com/apache/arrow-datafusion/issues/1360#issuecomment-979463480
There are multiple things to discuss here. Even though `INT96` is deprecated, it has not yet been removed from Parquet and is still used by Spark, Flink, and many other frameworks. Spark 3 ships with the `spark.sql.parquet.outputTimestampType` option set to `INT96` by default ([see here](https://spark.apache.org/docs/3.0.0/configuration.html#runtime-sql-configuration)). As a result, there are lots of Parquet files out there with `INT96` columns whose values would fit into `INT64`, simply because that is the default setting. I would say the consistent behaviour is to support `INT96`, mark it as deprecated, and remove it if and when Parquet removes it.

Regarding the Spark implementation, here is a function that returns the nanos: https://github.com/apache/spark/blob/HEAD/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L192. The nanosecond precision is not discarded in Spark.

Apache Flink maps the `Timestamp` type to `INT96` too: https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/parquet/. Impala also still uses it.

Can you point me to the part of the code where the overflow issue happens? I would like to understand it better.
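For context, here is a minimal sketch of my understanding of the decoding, in Rust since this is DataFusion. This is my own illustration, not DataFusion's or Spark's actual code: it assumes the usual `INT96` layout (8 little-endian bytes of nanoseconds-within-day followed by 4 little-endian bytes of Julian day) and shows where an `i64` nanosecond representation can overflow.

```rust
// Julian day number of the Unix epoch, 1970-01-01.
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const SECONDS_PER_DAY: i64 = 86_400;
const NANOS_PER_SECOND: i64 = 1_000_000_000;

/// Decode a Parquet INT96 timestamp into nanoseconds since the Unix epoch.
///
/// Returns `None` when the result does not fit into an `i64` of
/// nanoseconds (roughly outside the years 1677..=2262), which is where an
/// unchecked implementation would silently overflow.
fn int96_to_nanos(bytes: [u8; 12]) -> Option<i64> {
    // First 8 bytes: nanoseconds within the day, little-endian.
    let nanos_of_day = i64::from_le_bytes(bytes[0..8].try_into().ok()?);
    // Last 4 bytes: Julian day number, little-endian.
    let julian_day = i32::from_le_bytes(bytes[8..12].try_into().ok()?) as i64;

    let days_since_epoch = julian_day - JULIAN_DAY_OF_EPOCH;
    days_since_epoch
        .checked_mul(SECONDS_PER_DAY * NANOS_PER_SECOND)?
        .checked_add(nanos_of_day)
}

fn main() {
    // A Julian day of 2_440_588 with zero nanos decodes to the epoch itself.
    let mut bytes = [0u8; 12];
    bytes[8..12].copy_from_slice(&2_440_588i32.to_le_bytes());
    assert_eq!(int96_to_nanos(bytes), Some(0));
}
```

If that matches what the reader does, I assume the overflow you mention happens for timestamps outside the roughly 1677-09-21 to 2262-04-11 window that `i64` nanoseconds can represent, but please correct me if the issue is elsewhere.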
