andrei-ionescu edited a comment on issue #1360:
URL: https://github.com/apache/arrow-datafusion/issues/1360#issuecomment-979463480


   There are several things to discuss here.
   
   Even though `INT96` is deprecated, it has not yet been removed from Parquet
   and is still used by Spark, Flink, and many other frameworks. Spark 3 ships
   with the `spark.sql.parquet.outputTimestampType` option set to `INT96` by
   default ([see
   here](https://spark.apache.org/docs/3.0.0/configuration.html#runtime-sql-configuration)).
   As a result, there are lots of Parquet files with `INT96` columns whose
   values would actually fit into `INT64`, simply because that is the default
   setting. I would say the consistent behaviour is to support `INT96`, mark
   it as deprecated, and remove it when and if Parquet removes it.
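   
   For reference, the physical layout is straightforward to handle. Here is a
   minimal Rust sketch (the function name is mine, not DataFusion's) that
   splits a raw INT96 value, assuming the layout Spark and Impala write: the
   first 8 bytes are the little-endian nanoseconds within the day, the last 4
   bytes the little-endian Julian day number.
   
   ```rust
   use std::convert::TryInto;
   
   fn split_int96(raw: [u8; 12]) -> (u32, u64) {
       // Bytes 0..8: little-endian nanoseconds within the day.
       let nanos_of_day = u64::from_le_bytes(raw[0..8].try_into().unwrap());
       // Bytes 8..12: little-endian Julian day number.
       let julian_day = u32::from_le_bytes(raw[8..12].try_into().unwrap());
       (julian_day, nanos_of_day)
   }
   
   fn main() {
       // 2_440_588 is the Julian day of 1970-01-01, so this raw value
       // decodes to midnight at the Unix epoch.
       let mut raw = [0u8; 12];
       raw[8..12].copy_from_slice(&2_440_588u32.to_le_bytes());
       let (day, nanos) = split_int96(raw);
       println!("julian day {}, nanos of day {}", day, nanos);
   }
   ```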
   
   Regarding the Spark implementation, here is the function that returns the
   nanos:
   https://github.com/apache/spark/blob/HEAD/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L192.
   The nanosecond precision is not discarded in Spark.
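   
   In Rust, that kind of Julian-day arithmetic would look roughly like the
   sketch below (my own rendering, not a transliteration of the Spark or
   DataFusion code); checked arithmetic makes the overflow point explicit:
   
   ```rust
   const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588; // Julian day of 1970-01-01
   const SECONDS_PER_DAY: i64 = 86_400;
   const NANOS_PER_SECOND: i64 = 1_000_000_000;
   
   /// Combine a Julian day and intra-day nanoseconds into nanoseconds
   /// since the Unix epoch. `None` signals i64 overflow.
   fn to_nanos_since_epoch(julian_day: i64, nanos_of_day: i64) -> Option<i64> {
       let days = julian_day.checked_sub(JULIAN_DAY_OF_EPOCH)?;
       days.checked_mul(SECONDS_PER_DAY)?
           .checked_mul(NANOS_PER_SECOND)?
           .checked_add(nanos_of_day)
   }
   
   fn main() {
       // One day after the epoch, 1 ns past midnight.
       assert_eq!(to_nanos_since_epoch(2_440_589, 1), Some(86_400_000_000_001));
       // Julian day 0, some 6,700 years before the epoch, does not fit
       // into i64 nanoseconds.
       assert_eq!(to_nanos_since_epoch(0, 0), None);
   }
   ```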
   
   Apache Flink also maps the `Timestamp` type to `INT96`:
   https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/table/formats/parquet/.
   Impala still uses it as well.
   
   Can you point me to the part of the code where the overflow happens? I
   would like to understand it better.
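   
   Until then, one candidate I can think of is the i64 nanosecond
   representation itself: it only covers roughly ±292 years around the epoch
   (the maximum is 2262-04-11T23:47:16.854775807Z), so otherwise valid INT96
   dates far in the past or future cannot be expressed as i64 nanoseconds.
   This is my own guess, not a claim about where the DataFusion code
   overflows:
   
   ```rust
   fn main() {
       // i64::MAX nanoseconds is about 9.2e9 seconds past the epoch.
       let max_seconds = i64::MAX / 1_000_000_000;
       let approx_years = max_seconds as f64 / (365.25 * 86_400.0);
       println!("i64 nanoseconds cover about ±{:.0} years", approx_years);
   }
   ```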