FANNG1 commented on issue #7046:
URL: https://github.com/apache/gravitino/issues/7046#issuecomment-2829836436

   Here is some investigation into `TIMESTAMP` across different systems. The problem is mainly caused by differences in `TIMESTAMP` semantics between Spark and underlying storage systems such as Hive.
   
   ### Background
   
   - In Spark, `TIMESTAMP` is a timestamp type with time zone semantics. Starting from Spark 3.4, Spark also provides `TIMESTAMP_NTZ` to represent a timestamp without a time zone.
   - In Hive 2.3, `TIMESTAMP` is a timestamp without a time zone, and Hive 2.3 does not support a timestamp-with-time-zone type.
   - The Gravitino Spark connector converts the Spark `TIMESTAMP` type to the Gravitino `TIMESTAMP` type with time zone, and the Spark `TIMESTAMP_NTZ` type to the Gravitino `TIMESTAMP` type without time zone.
   - The Gravitino Hive catalog converts the Gravitino `TIMESTAMP` type to the Hive `TIMESTAMP` type without considering time zone information, and converts the Hive `TIMESTAMP` type back to the Gravitino `TIMESTAMP` type without time zone.
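The mappings in the list above can be sketched as a small Java model. This is a hypothetical simplification, assuming a Gravitino `TIMESTAMP` can be reduced to a single "with time zone" flag; `CurrentMappings`, its nested types, and its methods are illustrative names, not the real Spark, Gravitino, or Hive classes.

```java
// Hypothetical model of today's TIMESTAMP mappings.
// These are NOT the real Spark, Gravitino, or Hive classes.
final class CurrentMappings {
    private CurrentMappings() {}

    enum SparkType { TIMESTAMP, TIMESTAMP_NTZ }

    enum HiveType { TIMESTAMP }

    // A Gravitino TIMESTAMP reduced to its "with time zone" flag (assumption).
    record GravitinoTimestamp(boolean withTimeZone) {}

    // Spark connector: TIMESTAMP -> with time zone, TIMESTAMP_NTZ -> without time zone.
    static GravitinoTimestamp sparkToGravitino(SparkType type) {
        return switch (type) {
            case TIMESTAMP -> new GravitinoTimestamp(true);
            case TIMESTAMP_NTZ -> new GravitinoTimestamp(false);
        };
    }

    // Hive catalog, write path: the time zone flag is ignored.
    static HiveType gravitinoToHive(GravitinoTimestamp type) {
        return HiveType.TIMESTAMP;
    }

    // Hive catalog, read path: Hive TIMESTAMP always becomes "without time zone".
    static GravitinoTimestamp hiveToGravitino(HiveType type) {
        return new GravitinoTimestamp(false);
    }

    public static void main(String[] args) {
        GravitinoTimestamp written = sparkToGravitino(SparkType.TIMESTAMP);
        GravitinoTimestamp readBack = hiveToGravitino(gravitinoToHive(written));
        System.out.println(written + " -> " + readBack);
    }
}
```

Running `main` shows the flag going in as `true` and coming back as `false`: the round trip through Hive silently drops the time zone information.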
   
   ### Why the error happens
   
   When a table is created, the Spark 3.3 Hive connector converts the Spark `TIMESTAMP` type to the Gravitino `TIMESTAMP` type with time zone, and the Hive catalog then converts it to the Hive `TIMESTAMP` type.
   When the table is loaded, the data type provided by the Hive catalog is `TIMESTAMP` without time zone. Spark 3.3 has no data type this can be converted to, so the error happens.
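The failure above can be reproduced in a hypothetical model of the Spark 3.3 round trip. `Spark33RoundTrip` and its methods are illustrative assumptions, and Hive types are modeled as plain strings; the point is only that the read path produces a type the Spark 3.3 side cannot express.

```java
// Hypothetical simulation of the Spark 3.3 create/load round trip.
// Names are illustrative only; this is not the actual connector code.
final class Spark33RoundTrip {
    private Spark33RoundTrip() {}

    // Spark 3.3 has TIMESTAMP but no TIMESTAMP_NTZ.
    enum SparkType { TIMESTAMP }

    record GravitinoTimestamp(boolean withTimeZone) {}

    // CREATE TABLE: Spark TIMESTAMP -> Gravitino TIMESTAMP with time zone -> Hive "TIMESTAMP".
    static String createColumn(SparkType type) {
        GravitinoTimestamp gravitino = switch (type) {
            case TIMESTAMP -> new GravitinoTimestamp(true);
        };
        return toHive(gravitino);
    }

    // Hive catalog write path: the time zone flag is dropped.
    static String toHive(GravitinoTimestamp type) {
        return "TIMESTAMP";
    }

    // LOAD TABLE: Hive "TIMESTAMP" -> Gravitino TIMESTAMP without time zone -> Spark type.
    static SparkType loadColumn(String hiveType) {
        if (!"TIMESTAMP".equals(hiveType)) {
            throw new IllegalArgumentException("Unexpected Hive type: " + hiveType);
        }
        return toSpark33(new GravitinoTimestamp(false));
    }

    // Spark 3.3 side: only the "with time zone" variant has a Spark counterpart.
    static SparkType toSpark33(GravitinoTimestamp type) {
        if (type.withTimeZone()) {
            return SparkType.TIMESTAMP;
        }
        throw new UnsupportedOperationException(
                "TIMESTAMP without time zone has no Spark 3.3 equivalent");
    }

    public static void main(String[] args) {
        String hiveType = createColumn(SparkType.TIMESTAMP);
        try {
            loadColumn(hiveType);
        } catch (UnsupportedOperationException e) {
            System.out.println("Load failed: " + e.getMessage());
        }
    }
}
```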
   
   ### Why Spark works normally with Hive
   
   Spark uses the Hive metastore to store data types, but it does not adopt Hive's `TIMESTAMP` semantics. Spark converts the Spark `TIMESTAMP` type to the Hive `TIMESTAMP` type when creating a table, and converts it back when loading the table. Hive does not support the Spark `TIMESTAMP_NTZ` type.
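Spark's own behavior, as described above, amounts to a symmetric mapping, so a round trip always returns the type that went in. A minimal sketch, where `SparkNativeHiveMapping` is a hypothetical name and throwing an exception for `TIMESTAMP_NTZ` is an assumed way of modeling "not supported":

```java
// Hypothetical model of Spark's own mapping for Hive metastore tables.
// Not the actual Spark implementation.
final class SparkNativeHiveMapping {
    private SparkNativeHiveMapping() {}

    enum SparkType { TIMESTAMP, TIMESTAMP_NTZ }

    // Write path: Spark TIMESTAMP is stored as Hive TIMESTAMP; TIMESTAMP_NTZ is rejected.
    static String toHive(SparkType type) {
        return switch (type) {
            case TIMESTAMP -> "TIMESTAMP";
            case TIMESTAMP_NTZ -> throw new UnsupportedOperationException(
                    "Hive tables do not support TIMESTAMP_NTZ");
        };
    }

    // Read path: Hive TIMESTAMP is always read back as Spark TIMESTAMP.
    static SparkType fromHive(String hiveType) {
        if ("TIMESTAMP".equals(hiveType)) {
            return SparkType.TIMESTAMP;
        }
        throw new IllegalArgumentException("Unexpected Hive type: " + hiveType);
    }

    public static void main(String[] args) {
        System.out.println(fromHive(toHive(SparkType.TIMESTAMP))); // prints TIMESTAMP
    }
}
```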
   
   ### How to fix in Gravitino?
   
   1. The Gravitino Hive catalog only supports converting the Gravitino `TIMESTAMP` type without time zone to the Hive `TIMESTAMP` type, to keep the semantics consistent.
   2. The Gravitino Spark connector does some compatibility work for Hive tables (admittedly dirty work): when creating a table, it converts the Spark `TIMESTAMP` type to the Gravitino `TIMESTAMP` type without time zone, and when loading Gravitino Hive tables, it converts the Gravitino `TIMESTAMP` type without time zone back to the Spark `TIMESTAMP` type.
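The two proposed changes can be sketched in the same kind of hypothetical model. `ProposedMappings` and its methods are illustrative, not actual Gravitino code, and rejecting `TIMESTAMP_NTZ` for Hive tables is an assumption (the proposal does not say how that type should be handled).

```java
// Hypothetical model of the two proposed changes; names are illustrative only.
final class ProposedMappings {
    private ProposedMappings() {}

    enum SparkType { TIMESTAMP, TIMESTAMP_NTZ }

    record GravitinoTimestamp(boolean withTimeZone) {}

    // Change 1 (Hive catalog): only TIMESTAMP without time zone maps to Hive TIMESTAMP.
    static String gravitinoToHive(GravitinoTimestamp type) {
        if (type.withTimeZone()) {
            throw new IllegalArgumentException("Hive TIMESTAMP cannot store a time zone");
        }
        return "TIMESTAMP";
    }

    static GravitinoTimestamp hiveToGravitino(String hiveType) {
        if (!"TIMESTAMP".equals(hiveType)) {
            throw new IllegalArgumentException("Unexpected Hive type: " + hiveType);
        }
        return new GravitinoTimestamp(false);
    }

    // Change 2 (Spark connector, Hive tables only): Spark TIMESTAMP -> without time zone.
    // Rejecting TIMESTAMP_NTZ is an assumption, mirroring Spark's own Hive behavior.
    static GravitinoTimestamp sparkToGravitinoForHive(SparkType type) {
        return switch (type) {
            case TIMESTAMP -> new GravitinoTimestamp(false);
            case TIMESTAMP_NTZ -> throw new UnsupportedOperationException(
                    "TIMESTAMP_NTZ is not supported for Hive tables");
        };
    }

    // Change 2, load path: Gravitino TIMESTAMP without time zone -> Spark TIMESTAMP.
    static SparkType gravitinoToSparkForHive(GravitinoTimestamp type) {
        if (type.withTimeZone()) {
            throw new IllegalStateException("Hive tables should not contain TIMESTAMP with time zone");
        }
        return SparkType.TIMESTAMP;
    }

    static String createInHive(SparkType type) {
        return gravitinoToHive(sparkToGravitinoForHive(type));
    }

    static SparkType loadFromHive(String hiveType) {
        return gravitinoToSparkForHive(hiveToGravitino(hiveType));
    }

    public static void main(String[] args) {
        System.out.println(loadFromHive(createInHive(SparkType.TIMESTAMP))); // prints TIMESTAMP
    }
}
```

In this model a Spark `TIMESTAMP` column survives create and load as `TIMESTAMP`, while a Gravitino `TIMESTAMP` with time zone is refused by the Hive catalog instead of silently losing its time zone. The special casing lives in `sparkToGravitinoForHive`, which is the part that feels hacky: the Spark connector has to know that the target catalog is Hive.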
   
   The solution on the Gravitino Spark connector side seems hacky, so I'd like to hear your advice. @mchades @jerryshao @sunxiaojian @yangyuxia

