FANNG1 commented on issue #7046: URL: https://github.com/apache/gravitino/issues/7046#issuecomment-2829836436
Here is some investigation of `TIMESTAMP` in different systems. The error is mainly caused by differences in `TIMESTAMP` semantics between Spark and underlying storage such as Hive.

### Background

- The `TIMESTAMP` type in Spark is a timestamp with time zone. Since Spark 3.4, Spark also has `TIMESTAMP_NTZ` to represent a timestamp without time zone.
- The `TIMESTAMP` type in Hive 2.3 is a timestamp without time zone, and Hive does not support a timestamp-with-time-zone type.
- The Gravitino Spark connector transforms the Spark `TIMESTAMP` type to the Gravitino `TIMESTAMP` type with time zone, and the `TIMESTAMP_NTZ` type to the Gravitino `TIMESTAMP` type without time zone.
- The Gravitino Hive catalog transforms the Gravitino `TIMESTAMP` type to the Hive `TIMESTAMP` type without considering time zone information, and transforms the Hive `TIMESTAMP` type to the Gravitino `TIMESTAMP` type without time zone.

### Why the error happens

When creating a table, the Spark 3.3 Hive connector transforms Spark `TIMESTAMP` to Gravitino `TIMESTAMP` with time zone, and the Hive catalog then transforms it to Hive `TIMESTAMP`. When loading the table, the data type provided by the Hive catalog is `TIMESTAMP` without time zone. Spark 3.3 has no `TIMESTAMP_NTZ`, so this type cannot be transformed to any Spark data type, and the error happens.

### Why Spark works normally with Hive

Spark uses the Hive metastore to store data types, but doesn't use Hive's `TIMESTAMP` semantics. Spark transforms the Spark `TIMESTAMP` type to the Hive `TIMESTAMP` type when creating a table, and vice versa when loading it. Hive doesn't support the Spark `TIMESTAMP_NTZ` type.

### How to fix in Gravitino?

1. The Gravitino Hive catalog only supports transforming the Gravitino `TIMESTAMP` type without time zone to the Hive `TIMESTAMP` type, to keep the semantics consistent.
2. The Gravitino Spark connector does some compatibility work (admittedly, it's dirty work): for Hive tables, it transforms the Spark `TIMESTAMP` type to the Gravitino `TIMESTAMP` type without time zone when creating tables, and transforms the Gravitino `TIMESTAMP` type without time zone back to the Spark `TIMESTAMP` type when loading Gravitino Hive tables.
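A minimal Java sketch of the two proposed changes, assuming simplified stand-in types. `SparkType`, `GravitinoType`, `toGravitino`, `toHiveType`, and `toSpark` are hypothetical names for illustration only; they are not the real Spark, Gravitino, or connector APIs. The `isHiveTable` and `supportsNtz` flags are likewise illustrative. Rejecting `TIMESTAMP_NTZ` for Hive tables is an assumption that mirrors Spark's own behavior with Hive.

```java
// Hypothetical sketch of the proposed TIMESTAMP mapping; not Gravitino's real code.
public class TimestampMappingSketch {

  // Hypothetical stand-in for Spark's timestamp types; TIMESTAMP_NTZ exists only in Spark 3.4+.
  enum SparkType { TIMESTAMP, TIMESTAMP_NTZ }

  // Hypothetical stand-in for Gravitino's TIMESTAMP type with or without time zone.
  enum GravitinoType { TIMESTAMP_WITH_TZ, TIMESTAMP_WITHOUT_TZ }

  // Proposed fix 1: the Hive catalog only accepts TIMESTAMP without time zone.
  static String toHiveType(GravitinoType type) {
    if (type != GravitinoType.TIMESTAMP_WITHOUT_TZ) {
      throw new IllegalArgumentException("Hive TIMESTAMP has no time zone, rejecting " + type);
    }
    return "timestamp";
  }

  // Proposed fix 2, create table: Spark -> Gravitino, with a special case for Hive tables.
  static GravitinoType toGravitino(SparkType type, boolean isHiveTable) {
    switch (type) {
      case TIMESTAMP:
        // For Hive tables, store Spark TIMESTAMP as Gravitino TIMESTAMP without time zone.
        return isHiveTable ? GravitinoType.TIMESTAMP_WITHOUT_TZ : GravitinoType.TIMESTAMP_WITH_TZ;
      case TIMESTAMP_NTZ:
        if (isHiveTable) {
          // Assumption: mirror Spark, which does not support TIMESTAMP_NTZ in Hive tables.
          throw new UnsupportedOperationException("TIMESTAMP_NTZ is not supported for Hive tables");
        }
        return GravitinoType.TIMESTAMP_WITHOUT_TZ;
      default:
        throw new IllegalArgumentException("Unknown type: " + type);
    }
  }

  // Proposed fix 2, load table: Gravitino -> Spark.
  static SparkType toSpark(GravitinoType type, boolean isHiveTable, boolean supportsNtz) {
    if (type == GravitinoType.TIMESTAMP_WITH_TZ) {
      return SparkType.TIMESTAMP;
    }
    if (isHiveTable) {
      // Hive TIMESTAMP (without time zone) is read back as Spark TIMESTAMP, as Spark itself does.
      return SparkType.TIMESTAMP;
    }
    if (!supportsNtz) {
      // The failure from the issue: Spark 3.3 has no type for TIMESTAMP without time zone.
      throw new UnsupportedOperationException("TIMESTAMP without time zone needs Spark 3.4+");
    }
    return SparkType.TIMESTAMP_NTZ;
  }

  public static void main(String[] args) {
    // Simulate Spark 3.3 (no TIMESTAMP_NTZ) creating a Hive table and loading it back.
    GravitinoType stored = toGravitino(SparkType.TIMESTAMP, true);
    String hiveType = toHiveType(stored);
    SparkType loaded = toSpark(stored, true, false);
    System.out.println(stored + " -> " + hiveType + " -> " + loaded);
  }
}
```

Running `main` prints `TIMESTAMP_WITHOUT_TZ -> timestamp -> TIMESTAMP`, so the round trip succeeds on Spark 3.3. The `isHiveTable` branches show why the connector side feels hacky: the connector must know the table's provider before it can choose a type mapping.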
The solution on the Gravitino Spark connector side seems hacky, so I'd like to hear your advice. @mchades @jerryshao @sunxiaojian @yangyuxia
