TengHuo commented on PR #10951:
URL: https://github.com/apache/hudi/pull/10951#issuecomment-2036117075

   > We did some tests today and found out why 
`hoodie.datasource.hive_sync.support_timestamp` should default to `true`.
   > 
   > When an alter-schema operation is performed, hive sync happens via Spark's 
external catalogue in: 
`org.apache.spark.sql.hudi.command.AlterTableCommand#commitWithSchema`, and Spark 
syncs `TIMESTAMP` types as `TIMESTAMP`.
   > 
   > ```scala
   > sparkSession.sessionState.catalog
   >       .externalCatalog
   >       .alterTableDataSchema(db, tableName, dataSparkSchema)
   > ```
   > 
   > If this is defaulted to `false`, then after altering the schema (via spark-sql) 
of a table containing a `TIMESTAMP` column, the type on HMS will change from 
`LONG` back to `TIMESTAMP` (via Spark's external catalogue API).
   > 
   > This will cause subsequent hive-syncs to fail when they try to sync 
`TIMESTAMP` as `LONG`, which is not ideal.
   > 
   > I think it's best that we ensure consistency with Spark. I will submit 
another PR to change the default back to `true`, and will add documentation 
there to explain why.
   > 
   > As for the trino/presto error, they will just have to fix it on their end.
   > 
   > # Conclusion
   > The reason for this discrepancy is due to Spark's external catalogue API, 
which syncs `TIMESTAMP` types as `TIMESTAMP` to hive.
   > 
   > Given that Hudi has multiple entrypoints, it makes sense that Spark is where 
this inconsistency was introduced.
   > 
   > While I am not sure why hive-sync-tool defaulted `support_timestamp` to 
`false`, I think it's best we just document this.
   
   In this case, cross-engine scenarios may be impacted when a Hudi Flink user uses 
the `TIMESTAMP` type: Hive sync in a Flink pipeline will sync it as `LONG` by default.
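   The divergence can be sketched as two tiny type-mapping functions. This is a 
hypothetical illustration of the two sync paths, not Hudi's actual hive-sync or 
catalog code; the flag mirrors `hoodie.datasource.hive_sync.support_timestamp`:
   
   ```scala
   // Hypothetical sketch; not Hudi's actual implementation.
   object TimestampSyncSketch {
   
     // Hive sync tool path: with support_timestamp=false, a timestamp-micros
     // column is written to HMS as bigint (LONG); with true, as timestamp.
     def hiveSyncType(avroLogicalType: String, supportTimestamp: Boolean): String =
       avroLogicalType match {
         case "timestamp-micros" => if (supportTimestamp) "timestamp" else "bigint"
         case other              => other
       }
   
     // Spark external catalogue path (AlterTableCommand): always timestamp.
     def sparkCatalogType(avroLogicalType: String): String =
       avroLogicalType match {
         case "timestamp-micros" => "timestamp"
         case other              => other
       }
   }
   
   // With the default of false, the two paths disagree on what lands in HMS:
   // hiveSyncType("timestamp-micros", supportTimestamp = false) == "bigint"
   // sparkCatalogType("timestamp-micros")                       == "timestamp"
   ```
   
   A Flink pipeline would need to enable the equivalent hive-sync 
support-timestamp option on the write path to avoid the `LONG` mapping; the exact 
Flink option name depends on the Hudi version, so check `FlinkOptions` for your release.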


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
