parisni opened a new issue, #11701: URL: https://github.com/apache/hudi/issues/11701
hudi 0.15.0

---

AFAIK, when metasync (hive/glue...) fails, hudi currently still commits the data:

1. [hudi commit](https://github.com/apache/hudi/blob/be0068065d6727e6354e601846a4cf4d5e6d4f53/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L984)
2. [metasync action](https://github.com/apache/hudi/blob/be0068065d6727e6354e601846a4cf4d5e6d4f53/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L1017)
3. [return failure status](https://github.com/apache/hudi/blob/be0068065d6727e6354e601846a4cf4d5e6d4f53/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieSparkSqlWriter.scala#L1021)
4. [eg of insert into cmd](https://github.com/apache/hudi/blob/c80b5596c1de08dc25a096be663241abf5de1b6e/hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/InsertIntoHoodieTableCommand.scala#L104)

So right now the user is made aware that there was a failure. However, the hudi table is already committed.

### proposal

I propose that, right after a metasync failure, a rollback of the current instant is performed.

### Scenario

Let's say we promoted a type (int -> string), but athena does not support that promotion. The glue metasync should raise an error.

1. If the commit is rolled back, the metastore can still be left in a corrupted state (table partially updated).
2. However, if it is not rolled back, both the metastore and the hudi table end up in a corrupted state: the hudi table has the type promoted to string, but the metastore is not able to support it. The user then has to roll back manually.
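To make the proposed control flow concrete, here is a minimal, self-contained sketch of "commit, then metasync, then roll back the instant if the sync fails". It is a toy model only: `ToyTimeline`, `MetaSyncError`, `write_with_sync` and the `rollback_on_sync_failure` flag are hypothetical names invented for illustration and are not Hudi classes, APIs or configs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


class MetaSyncError(Exception):
    """Raised by a (hypothetical) sync hook when the catalog rejects the change."""


@dataclass
class ToyTimeline:
    """Stand-in for a table timeline; NOT a Hudi class."""
    completed: List[str] = field(default_factory=list)
    rolled_back: List[str] = field(default_factory=list)

    def commit(self, instant: str) -> None:
        self.completed.append(instant)

    def rollback(self, instant: str) -> None:
        # Undo a completed instant and remember that it was rolled back.
        self.completed.remove(instant)
        self.rolled_back.append(instant)


def write_with_sync(
    timeline: ToyTimeline,
    instant: str,
    meta_sync: Callable[[str], None],
    rollback_on_sync_failure: bool = True,  # hypothetical switch
) -> bool:
    """Commit first, then sync; on sync failure optionally roll back the instant.

    Returns True on full success, False when the sync failed.
    """
    timeline.commit(instant)
    try:
        meta_sync(instant)
    except MetaSyncError:
        if rollback_on_sync_failure:
            timeline.rollback(instant)
        return False
    return True


def failing_glue_sync(instant: str) -> None:
    # Simulates the scenario: the catalog refuses the int -> string promotion.
    raise MetaSyncError(f"type promotion rejected for instant {instant}")


timeline = ToyTimeline()
ok = write_with_sync(timeline, "001", failing_glue_sync)
```

With the flag enabled, the failed write leaves no completed instant behind; with it disabled, the sketch reproduces the behaviour described above, where the failure is reported but the instant stays committed. The sketch does not address case 1 of the scenario: a sync that fails midway may still leave the metastore partially updated.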
