gbcoder2020 opened a new issue, #13002: URL: https://github.com/apache/hudi/issues/13002
**Describe the problem you faced**

In one of our job runs upserting data into a Hudi CoW table, I observed a failure with a `HoodieCompactionException` on the metadata folder. See the screenshot below for Job 117 failing.

Timeline:

* 11 March: we first observed the failure.
* 11 March to 16 March: the same error persisted across a few re-runs.
* 18 March: a run succeeded and performed the upsert.

Since then we have not been able to reproduce the issue, but we need to understand why it happened and in what scenarios it can recur.

Another observation: in the S3 hoodie metadata location, I see that a compaction was requested on 11 March, went inflight, and was only committed on 16 March. (See screenshot.)

Questions:

1. What may have caused this failure, and why did it make our upserts fail?
2. What may have allowed the job to recover with no changes made to our configuration?
3. Please explain the behavior of the metadata compaction commit described above.
4. How can we guard ourselves against such failure scenarios?

**To Reproduce**

Steps to reproduce the behavior: N/A, the behavior is intermittent. I want to understand whether this can recur, and in what scenarios.

**Environment Description**

* Hudi version : 0.15.0
* Spark version : 3.4.1
* Hive version :
* Hadoop version : 2.7.5
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
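For anyone checking a table for the same pattern (a metadata-table compaction that is requested or inflight but not yet committed), here is a minimal sketch. It assumes that Hudi 0.x names timeline files `<instant>.compaction.requested` and `<instant>.compaction.inflight`, and marks a completed compaction with `<instant>.commit`; that naming is an assumption, not something confirmed in this issue. The function names and the sample file names are hypothetical, and the input is just a list of file names obtained by listing the metadata table's `.hoodie` folder by whatever means is convenient.

```python
from collections import defaultdict

# Assumed timeline file suffixes for Hudi 0.x (not taken from the issue itself).
PENDING_SUFFIXES = {
    ".compaction.requested": "requested",
    ".compaction.inflight": "inflight",
}
COMPLETED_SUFFIX = ".commit"


def compaction_states(file_names):
    """Map each compaction instant time to the set of states seen for it."""
    states = defaultdict(set)
    for name in file_names:
        for suffix, state in PENDING_SUFFIXES.items():
            if name.endswith(suffix):
                states[name[: -len(suffix)]].add(state)
    # Treat a compaction as completed only if a matching .commit file exists.
    for name in file_names:
        if name.endswith(COMPLETED_SUFFIX):
            instant = name[: -len(COMPLETED_SUFFIX)]
            if instant in states:
                states[instant].add("completed")
    return dict(states)


def pending_compactions(file_names):
    """Return instant times that were requested or inflight but never completed."""
    return sorted(
        instant
        for instant, seen in compaction_states(file_names).items()
        if "completed" not in seen
    )


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    sample = [
        "20250310100000000.compaction.requested",
        "20250310100000000.compaction.inflight",
        "20250310100000000.commit",
        "20250311110000000.deltacommit",
        "20250311120000000.compaction.requested",
        "20250311120000000.compaction.inflight",
    ]
    print(pending_compactions(sample))
```

Running this against a listing taken while the job is failing would show whether a compaction instant is stuck in a pending state; it does not explain why the compaction failed.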
