gbcoder2020 opened a new issue, #13002:
URL: https://github.com/apache/hudi/issues/13002

   
   **Describe the problem you faced**
   
   In one of the job runs that upserts data into a Hudi CoW table, I observed a failure caused by a HoodieCompactionException on the metadata folder. See the screenshots below for Job 117 failing.
   
   
![Image](https://github.com/user-attachments/assets/9b9a90a6-d635-49ee-bf95-c14d25249f35)
   
   
![Image](https://github.com/user-attachments/assets/46f5b02d-0b16-4d35-a8d4-263b1588f4c1)
   
   
![Image](https://github.com/user-attachments/assets/4f426059-8162-4bf2-a4ad-d6146d0c983e)
   
   
   In terms of timeline, we first observed the failure on 11th March. Several re-runs between 11th and 16th March failed with the same error.
   A run on 18th March succeeded and completed the upsert. We have not been able to reproduce the issue since, but we need to understand why it may have happened and in what scenarios it could recur.
   
   Another observation:
   In the Hudi metadata location on S3, I see that a compaction was requested on 11th March, then went inflight and was finally committed on 16th March (see screenshot).
   
   
![Image](https://github.com/user-attachments/assets/9858027a-9b00-4441-973a-ca19c5ce882c)
   
   Questions:
   1. What may have caused this failure, and why did it make our upserts fail?
   2. What may have caused the failure to resolve itself without any changes to our configuration?
   3. Please explain the behavior of the metadata compaction commit described above.
   4. How can we guard ourselves against such failure scenarios?
   
   **To Reproduce**
   
   Steps to reproduce the behavior: N/A; the behavior is intermittent. I want to understand whether this can recur, and in what scenarios.
   
   **Environment Description**
   
   * Hudi version : 0.15.0
   
   * Spark version : 3.4.1
   
   * Hive version :
   
   * Hadoop version : 2.7.5
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   
   **Stacktrace**
   
   
![Image](https://github.com/user-attachments/assets/b4bd82ef-e293-4de0-ada6-c42660e9e562)
   
   
   

