MrAladdin opened a new issue, #11567:
URL: https://github.com/apache/hudi/issues/11567

   **Describe the problem you faced**
   
   1. After restarting the program, `.hoodie/metadata/record_index` is deleted completely.
   2. Because a large amount of data has already been written into the lake, the index is automatically rebuilt on restart. The data volume is so large that the rebuild produces a large number of task failures and the job ultimately fails. No matter how many resources are allocated, the result is the same: many failed tasks and the whole job fails.
   3. We urgently need to know the cause and how to resolve it (see the config sketch below).
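   One possibility worth ruling out (this is an assumption, not a confirmed diagnosis): Hudi drops metadata-table partitions that are disabled in the current writer config, so if the restarted job no longer passes the record-index options, the `record_index` partition can be deleted and later rebuilt. Below is a minimal sketch of the relevant Spark datasource options; the table name, key fields, and path are placeholders, not the actual job.
   
   ```scala
   // Minimal sketch: keep the record index enabled on every run of the writer.
   // Assumptions: Spark datasource writer; placeholder data, table name, and path.
   import org.apache.spark.sql.{SaveMode, SparkSession}
   
   val spark = SparkSession.builder()
     .appName("hudi-record-index-sketch")
     .getOrCreate()
   
   import spark.implicits._
   
   // Placeholder data; the real job writes its own DataFrame.
   val df = Seq((1, "a"), (2, "b")).toDF("id", "value")
   
   df.write.format("hudi")
     .option("hoodie.table.name", "my_table")                     // placeholder table name
     .option("hoodie.datasource.write.recordkey.field", "id")     // placeholder record key
     .option("hoodie.datasource.write.precombine.field", "value") // placeholder precombine field
     .option("hoodie.metadata.enable", "true")                    // metadata table stays on
     .option("hoodie.metadata.record.index.enable", "true")       // keep the record_index partition
     .option("hoodie.index.type", "RECORD_INDEX")                 // use the record-level index
     .mode(SaveMode.Append)
     .save("hdfs:///tmp/my_table")                                // placeholder base path
   ```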
   
   
   **Environment Description**
   
   * Hudi version : 0.14.1
   
   * Spark version : 3.4
   
   * Hive version : 3.1.2
   
   * Hadoop version : 3.1
   
   * Storage (HDFS/S3/GCS..) : HDFS
   
   * Running on Docker? (yes/no) : no
   
   
