TheR1sing3un opened a new issue, #12522:
URL: https://github.com/apache/hudi/issues/12522

   Consider the following case, with `heartbeatIntervalInMs = 60 * 1000` and 
`numTolerableHeartbeatMisses = 10`, so that 
`maxAllowableHeartbeatIntervalInMs = 600 * 1000`:
   - 00:00, the write application starts
   - 00:01, the 1st heartbeat is sent successfully
   - 00:02, sending the heartbeat fails because of an HDFS network problem or 
another network issue
   - 00:03-00:10, every heartbeat attempt fails
   - 00:11, the heartbeat expires because 
`currentTime[00:11] - lastHeartbeatTime[00:01] >= maxAllowableHeartbeatIntervalInMs`; 
according to the code logic, `lastHeartbeatTime` will never be updated
   - 10:00, the write application has been running for 10h to execute all of 
its logic
   - 10:00, the write application starts to commit via 
`BaseHoodieWriteClient::commitStats`, finds that the heartbeat has expired, 
and fails by throwing an exception
   <img width="1026" alt="image" src="https://github.com/user-attachments/assets/f07e8676-d572-40d6-a6f0-91138c7f7928" />
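
The expiry arithmetic in the timeline above can be checked with a small standalone sketch. The class and method names here (`HeartbeatExpiryExample`, `isExpired`) are hypothetical and are not Hudi's API; only the parameter names and the comparison come from the description above.

```java
public class HeartbeatExpiryExample {

    // Hypothetical helper mirroring the check described in this issue;
    // it is NOT the actual Hudi implementation.
    static boolean isExpired(long currentTimeMs,
                             long lastHeartbeatTimeMs,
                             long heartbeatIntervalInMs,
                             int numTolerableHeartbeatMisses) {
        long maxAllowableHeartbeatIntervalInMs =
            heartbeatIntervalInMs * numTolerableHeartbeatMisses;
        return currentTimeMs - lastHeartbeatTimeMs >= maxAllowableHeartbeatIntervalInMs;
    }

    public static void main(String[] args) {
        long minuteMs = 60 * 1000L;
        long lastHeartbeatTimeMs = 1 * minuteMs; // last successful heartbeat at 00:01

        // At 00:10 only 9 minutes have elapsed since the last success.
        System.out.println(isExpired(10 * minuteMs, lastHeartbeatTimeMs, minuteMs, 10));
        // At 00:11 the elapsed 10 minutes reach the 600 * 1000 ms limit.
        System.out.println(isExpired(11 * minuteMs, lastHeartbeatTimeMs, minuteMs, 10));
    }
}
```

With these values the check is false at 00:10 and true at 00:11, which matches the timeline.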
   
   So we spent 10 hours running an application that we already knew at 00:11 
could not succeed.
   Should we support fail-fast to avoid this unnecessary resource consumption? 
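
One possible shape for such a fail-fast check is sketched below. Everything in it is an assumption made for illustration: `FailFastHeartbeatMonitor`, `recordHeartbeatSuccess`, `checkOnce`, and the `onExpired` callback are hypothetical names and do not exist in Hudi. The idea is that the writer (or a periodic task) calls `checkOnce` regularly and aborts as soon as the heartbeat is detected as expired, instead of discovering it only at commit time.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of a fail-fast heartbeat check; not Hudi's implementation.
public class FailFastHeartbeatMonitor {
    private final long maxAllowableHeartbeatIntervalInMs;
    private final AtomicLong lastHeartbeatTimeMs;
    private final AtomicBoolean expired = new AtomicBoolean(false);
    private final Runnable onExpired; // e.g. cancel the job or interrupt the writer

    FailFastHeartbeatMonitor(long heartbeatIntervalInMs,
                             int numTolerableHeartbeatMisses,
                             long startTimeMs,
                             Runnable onExpired) {
        this.maxAllowableHeartbeatIntervalInMs =
            heartbeatIntervalInMs * numTolerableHeartbeatMisses;
        this.lastHeartbeatTimeMs = new AtomicLong(startTimeMs);
        this.onExpired = onExpired;
    }

    // Called only when a heartbeat was written successfully.
    void recordHeartbeatSuccess(long nowMs) {
        lastHeartbeatTimeMs.set(nowMs);
    }

    // Returns true once the heartbeat has expired; the callback fires exactly once.
    boolean checkOnce(long nowMs) {
        boolean overdue =
            nowMs - lastHeartbeatTimeMs.get() >= maxAllowableHeartbeatIntervalInMs;
        if (overdue && expired.compareAndSet(false, true)) {
            onExpired.run();
        }
        return expired.get();
    }

    public static void main(String[] args) {
        long minuteMs = 60 * 1000L;
        FailFastHeartbeatMonitor monitor = new FailFastHeartbeatMonitor(
            minuteMs, 10, 0L,
            () -> System.out.println("heartbeat expired, aborting write"));

        monitor.recordHeartbeatSuccess(1 * minuteMs); // 00:01 succeeds
        for (int minute = 2; minute <= 11; minute++) {
            // Heartbeats from 00:02 onward fail, so only the check runs.
            if (monitor.checkOnce(minute * minuteMs)) {
                System.out.println("detected at minute " + minute);
                break;
            }
        }
    }
}
```

In this sketch the abort would be triggered at 00:11 rather than at 10:00. Whether the callback should cancel the whole application or only the current write is a design choice left open here.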
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
