danny0405 commented on code in PR #18904:
URL: https://github.com/apache/hudi/pull/18904#discussion_r3345902379


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/client/heartbeat/HoodieHeartbeatClient.java:
##########
@@ -197,20 +222,26 @@ public boolean isHeartbeatExpired(String instantTime) 
throws IOException {
   private void updateHeartbeat(String instantTime) throws 
HoodieHeartbeatException {
     try {
       Long newHeartbeatTime = System.currentTimeMillis();
-      OutputStream outputStream =
-          this.storage.create(
-              new StoragePath(heartbeatFolderPath, instantTime), true);
-      outputStream.close();
+      writeHeartbeatFile(instantTime);
       Heartbeat heartbeat = instantToHeartbeatMap.get(instantTime);
       if (heartbeat.getLastHeartbeatTime() != null && 
isHeartbeatExpired(instantTime)) {
-        log.error("Aborting, missed generating heartbeat within allowable 
interval {} ms", this.maxAllowableHeartbeatIntervalInMs);
-        // Since TimerTask allows only java.lang.Runnable, cannot throw an 
exception and bubble to the caller thread, hence
-        // explicitly interrupting the timer thread.
-        Thread.currentThread().interrupt();
+        // A previous refresh was delayed past the tolerable interval (e.g. 
due to a slow storage write
+        // or driver pressure). Do NOT interrupt the timer thread here: that 
would permanently kill all
+        // future heartbeats for this instant, turning a transient delay into 
a permanent blackout.
+        // Enforcement is done at commit time in 
HeartbeatUtils.abortIfHeartbeatExpired(), which is the
+        // correct and sole enforcement point.
+        log.warn("Missed generating heartbeat for instant {} within allowable 
interval {} ms; continuing to refresh",

Review Comment:
   makes sense somehow, but I do see some risk for correctness: when failed 
writes rollback strategy is configured as LAZY, the async cleaner would 
possibility rollback the current instant by removing some data files(not remove 
the metadata files on timeline yet), and then the write finish to commit, then 
the commit got data loss.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to