Ma77Ball commented on code in PR #5647:
URL: https://github.com/apache/texera/pull/5647#discussion_r3407254691


##########
common/workflow-core/src/main/scala/org/apache/texera/amber/core/storage/util/LakeFSStorageClient.scala:
##########
@@ -69,11 +74,25 @@ object LakeFSStorageClient {
   private val branchName: String = "main"
 
   def healthCheck(): Unit = {
-    try {
-      this.healthCheckApi.healthCheck().execute()
-    } catch {
-      case e: Exception =>
-        throw new RuntimeException(s"Failed to connect to lake fs server: 
${e.getMessage}")
+    var attempt = 1
+    while (true) {
+      try {
+        this.healthCheckApi.healthCheck().execute()
+        return
+      } catch {
+        case e: Exception =>
+          if (attempt >= HealthCheckMaxAttempts) {
+            throw new RuntimeException(
+              s"Failed to connect to lake fs server after 
$HealthCheckMaxAttempts attempts: ${e.getMessage}"
+            )

Review Comment:
   Fixed in `c0a1e1a68`:
   - The final-failure `RuntimeException` now passes the original exception as 
its cause, so the LakeFS stack trace is retained.
   - Added a dedicated `case ie: InterruptedException` before the generic 
handler that restores the interrupt status 
(`Thread.currentThread().interrupt()`) and fails fast instead of treating the 
interrupt as a retryable failure.
   
   On the aggregated-error point: I kept the thrown message reporting the last 
error, since every failed attempt is already logged at WARN with its attempt 
number, so per-attempt detail lives in the logs while the exception stays 
concise. Happy to aggregate onto the exception if you would prefer.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to