waitinfuture commented on code in PR #979:
URL: https://github.com/apache/incubator-celeborn/pull/979#discussion_r1025153073
##########
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/Controller.scala:
##########
@@ -324,6 +324,42 @@ private[deploy] class Controller(
return
}
+ val shuffleCommitTimeout = conf.workerShuffleCommitTimeout
+
+ shuffleCommitInfos.putIfAbsent(shuffleKey, new CommitInfo(null, CommitInfo.COMMIT_NOTSTARTED))
+ val status = shuffleCommitInfos.get(shuffleKey)
+
+ def waitForCommitFinish(): Unit = {
+ val delta = 100
+ var times = 0
+ while (delta * times < shuffleCommitTimeout) {
+ status.synchronized {
+ if (status.status == CommitInfo.COMMIT_FINISHED) {
+ context.reply(status.response)
+ return
+ }
+ }
+ Thread.sleep(delta)
+ times += 1
+ }
Review Comment:
I just added retry logic to the client. The design is that the worker should
process handleCommitFiles for a given shuffleKey only ONCE. In the (I think
rare) case where one handleCommitFiles request arrives while another for the
same shuffleKey is still in progress, the new request should wait until the
first one completes or the timeout expires. If the timeout is hit, the client
will send requestCommitFiles again, as long as it has not exceeded maxretries.
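The once-per-shuffleKey design described in this comment can be sketched roughly as follows. This is a minimal, hypothetical illustration, not Celeborn code: `CommitState`, `CommitDedupSketch`, `commitOnce` and `requestWithRetry` are invented names, the commit response is simplified to a `String`, and the sketch swaps the diff's 100 ms sleep-and-poll loop for a `CountDownLatch`, so waiting requests wake as soon as the first commit finishes.

```scala
import java.util.concurrent.{ConcurrentHashMap, CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical stand-in for per-shuffle commit state (not Celeborn's CommitInfo).
final class CommitState {
  // Released exactly once, when the single real commit for the key has finished.
  val done = new CountDownLatch(1)
  // Written before done.countDown(), so waiters that pass await() see it.
  @volatile var response: Option[String] = None
}

object CommitDedupSketch {
  private val states = new ConcurrentHashMap[String, CommitState]()
  // How many times a commit body actually ran (for illustration only).
  val commitsRun = new AtomicInteger(0)

  // Worker side (assumed shape): the first request for shuffleKey runs doCommit;
  // any request arriving while (or after) it runs waits up to timeoutMs for that
  // result. None means the wait timed out (or the commit failed).
  def commitOnce(shuffleKey: String, timeoutMs: Long)(doCommit: => String): Option[String] = {
    val mine = new CommitState
    val existing = states.putIfAbsent(shuffleKey, mine)
    if (existing == null) {
      // This request won the race: it is the only one that commits.
      commitsRun.incrementAndGet()
      try { mine.response = Some(doCommit) }
      finally { mine.done.countDown() } // wake waiters even if doCommit throws
      mine.response
    } else if (existing.done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
      existing.response
    } else {
      None
    }
  }

  // Client side (assumed shape): resend until a response arrives, allowing at
  // most maxRetries resends after the first attempt.
  def requestWithRetry(maxRetries: Int)(attempt: () => Option[String]): Option[String] = {
    var result = attempt()
    var retries = 0
    while (result.isEmpty && retries < maxRetries) {
      retries += 1
      result = attempt()
    }
    result
  }
}
```

Two concurrent `commitOnce` calls for the same key run the commit body once and both return its response. A caller that gives up after `timeoutMs` gets `None`, which is the signal for the client-side retry. Note that this sketch never removes entries from `states`; a real worker would presumably clean them up once the shuffle is unregistered.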
--
This is an automated message from the Apache Git Service.