Github user dlogothetis commented on a diff in the pull request:

    https://github.com/apache/giraph/pull/84#discussion_r218507979

    --- Diff: giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java ---
    @@ -1379,9 +1379,15 @@ private boolean barrierOnWorkerList(String finishedWorkerPath,
           // Wait for a signal or timeout
           boolean eventTriggered = event.waitMsecs(eventLoopTimeout);
    +
    +      // If the event was triggered, we reset it. In the next loop run, we will
    +      // read ZK to get the new hosts.
    +      if (eventTriggered) {
    +        event.reset();
    +      }
    +
           long elapsedTimeSinceRegularRunMsec =
               System.currentTimeMillis() - lastRegularRunTimeMsec;
    -      event.reset();
    --- End diff --

    It's possible that after `event.waitMsecs` exits (due to a timeout) and before `event.reset()` gets called, the event gets signaled. In this case, the unconditional `event.reset()` clears that signal, so in the next loop `event.waitMsecs` will time out again and `logInfoOnlyRun` will continue to be false.
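
The interleaving described above can be replayed deterministically in a single thread. This is only a minimal sketch: `SketchEvent` and `LostSignalDemo` are hypothetical names, and `SketchEvent` is a stand-in with `signal`/`reset`/`waitMsecs` semantics assumed from the diff, not Giraph's actual event class. It suggests that an unconditional reset discards a signal that arrives after the timed-out wait, while a reset guarded by `eventTriggered` keeps it for the next loop run.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the event used in barrierOnWorkerList (an assumption,
// not Giraph's class): a sticky flag guarded by a lock and a condition.
class SketchEvent {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition cond = lock.newCondition();
  private boolean signaled = false;

  void signal() {
    lock.lock();
    try {
      signaled = true;
      cond.signalAll();
    } finally {
      lock.unlock();
    }
  }

  void reset() {
    lock.lock();
    try {
      signaled = false;
    } finally {
      lock.unlock();
    }
  }

  // Returns true if the event is signaled before the timeout elapses.
  boolean waitMsecs(long msecs) {
    lock.lock();
    try {
      long nanos = TimeUnit.MILLISECONDS.toNanos(msecs);
      while (!signaled && nanos > 0) {
        nanos = cond.awaitNanos(nanos);
      }
      return signaled;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return signaled;
    } finally {
      lock.unlock();
    }
  }
}

class LostSignalDemo {
  // Replays: wait times out, a signal arrives, the reset policy runs,
  // then the next loop iteration waits again. Returns that second wait's result.
  static boolean nextWaitSeesSignal(boolean resetUnconditionally) {
    SketchEvent event = new SketchEvent();
    boolean eventTriggered = event.waitMsecs(10); // nobody signals: times out
    event.signal(); // lands in the window between the wait and the reset
    if (resetUnconditionally || eventTriggered) {
      event.reset();
    }
    return event.waitMsecs(10);
  }

  public static void main(String[] args) {
    System.out.println("unconditional reset: " + nextWaitSeesSignal(true));
    System.out.println("conditional reset:   " + nextWaitSeesSignal(false));
  }
}
```

In a real master loop the signal would come from another thread (for example a ZooKeeper watcher callback), so the window is narrow, but the single-threaded replay makes the ordering explicit.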
---