cameronlee314 commented on a change in pull request #1151: [SAMZA-2198] Fix
deadlock in container signal handler
URL: https://github.com/apache/samza/pull/1151#discussion_r323865940
##########
File path:
samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala
##########
@@ -977,18 +977,16 @@ class SamzaContainer(
}
def addShutdownHook {
- val runLoopThread = Thread.currentThread()
shutdownHookThread = new Thread("Samza Container Shutdown Hook Thread") {
override def run() = {
+ val SHUTDOWN_POLL_TIME_MS = 1000
info("Shutting down, will wait up to %s ms." format shutdownMs)
- shutdownRunLoop() //TODO: Pull out shutdown hook to
LocalContainerRunner or SP
- try {
- runLoopThread.join(shutdownMs)
- } catch {
- case e: Throwable => // Ignore to avoid deadlock with
uncaughtExceptionHandler. See SAMZA-1220
- error("Did not shut down within %s ms, exiting." format
shutdownMs, e)
+ shutdownRunLoop() //TODO: Pull out shutdown hook to
LocalContainerRunner or SP
+ val endTime = System.currentTimeMillis() + shutdownMs
+ while (!hasStopped && System.currentTimeMillis() < endTime) {
Review comment:
I think this is going to shut down too early. The reason for the `join` was
to make sure the container fully shut down before exiting the process
(https://issues.apache.org/jira/browse/SAMZA-616). However, `hasStopped()` can
return true before shutdown is complete. The `containerListener` still hasn't
run yet, although that kind of means that the update to `status` (and therefore
the implementation of `hasStopped()`) is kind of misleading...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services