[ https://issues.apache.org/jira/browse/KAFKA-4039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847520#comment-15847520 ]
ASF GitHub Bot commented on KAFKA-4039: --------------------------------------- GitHub user ijuma opened a pull request: https://github.com/apache/kafka/pull/2474 KAFKA-4039: Fix deadlock during shutdown due to log truncation not allowed You can merge this pull request into a Git repository by running: $ git pull https://github.com/ijuma/kafka kafka-4039-deadlock-during-shutdown Alternatively you can review and apply these changes as the patch at: https://github.com/apache/kafka/pull/2474.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2474 ---- commit d902c2bc81c4799ad1eb7bc7ec50e67d04d1c9c2 Author: Maysam Yabandeh <myaban...@dropbox.com> Date: 2016-08-17T18:21:07Z KAFKA-4039: delay invocation of System.exit via FatalExitException commit 548ddec8c8715125f25f3da3263fe4ead39deca2 Author: Maysam Yabandeh <myaban...@dropbox.com> Date: 2016-08-17T22:39:12Z KAFKA-4039 change FatalExitException to Error commit 08992e10eacb661d24506bff1f617f42cce02ffb Author: Maysam Yabandeh <myaban...@dropbox.com> Date: 2016-08-18T00:33:43Z KAFKA-4039 fix the default value of test mode in FatalExitError commit bcf9e9d605fef9bc2cf46380767f3881eeee6313 Author: Maysam Yabandeh <myaban...@dropbox.com> Date: 2016-08-18T14:26:20Z KAFKA-4039 mock System.exit commit 98acc3674a610ac329935c2d3a1bbdd032685d25 Author: Maysam Yabandeh <myaban...@dropbox.com> Date: 2016-08-19T14:38:33Z KAFKA-4039 exit inside an anonymous thread, add test for deadlock commit 39a667eda0817bc42beaaedb962e5797b6f106e1 Author: Ismael Juma <ism...@juma.me.uk> Date: 2017-01-30T14:48:24Z Merge remote-tracking branch 'apache/trunk' into kafka-4039-deadlock-during-shutdown * apache/trunk: (552 commits) MINOR: JavaDoc markup cleanup KAFKA-4679: Remove unstable markers from Connect APIs KAFKA-4450; Add upgrade tests for 0.10.1 and rename TRUNK to DEV_BRANCH to reduce confusion KAFKA-4635; Client Compatibility follow-ups MINOR: update JavaDocs for Kafka Streams DSL helpers KAFKA-4557; Handle Producer.send correctly in expiry callbacks MINOR: Update copyright year in the NOTICE file. KAFKA-4664; Update docs/protocol.html with KIP-97 information KAFKA-4704; Coordinator cache loading fails if groupId is reused for offset storage after group is removed MINOR: Replace for within for each; replace if-elseif with match KAFKA-4644: Improve test coverage of StreamsPartitionAssignor MINOR: Update KTable JavaDoc MINOR: Include more detail in `ConfigDef.parseType` exception message KAFKA-4578; Upgrade notes for 0.10.2.0 MINOR: Streams API JavaDoc improvements MINOR: Add Streams system test for broker backwards compatibility MINOR: Close create topics policy during shutdown and more tests MINOR: Update JavaDoc for DSL PAPI-API KAFKA-4636; Per listener security settings overrides (KIP-103) MINOR: Change logging level for ignored maybeAddMetric from debug to trace ... commit 4d130ef7e990ca8215684adc36865f08caa79557 Author: Ismael Juma <ism...@juma.me.uk> Date: 2017-01-31T21:07:37Z Try a slightly different approach to solving the shutdown deadlock * Open the latch in `ShutdownableThread` before calling `exit` * Introduce `Exit` classes that perform exit/halt and can be changed in tests * Invoke `exit` from the calling thread instead of spawning a new thread * Updated the tests * A few clean-ups commit 1c8032b7419874ef2c56e7cf321d39a3f64e6833 Author: Ismael Juma <ism...@juma.me.uk> Date: 2017-01-31T21:07:51Z A few style clean-ups commit 50b6c571ae51d2c9f5b7288faace349acc156bc8 Author: Ismael Juma <ism...@juma.me.uk> Date: 2017-01-31T21:08:18Z Merge remote-tracking branch 'apache/trunk' into kafka-4039-deadlock-during-shutdown * apache/trunk: MINOR: Logging improvements in consumer internals KAFKA-2955; Add a simple ">" prompt to console producer KAFKA-4613: Follow-up to fix JavaDocs KAFKA-4613: Treat null-key records the same way for joins and aggreations ---- > Exit Strategy: using exceptions instead of inline invocation of exit/halt > ------------------------------------------------------------------------- > > Key: KAFKA-4039 > URL: https://issues.apache.org/jira/browse/KAFKA-4039 > Project: Kafka > Issue Type: Bug > Components: core > Affects Versions: 0.10.0.0 > Reporter: Maysam Yabandeh > Priority: Critical > Labels: reliability > Attachments: deadlock-stack2 > > > The current practice is to directly invoke halt/exit right after the line > that intends to terminate the execution. In the case of System.exit this > could cause deadlocks if the thread invoking System.exit is holding a lock > that will be requested by the shutdown hook threads that will be started by > System.exit. An example is reported by [~aozeritsky] in KAFKA-3924. This > would also makes testing more difficult as it would require mocking static > methods of System and Runtime classes, which is not natively supported in > Java. > One alternative suggested > [here|https://issues.apache.org/jira/browse/KAFKA-3924?focusedCommentId=15420269&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15420269] > would be to throw some dedicated exceptions that will eventually invoke > exit/halt: > {quote} it would be great to move away from executing `System.exit` inline in > favour of throwing an exception (two examples, but maybe we can find better > names: FatalExitException and FatalHaltException) that is caught by some > central code that then does the `System.exit` or `Runtime.getRuntime.halt`. > This helps in a couple of ways: > (1) Avoids issues with locks being held as in this issue > (2) It makes it possible to abstract the action, which is very useful in > tests. At the moment, we can't easily test for these conditions as they cause > the whole test harness to exit. Worse, these conditions are sometimes > triggered in the tests and it's unclear why. > (3) We can have more consistent logging around these actions and possibly > extended logging for tests > {quote} -- This message was sent by Atlassian JIRA (v6.3.15#6346)