[ https://issues.apache.org/jira/browse/KAFKA-2468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712201#comment-14712201 ]

Ewen Cheslack-Postava commented on KAFKA-2468:
----------------------------------------------

Does this actually solve the problem? The docs on Runtime.exit say:

bq. If this method is invoked after the virtual machine has begun its shutdown 
sequence then if shutdown hooks are being run this method will block 
indefinitely. If shutdown hooks have already been run and on-exit finalization 
has been enabled then this method halts the virtual machine with the given 
status code if the status is nonzero; otherwise, it blocks indefinitely. 

As I understand it, the failure sequence is: an exception from 
KafkaServerStartable.startup causes System.exit to be invoked; that runs the 
shutdown hook, which invokes KafkaServerStartable.shutdown, which calls 
KafkaServer.shutdown; KafkaServer.shutdown throws an exception, and 
KafkaServerStartable's exception handler then invokes System.exit a second 
time. If we replace one of these calls with Runtime.exit, doesn't the passage 
above imply that it will still block indefinitely? The first System.exit in 
that sequence still goes through Runtime.exit, and the second Runtime.exit, 
reached from the shutdown hook via KafkaServerStartable.shutdown, ends up 
waiting on itself: it blocks until the shutdown hooks are complete, but it is 
running inside a shutdown hook.
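The thread dump in the issue shows this as a lock-plus-join cycle: the "SIGINT handler" thread holds the lock on java.lang.Shutdown while joining the hook thread, and the hook ("Thread-2") blocks trying to take that same lock via System.exit, which delegates to Runtime.exit. The sketch below is a toy model of that cycle, not the JDK's actual Shutdown code: a ReentrantLock stands in for the Shutdown class monitor, and the 500 ms tryLock timeout is an assumption added only so the demo terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Toy model of the lock-plus-join cycle in the thread dump; not the JDK's Shutdown code.
final class ShutdownDeadlockModel {
    // Stands in for the monitor on java.lang.Shutdown.class.
    private static final ReentrantLock SHUTDOWN_LOCK = new ReentrantLock();

    /** Returns true if the simulated hook thread ever acquired the shutdown lock. */
    static boolean hookCanExit() {
        AtomicBoolean acquired = new AtomicBoolean(false);

        // Models the shutdown hook (Thread-2) calling System.exit from
        // KafkaServerStartable.shutdown. The real call would block forever;
        // this version gives up after 500 ms so the demo terminates.
        Thread hook = new Thread(() -> {
            try {
                if (SHUTDOWN_LOCK.tryLock(500, TimeUnit.MILLISECONDS)) {
                    acquired.set(true);
                    SHUTDOWN_LOCK.unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "hook");

        // Models the "SIGINT handler": Shutdown.exit takes the lock, then runs
        // the hooks and joins them while still holding it.
        SHUTDOWN_LOCK.lock();
        try {
            hook.start();
            hook.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            SHUTDOWN_LOCK.unlock();
        }
        return acquired.get();
    }

    public static void main(String[] args) {
        System.out.println("hook could exit: " + hookCanExit());
    }
}
```

Running main prints `hook could exit: false`: the hook can never take the lock, because its only holder is waiting for the hook to finish. Without the timeout both threads would wait forever, as "SIGINT handler" and "Thread-2" do in the dump.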

It seems like a better solution would be to just use a flag. Set it to true 
after startup() returns, then in shutdown(), check it before invoking 
KafkaServer.shutdown() so we don't ever produce the IllegalStateException.
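A minimal sketch of that flag, written in Java for illustration (KafkaServerStartable itself is Scala). `Server` is a hypothetical stand-in for KafkaServer and `GuardedStartable` is a hypothetical name for the wrapper. The AtomicBoolean is an assumption: startup() runs on the main thread while shutdown() runs on the shutdown-hook thread, so the flag needs cross-thread visibility (a `@volatile` var would serve the same purpose in Scala).

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for KafkaServer.
interface Server {
    void startup();
    void shutdown();
}

// Hypothetical Java analogue of KafkaServerStartable with a startup-complete flag.
final class GuardedStartable {
    private final Server server;
    // Written by the main thread after startup() returns; read by the hook thread.
    private final AtomicBoolean startupComplete = new AtomicBoolean(false);

    GuardedStartable(Server server) {
        this.server = server;
    }

    void startup() {
        server.startup();          // may throw; the flag then stays false
        startupComplete.set(true); // reached only if startup() returned normally
    }

    /** Returns true if server.shutdown() was actually invoked. */
    boolean shutdown() {
        if (!startupComplete.get()) {
            // Startup never completed: skip server.shutdown(), which per the
            // discussion would throw IllegalStateException and lead to a second
            // System.exit from inside the shutdown hook.
            return false;
        }
        server.shutdown();
        return true;
    }
}
```

Under this sketch, a failed startup means the hook returns quietly instead of throwing, so no second exit call is made from inside the hook; the trade-off is that partially started components are not torn down by the hook.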

> SIGINT during Kafka server startup can leave server deadlocked
> --------------------------------------------------------------
>
>                 Key: KAFKA-2468
>                 URL: https://issues.apache.org/jira/browse/KAFKA-2468
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ashish K Singh
>            Assignee: Ashish K Singh
>
> KafkaServer, on receiving a SIGINT, will try to shut down; if this happens 
> while the server is starting up, it will deadlock.
> Thread dump after deadlock:
> {code}
> 2015-08-24 22:03:52
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.55-b03 mixed mode):
> "Attach Listener" daemon prio=5 tid=0x00007fc08e827800 nid=0x5807 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Thread-2" prio=5 tid=0x00007fc08b9de000 nid=0x6b03 waiting for monitor entry [0x000000011ad3a000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at java.lang.Shutdown.exit(Shutdown.java:212)
>       - waiting to lock <0x00000007bae86ac0> (a java.lang.Class for java.lang.Shutdown)
>       at java.lang.Runtime.exit(Runtime.java:109)
>       at java.lang.System.exit(System.java:962)
>       at kafka.server.KafkaServerStartable.shutdown(KafkaServerStartable.scala:46)
>       at kafka.Kafka$$anon$1.run(Kafka.scala:65)
> "SIGINT handler" daemon prio=5 tid=0x00007fc08ca51800 nid=0x6503 in Object.wait() [0x000000011aa31000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000007bcb40610> (a kafka.Kafka$$anon$1)
>       at java.lang.Thread.join(Thread.java:1281)
>       - locked <0x00000007bcb40610> (a kafka.Kafka$$anon$1)
>       at java.lang.Thread.join(Thread.java:1355)
>       at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
>       at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
>       at java.lang.Shutdown.runHooks(Shutdown.java:123)
>       at java.lang.Shutdown.sequence(Shutdown.java:167)
>       at java.lang.Shutdown.exit(Shutdown.java:212)
>       - locked <0x00000007bae86ac0> (a java.lang.Class for java.lang.Shutdown)
>       at java.lang.Terminator$1.handle(Terminator.java:52)
>       at sun.misc.Signal$1.run(Signal.java:212)
>       at java.lang.Thread.run(Thread.java:745)
> "RMI TCP Accept-0" daemon prio=5 tid=0x00007fc08c164000 nid=0x5c07 runnable [0x0000000119fe8000]
>    java.lang.Thread.State: RUNNABLE
>       at java.net.PlainSocketImpl.socketAccept(Native Method)
>       at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>       at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>       at java.net.ServerSocket.accept(ServerSocket.java:498)
>       at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:52)
>       at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:388)
>       at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:360)
>       at java.lang.Thread.run(Thread.java:745)
> "Service Thread" daemon prio=5 tid=0x00007fc08d015000 nid=0x5503 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread1" daemon prio=5 tid=0x00007fc08c82b000 nid=0x5303 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread0" daemon prio=5 tid=0x00007fc08c82a000 nid=0x5103 waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" daemon prio=5 tid=0x00007fc08c829800 nid=0x4f03 runnable [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Surrogate Locker Thread (Concurrent GC)" daemon prio=5 tid=0x00007fc08d002000 nid=0x400b waiting on condition [0x0000000000000000]
>    java.lang.Thread.State: RUNNABLE
> "Finalizer" daemon prio=5 tid=0x00007fc08d012800 nid=0x3b03 in Object.wait() [0x0000000117ee6000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000007bae05568> (a java.lang.ref.ReferenceQueue$Lock)
>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>       - locked <0x00000007bae05568> (a java.lang.ref.ReferenceQueue$Lock)
>       at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>       at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:189)
> "Reference Handler" daemon prio=5 tid=0x00007fc08c803000 nid=0x3903 in Object.wait() [0x0000000117de3000]
>    java.lang.Thread.State: WAITING (on object monitor)
>       at java.lang.Object.wait(Native Method)
>       - waiting on <0x00000007bae050f0> (a java.lang.ref.Reference$Lock)
>       at java.lang.Object.wait(Object.java:503)
>       at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
>       - locked <0x00000007bae050f0> (a java.lang.ref.Reference$Lock)
> "main" prio=5 tid=0x00007fc08d000800 nid=0x1303 waiting for monitor entry [0x000000010f353000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>       at java.lang.Shutdown.exit(Shutdown.java:212)
>       - waiting to lock <0x00000007bae86ac0> (a java.lang.Class for java.lang.Shutdown)
>       at java.lang.Runtime.exit(Runtime.java:109)
>       at java.lang.System.exit(System.java:962)
>       at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:35)
>       at kafka.Kafka$.main(Kafka.scala:69)
>       at kafka.Kafka.main(Kafka.scala)
> "VM Thread" prio=5 tid=0x00007fc08b83b000 nid=0x3703 runnable 
> "Gang worker#0 (Parallel GC Threads)" prio=5 tid=0x00007fc08d00f800 nid=0x2103 runnable
> "Gang worker#1 (Parallel GC Threads)" prio=5 tid=0x00007fc08b80e000 nid=0x2303 runnable
> "Gang worker#2 (Parallel GC Threads)" prio=5 tid=0x00007fc08c801000 nid=0x2503 runnable
> "Gang worker#3 (Parallel GC Threads)" prio=5 tid=0x00007fc08c801800 nid=0x2703 runnable
> "Gang worker#4 (Parallel GC Threads)" prio=5 tid=0x00007fc08c804000 nid=0x2903 runnable
> "Gang worker#5 (Parallel GC Threads)" prio=5 tid=0x00007fc08c804800 nid=0x2b03 runnable
> "Gang worker#6 (Parallel GC Threads)" prio=5 tid=0x00007fc08c805000 nid=0x2d03 runnable
> "Gang worker#7 (Parallel GC Threads)" prio=5 tid=0x00007fc08c806000 nid=0x2f03 runnable
> "Concurrent Mark-Sweep GC Thread" prio=5 tid=0x00007fc08c806800 nid=0x3503 runnable
> "Gang worker#0 (Parallel CMS Threads)" prio=5 tid=0x00007fc08c0bd800 nid=0x3103 runnable
> "Gang worker#1 (Parallel CMS Threads)" prio=5 tid=0x00007fc08c0be800 nid=0x3303 runnable
> "VM Periodic Task Thread" prio=5 tid=0x00007fc08c155000 nid=0x5d03 waiting on condition
> JNI global references: 239
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
