Today I took a jstack dump of my overlord node and found a KIS supervisor thread that should have been shut down long ago:

```
"KafkaSupervisor-aweme" #232 daemon prio=5 os_prio=0 tid=0x00007f7804011000 nid=0x30f64 waiting on condition [0x00007f77b97e0000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x00000007b33aab40> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
        at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
        at io.druid.indexing.kafka.supervisor.KafkaSupervisor$2.run(KafkaSupervisor.java:379)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
```

I then checked the code and found that when `KafkaSupervisor#stop` is called, it calls [`exec#shutdownNow`](https://github.com/apache/incubator-druid/blob/dabaf4caf8f1a5b62df27bdc7b777c68bde10bc3/extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisor.java#L477), which interrupts the thread. That interrupt is supposed to make the thread terminate, but it does not always seem to work.

Here is a quote from the javadoc of `ExecutorService#shutdownNow`:

```
There are no guarantees beyond best-effort attempts to stop processing actively executing tasks.
For example, typical implementations will cancel via {@link Thread#interrupt}, so any task that fails to respond to interrupts may never terminate.
```

So it seems the KIS notice-handling task fails to respond to interrupts. I am submitting this PR, which may help fix the issue.
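To illustrate what "fails to respond to interrupts" can look like, here is a minimal sketch. It is not the actual `KafkaSupervisor` code: the class `InterruptDemo` and its methods are hypothetical names made up for this example. It assumes a worker loop that blocks on `LinkedBlockingDeque#take`, as in the stack trace above, and compares a loop that swallows `InterruptedException` with one that restores the interrupt flag and exits.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not part of Druid.
class InterruptDemo {

    // A loop that swallows the interrupt: take() throws InterruptedException
    // (which clears the interrupt flag), the catch block ignores it, and the
    // thread parks in take() again.
    static Runnable swallowingLoop() {
        BlockingQueue<Runnable> notices = new LinkedBlockingDeque<>();
        return () -> {
            while (true) {
                try {
                    notices.take().run();
                } catch (InterruptedException e) {
                    // Ignored, so the loop never ends.
                }
            }
        };
    }

    // A loop that responds to the interrupt: it restores the flag in the
    // catch block and the loop condition then ends the task.
    static Runnable responsiveLoop() {
        BlockingQueue<Runnable> notices = new LinkedBlockingDeque<>();
        return () -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    notices.take().run();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
    }

    // Runs the loop on a single daemon thread, calls shutdownNow(), and
    // reports whether the executor terminated within one second.
    static boolean terminatesAfterShutdownNow(Runnable loop) {
        ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "demo-supervisor");
            t.setDaemon(true); // lets the JVM exit even if the loop is stuck
            return t;
        });
        exec.submit(loop);
        try {
            Thread.sleep(100); // give the task time to start and block in take()
            exec.shutdownNow(); // best-effort: interrupts the worker thread
            return exec.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("swallowing loop terminated: "
            + terminatesAfterShutdownNow(swallowingLoop()));
        System.out.println("responsive loop terminated: "
            + terminatesAfterShutdownNow(responsiveLoop()));
    }
}
```

In this sketch the swallowing loop is expected to still be parked in `take()` after `shutdownNow()`, much like the thread in the dump, while the responsive loop exits. Whether the real notice-handling code hits this exact pattern is an assumption. Any code path that catches `InterruptedException`, or a broader exception type, without rethrowing or restoring the flag would have the same effect.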
[ Full content available at: https://github.com/apache/incubator-druid/pull/6337 ]
