Today I took a jstack dump of my overlord node and found a KIS supervisor 
thread that should have been shut down long ago:
```
"KafkaSupervisor-aweme" #232 daemon prio=5 os_prio=0 tid=0x00007f7804011000 nid=0x30f64 waiting on condition [0x00007f77b97e0000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for  <0x00000007b33aab40> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
    at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
    at io.druid.indexing.kafka.supervisor.KafkaSupervisor$2.run(KafkaSupervisor.java:379)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
```
I then checked the code and found that when `KafkaSupervisor#stop` is called, 
[`exec#shutdownNow`](https://github.com/apache/incubator-druid/blob/dabaf4caf8f1a5b62df27bdc7b777c68bde10bc3/extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/indexing/kafka/supervisor/KafkaSupervisor.java#L477)
 is called, which interrupts the thread. The interrupt is supposed to make the 
thread terminate, but sometimes this does not seem to work. 
Here is a quote from the javadoc of `ExecutorService#shutdownNow`:
```
There are no guarantees beyond best-effort attempts to stop processing actively 
executing tasks.  For example, typical implementations will cancel via {@link 
Thread#interrupt}, so any task that fails to respond to interrupts may never 
terminate.
```
It seems that the KIS notice-handling task may be failing to respond to 
interrupts, so I am submitting this PR, which may help fix the issue.
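
To illustrate the javadoc caveat, here is a minimal, standalone sketch of how a 
queue-draining loop can survive `shutdownNow`. It is not Druid's code: the class 
`InterruptDemo`, the `notices` queue and the `swallowInterrupt` switch are 
hypothetical names, and the assumption that the loop catches 
`InterruptedException` and keeps going is mine, not something confirmed by the 
supervisor source.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a supervisor notice loop; not Druid's actual code.
class InterruptDemo {

  // Returns true if the executor terminated shortly after shutdownNow().
  static boolean terminatesAfterShutdownNow(boolean swallowInterrupt)
      throws InterruptedException {
    BlockingQueue<Runnable> notices = new LinkedBlockingDeque<>();
    AtomicBoolean running = new AtomicBoolean(true); // used only for demo cleanup
    CountDownLatch started = new CountDownLatch(1);

    // Daemon thread so a stuck loop can never keep the demo JVM alive.
    ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "notice-loop");
      t.setDaemon(true);
      return t;
    });

    exec.execute(() -> {
      started.countDown();
      while (running.get()) {
        try {
          notices.take().run(); // blocks, as in the stack trace above
        } catch (InterruptedException e) {
          if (swallowInterrupt) {
            // Assumed failure mode: the exception is caught, the interrupt
            // status is already cleared, and the loop blocks in take() again.
            continue;
          }
          Thread.currentThread().interrupt(); // restore the interrupt status
          return;                             // and leave the loop
        }
      }
    });

    started.await();
    exec.shutdownNow(); // best effort: interrupts the worker thread
    boolean terminated = exec.awaitTermination(500, TimeUnit.MILLISECONDS);

    // Cleanup for the demo: flip the flag and wake a thread parked in take().
    running.set(false);
    notices.offer(() -> { });
    return terminated;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("swallowing interrupt, terminated: "
        + terminatesAfterShutdownNow(true));
    System.out.println("honoring interrupt, terminated: "
        + terminatesAfterShutdownNow(false));
  }
}
```

In the swallowing variant the worker ends up parked in 
`LinkedBlockingDeque.take` again after `shutdownNow` returns, which is the same 
state the jstack dump shows. In the honoring variant the loop exits on the 
interrupt and the executor terminates.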

[ Full content available at: 
https://github.com/apache/incubator-druid/pull/6337 ]
This message was relayed via gitbox.apache.org for [email protected]
