jadami10 commented on code in PR #16842:
URL: https://github.com/apache/pinot/pull/16842#discussion_r2356420797


##########
pinot-spi/src/main/java/org/apache/pinot/spi/executor/ExecutorServiceUtils.java:
##########
@@ -124,7 +124,7 @@ public static void close(ExecutorService executorService) {
   public static void close(ExecutorService executorService, long 
terminationMillis) {
     executorService.shutdown();
     try {
-      if (!executorService.awaitTermination(terminationMillis, 
TimeUnit.SECONDS)) {
+      if (!executorService.awaitTermination(terminationMillis, 
TimeUnit.MILLISECONDS)) {

Review Comment:
   I will be upfront that I haven't dug too deeply myself and relied on AI 
(claude) to look. I threw it at a thread dump + the source code
   ```
   "Thread-41" #100 daemon prio=5 os_prio=0 cpu=0.00ms elapsed=X.XXs tid=0x... 
nid=0x... waiting on condition  [0x...]
        java.lang.Thread.State: TIMED_WAITING (parking)
        at jdk.internal.misc.Unsafe.park(Native Method)
        - parking to wait for <0x...> (a 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at 
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:269)
        at 
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1763)
        at 
java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475)
        at 
org.apache.pinot.spi.executor.DecoratorExecutorService.awaitTermination(DecoratorExecutorService.java:117)
        at 
org.apache.pinot.spi.executor.DecoratorExecutorService.awaitTermination(DecoratorExecutorService.java:117)
        at 
org.apache.pinot.spi.executor.DecoratorExecutorService.awaitTermination(DecoratorExecutorService.java:117)
        at 
org.apache.pinot.spi.executor.ExecutorServiceUtils.close(ExecutorServiceUtils.java:127)
    ← BUG: Uses TimeUnit.SECONDS instead of MILLISECONDS
        at 
org.apache.pinot.spi.executor.ExecutorServiceUtils.close(ExecutorServiceUtils.java:113)
        at 
org.apache.pinot.query.runtime.QueryRunner.shutDown(QueryRunner.java:261)
        at 
org.apache.pinot.query.service.server.QueryServer.shutdown(QueryServer.java:174)
        at 
org.apache.pinot.server.worker.WorkerQueryServer.shutDown(WorkerQueryServer.java:89)
        at 
org.apache.pinot.server.starter.ServerInstance.shutDown(ServerInstance.java:268)
        - locked <0x0000040083f3a6e8> (a 
org.apache.pinot.server.starter.ServerInstance)
        at 
org.apache.pinot.server.starter.helix.HelixServerStarter.stop(HelixServerStarter.java:XXX)
        at 
org.apache.pinot.tools.service.PinotServiceManager.stopPinotInstance(PinotServiceManager.java:202)
        at 
org.apache.pinot.tools.service.PinotServiceManager.stopPinotInstanceById(PinotServiceManager.java:283)
        at 
org.apache.pinot.tools.service.PinotServiceManager.stopAll(PinotServiceManager.java:236)
        at 
org.apache.pinot.tools.AbstractBaseCommand.cleanup(AbstractBaseCommand.java:32)
        at 
org.apache.pinot.tools.AbstractBaseCommand$1.run(AbstractBaseCommand.java:31)   
 ← SIGTERM shutdown hook
   ```
   
   this lined up with our previous notes that these servers effectively 
remained stuck for 8-9 hours. This is what Claude proposed as the deadlock
   ```
     1. SIGTERM received → Shutdown hook triggers (AbstractBaseCommand.java:31)
     2. Shutdown cascade → Eventually calls ExecutorServiceUtils.close()
     3. Executor waits 8.33 hours → Due to timeout bug, instead of 30 seconds
     4. Query threads still active → Blocked in ArrayBlockingQueue.offer() 
trying to add results
     5. No consumers reading → Queue consumers shut down as part of shutdown 
process
     6. Deadlock → Shutdown waits for queries, queries can't complete due to 
full/unread queues
   ```
   which is quite reasonable given the order of the logs we see getting stuck 
on shutting down the query service as well as error timeouts we've later seen 
with `{"executionTimeMs":32679109}}`. @priyen-stripe did the initial 
investigation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org
For additional commands, e-mail: commits-h...@pinot.apache.org

Reply via email to