jadami10 commented on code in PR #16842: URL: https://github.com/apache/pinot/pull/16842#discussion_r2356420797
########## pinot-spi/src/main/java/org/apache/pinot/spi/executor/ExecutorServiceUtils.java: ########## @@ -124,7 +124,7 @@ public static void close(ExecutorService executorService) { public static void close(ExecutorService executorService, long terminationMillis) { executorService.shutdown(); try { - if (!executorService.awaitTermination(terminationMillis, TimeUnit.SECONDS)) { + if (!executorService.awaitTermination(terminationMillis, TimeUnit.MILLISECONDS)) { Review Comment: I will be upfront that I haven't dug too deeply myself and relied on AI (claude) to look. I threw it at a thread dump + the source code ``` "Thread-41" #100 daemon prio=5 os_prio=0 cpu=0.00ms elapsed=X.XXs tid=0x... nid=0x... waiting on condition [0x...] java.lang.Thread.State: TIMED_WAITING (parking) at jdk.internal.misc.Unsafe.park(Native Method) - parking to wait for <0x...> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:269) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1763) at java.util.concurrent.ThreadPoolExecutor.awaitTermination(ThreadPoolExecutor.java:1475) at org.apache.pinot.spi.executor.DecoratorExecutorService.awaitTermination(DecoratorExecutorService.java:117) at org.apache.pinot.spi.executor.DecoratorExecutorService.awaitTermination(DecoratorExecutorService.java:117) at org.apache.pinot.spi.executor.DecoratorExecutorService.awaitTermination(DecoratorExecutorService.java:117) at org.apache.pinot.spi.executor.ExecutorServiceUtils.close(ExecutorServiceUtils.java:127) ← BUG: Uses TimeUnit.SECONDS instead of MILLISECONDS at org.apache.pinot.spi.executor.ExecutorServiceUtils.close(ExecutorServiceUtils.java:113) at org.apache.pinot.query.runtime.QueryRunner.shutDown(QueryRunner.java:261) at org.apache.pinot.query.service.server.QueryServer.shutdown(QueryServer.java:174) at org.apache.pinot.server.worker.WorkerQueryServer.shutDown(WorkerQueryServer.java:89) at org.apache.pinot.server.starter.ServerInstance.shutDown(ServerInstance.java:268) - locked <0x0000040083f3a6e8> (a org.apache.pinot.server.starter.ServerInstance) at org.apache.pinot.server.starter.helix.HelixServerStarter.stop(HelixServerStarter.java:XXX) at org.apache.pinot.tools.service.PinotServiceManager.stopPinotInstance(PinotServiceManager.java:202) at org.apache.pinot.tools.service.PinotServiceManager.stopPinotInstanceById(PinotServiceManager.java:283) at org.apache.pinot.tools.service.PinotServiceManager.stopAll(PinotServiceManager.java:236) at org.apache.pinot.tools.AbstractBaseCommand.cleanup(AbstractBaseCommand.java:32) at org.apache.pinot.tools.AbstractBaseCommand$1.run(AbstractBaseCommand.java:31) ← SIGTERM shutdown hook ``` this lined up with our previous notes that these servers effectively remained stuck for 8-9 hours. This is what Claude proposed as the deadlock ``` 1. SIGTERM received → Shutdown hook triggers (AbstractBaseCommand.java:31) 2. Shutdown cascade → Eventually calls ExecutorServiceUtils.close() 3. Executor waits 8.33 hours → Due to timeout bug, instead of 30 seconds 4. Query threads still active → Blocked in ArrayBlockingQueue.offer() trying to add results 5. No consumers reading → Queue consumers shut down as part of shutdown process 6. Deadlock → Shutdown waits for queries, queries can't complete due to full/unread queues ``` which is quite reasonable given the order of the logs we see getting stuck on shutting down the query service as well as error timeouts we've later seen with `{"executionTimeMs":32679109}}`. @priyen-stripe did the initial investigation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org