ZJHZH opened a new issue, #60682: URL: https://github.com/apache/doris/issues/60682
### Search before asking

- [x] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

doris-3.1.4-rc02-7f5ba43de6

### What's Wrong?

If the number of partitions queried exceeds `num_partitions_in_batch_mode`, the query fails after waiting for 30 seconds:

```
ERROR 1105 (HY000): errCode = 2, detailMessage = Failed to get first split after waiting for 30 seconds.
```

The executor returned by `Env.getCurrentEnv().getExtMetaCacheMgr().getScheduleExecutor()` contains a number of threads (greater than `max_external_cache_loader_thread_pool_size`, from historical runs) that keep calling `queue.offer` in `org.apache.doris.datasource.SplitAssignment#appendBatch` in an infinite loop because the queue is full. This may be caused by a query terminating abnormally, but I cannot determine which query terminated or why.

```
private void appendBatch(Multimap<Backend, Split> batch) throws UserException {
    for (Backend backend : batch.keySet()) {
        // ...
        // Retries every 100 ms for as long as needMoreSplit() returns true;
        // there is no overall deadline on this loop.
        while (needMoreSplit()) {
            BlockingQueue<Collection<TScanRangeLocations>> queue =
                    assignment.computeIfAbsent(backend, be -> new LinkedBlockingQueue<>(10000));
            try {
                if (queue.offer(locations, 100, TimeUnit.MILLISECONDS)) {
                    break;
                }
            } catch (InterruptedException e) {
                addUserException(new UserException("Failed to offer batch split by interrupted", e));
            }
        }
    }
}
```

Thread dump of one of the stuck threads:

```
"NotCheckpointscheduleExecutor-0" Id=4862 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@164e81e
    at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@164e81e
    at [email protected]/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
    at [email protected]/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1679)
    at [email protected]/java.util.concurrent.LinkedBlockingQueue.offer(LinkedBlockingQueue.java:378)
    at app//org.apache.doris.datasource.SplitAssignment.appendBatch(Unknown Source)
    at app//org.apache.doris.datasource.SplitAssignment.addToQueue(Unknown Source)
    at app//org.apache.doris.datasource.hive.source.HiveScanNode.lambda$startSplit$0(Unknown Source)
    at app//org.apache.doris.datasource.hive.source.HiveScanNode$$Lambda$4683/0x00007f1235a97240.run(Unknown Source)
    at [email protected]/java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1804)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
    at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
    at [email protected]/java.lang.Thread.run(Thread.java:840)
```

### What You Expected?

After the query completes, `needMoreSplit()` should return false, or `appendBatch` should give up after a timeout.

### How to Reproduce?

_No response_

### Anything Else?

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
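
To illustrate the second expectation in the report (a timeout for `appendBatch`), here is a minimal standalone sketch. It is not Doris code and not a proposed patch: `BoundedOffer`, `offerWithDeadline`, and the timeout values are hypothetical names chosen for this example, and the `stillNeeded` supplier only stands in for `needMoreSplit()`. The idea is to keep the short 100 ms-style polling slices but cap the total time spent, and to stop when the thread is interrupted instead of looping again.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Doris.
final class BoundedOffer {

    /**
     * Tries to enqueue {@code item} in short slices until it succeeds,
     * {@code stillNeeded} turns false, the overall deadline passes,
     * or the thread is interrupted. Returns true only if the item was enqueued.
     */
    static <T> boolean offerWithDeadline(BlockingQueue<T> queue, T item,
                                         BooleanSupplier stillNeeded,
                                         long totalTimeoutMs, long sliceMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(totalTimeoutMs);
        long sliceNanos = TimeUnit.MILLISECONDS.toNanos(sliceMs);
        try {
            while (stillNeeded.getAsBoolean()) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    return false; // overall deadline reached, give up
                }
                // Wait at most one slice so stillNeeded is re-checked regularly.
                if (queue.offer(item, Math.min(remaining, sliceNanos), TimeUnit.NANOSECONDS)) {
                    return true;
                }
            }
            return false; // the consumer no longer wants data
        } catch (InterruptedException e) {
            // Restore the interrupt flag and stop instead of retrying.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        queue.offer("first"); // fill the queue; nothing ever drains it
        boolean enqueued = offerWithDeadline(queue, "second", () -> true, 300, 100);
        System.out.println("enqueued=" + enqueued); // prints "enqueued=false"
    }
}
```

With a full queue and no consumer, the call returns `false` after roughly the total timeout instead of occupying the executor thread indefinitely. Whether a fixed deadline is appropriate for `SplitAssignment` depends on how the caller would handle the failure, which this sketch does not address.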
