zhangnew opened a new issue, #40936: URL: https://github.com/apache/doris/issues/40936
### Search before asking

- [X] I had searched in the [issues](https://github.com/apache/doris/issues?q=is%3Aissue) and found no similar issues.

### Version

3.0.0

### What's Wrong?

In the `DynamicPartitionScheduler` thread, a suspected deadlock occurs during dynamic partition creation. The thread's stack trace shows that it is stuck trying to acquire the write lock in `TabletInvertedIndex.writeLock`. Because the thread never gets past this point, the `DynamicPartitionScheduler` stops running its tasks, and no further partitions are created automatically.

Here is a thread stack trace illustrating the problem (captured with Arthas):

```
"DynamicPartitionScheduler" Id=51 WAITING on java.util.concurrent.locks.StampedLock@58b80c2a
    at [email protected]/jdk.internal.misc.Unsafe.park(Native Method)
    - waiting on java.util.concurrent.locks.StampedLock@58b80c2a
    at [email protected]/java.util.concurrent.locks.LockSupport.park(LockSupport.java:211)
    at [email protected]/java.util.concurrent.locks.StampedLock.acquireWrite(StampedLock.java:1251)
    at [email protected]/java.util.concurrent.locks.StampedLock.writeLock(StampedLock.java:480)
    at app//org.apache.doris.catalog.TabletInvertedIndex.writeLock(TabletInvertedIndex.java:122)
    at app//org.apache.doris.catalog.TabletInvertedIndex.addTablet(TabletInvertedIndex.java:576)
    at app//org.apache.doris.catalog.MaterializedIndex.addTablet(MaterializedIndex.java:128)
    at app//org.apache.doris.catalog.MaterializedIndex.addTablet(MaterializedIndex.java:121)
    at app//org.apache.doris.datasource.InternalCatalog.createTablets(InternalCatalog.java:3162)
    at app//org.apache.doris.datasource.InternalCatalog.createPartitionWithIndices(InternalCatalog.java:2025)
    at app//org.apache.doris.datasource.InternalCatalog.addPartition(InternalCatalog.java:1688)
    at app//org.apache.doris.catalog.Env.addPartition(Env.java:3251)
    at app//org.apache.doris.clone.DynamicPartitionScheduler.executeDynamicPartition(Unknown Source)
    at app//org.apache.doris.clone.DynamicPartitionScheduler.runAfterCatalogReady(Unknown Source)
    at app//org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58)
    at app//org.apache.doris.common.util.Daemon.run(Daemon.java:116)
```

Under normal conditions, the `DynamicPartitionScheduler` thread is expected to be in the `TIMED_WAITING` state, as shown below:

```
"DynamicPartitionScheduler" Id=51 TIMED_WAITING
    at [email protected]/java.lang.Thread.sleep(Native Method)
    at app//org.apache.doris.common.util.Daemon.run(Daemon.java:122)
```

### What You Expected?

The `DynamicPartitionScheduler` should be able to acquire the locks it needs without deadlocking. It should create dynamic partitions automatically at the configured intervals, without interruption or manual intervention.

### How to Reproduce?

This issue has been observed only once so far, over a weekend, and I have not been able to reproduce it. No manual operations or DDL commands were executed during that period. The Doris cluster contains only three dynamic partition tables, and the data volume is very small. The problem occurred under low load, with no identifiable trigger.

### Anything Else?

One FE and one BE on the same machine.

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

For additional commands, e-mail: [email protected]
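
As a hedged illustration of the symptom in the stack trace (not a diagnosis of this issue, and not Doris code): `java.util.concurrent.locks.StampedLock` is not reentrant and tracks no owner, so if any code path acquires a stamp and never releases it, a later `writeLock()` call parks indefinitely in the `WAITING` state. The minimal sketch below uses a hypothetical class `LeakedStampDemo` and a timed `tryWriteLock` so that it terminates instead of hanging; the leaked read stamp is an assumption chosen for demonstration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.StampedLock;

// Hypothetical demo class; it only shows how an unreleased stamp blocks writers.
public class LeakedStampDemo {

    // Returns true if the write lock could be acquired within timeoutMs.
    static boolean writerCanAcquire(StampedLock lock, long timeoutMs)
            throws InterruptedException {
        // tryWriteLock returns 0 when the lock is not available in time.
        long stamp = lock.tryWriteLock(timeoutMs, TimeUnit.MILLISECONDS);
        if (stamp == 0L) {
            return false;
        }
        lock.unlockWrite(stamp);
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        StampedLock lock = new StampedLock();

        // Simulate a code path that takes a read stamp and fails to release it,
        // for example because unlockRead was not placed in a finally block.
        long leaked = lock.readLock();

        // A plain writeLock() here would park forever; the timed variant gives up.
        System.out.println("writer acquires while stamp is leaked: "
                + writerCanAcquire(lock, 200));

        // Once the stamp is released, writers can proceed again.
        lock.unlockRead(leaked);
        System.out.println("writer acquires after release: "
                + writerCanAcquire(lock, 200));
    }
}
```

If a leaked stamp were the cause, a thread dump would show the waiter but no holder, because no thread is actively inside the lock. Whether that applies to `TabletInvertedIndex` here is unverified.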
