[ https://issues.apache.org/jira/browse/OOZIE-3646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17458913#comment-17458913 ]
Junfan Zhang edited comment on OOZIE-3646 at 12/16/21, 2:44 AM:
----------------------------------------------------------------

Thanks [~dionusos].

The test case is now attached ([link|https://github.com/apache/oozie/pull/65]); you can run it to reproduce the deadlock. The stack trace of the stuck threads is in the ticket's attachments.

If you run the {{testPossibleDeadLock}} method, it will fail. But when you set {{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, false);}}, everything is OK. The failure is caused by the synchronous call in {{SignalXCommand}}:
{code:java}
List<Future<ActionExecutorContext>> futures = Services.get().get(CallableQueueService.class)
        .invokeAll(tasks);
{code}
Please check it and let me know what you think, [~dionusos].


was (Author: zuston):
Thanks [~dionusos].

The test case is now attached ([link|https://github.com/apache/oozie/pull/65]). The stack trace of the stuck threads is in the ticket's attachments.

If you run the {{testPossibleDeadLock}} method, it will fail. But when you set {{ConfigurationService.setBoolean(SignalXCommand.FORK_PARALLEL_JOBSUBMISSION, false);}}, everything is OK. The failure is caused by the synchronous call in {{SignalXCommand}}:
{code:java}
List<Future<ActionExecutorContext>> futures = Services.get().get(CallableQueueService.class)
        .invokeAll(tasks);
{code}
Please check it and let me know what you think, [~dionusos].

> Possible dead-lock in SignalXCommand
> ------------------------------------
>
>                 Key: OOZIE-3646
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3646
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Junfan Zhang
>            Assignee: Junfan Zhang
>            Priority: Major
>         Attachments: OOZIE-3646.patch-1, a1.png
>
>
> The limited thread execution mechanism aims to prevent the deadlock that
> occurs when all active threads are executing SignalXCommand's invokeAll call.
> h2. When the deadlock happens
> Assume that the Oozie CallableQueue thread pool size is 120. When all 120
> threads are executing the {{SignalXCommand.startForkedActions}} method at the
> same time, a deadlock occurs, because in
> {{SignalXCommand.startForkedActions}} the call
> {code:java}
> List<Future<ActionExecutorContext>> futures =
>         Services.get().get(CallableQueueService.class)
>                 .invokeAll(tasks);
> {code}
> is executed synchronously: each caller blocks until its tasks finish, but
> all CallableQueue threads are already busy, so the submitted tasks never
> get a thread to run on.
> h2. Solution
> 1. Do not call invokeAll directly when the number of remaining (idle)
> threads is less than the number of tasks.
> 2. To obtain the correct number of active threads in the CallableQueue, the
> SignalXCommand.class lock is needed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
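
The two points under "Solution" can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the attached patch: it uses a plain {{java.util.concurrent.ThreadPoolExecutor}} instead of Oozie's {{CallableQueueService}}, and the names {{ForkGuardSketch}}, {{runForked}}, {{LOCK}} and {{demo}} are hypothetical. The idea is to check the number of idle threads under a lock and, if there are too few, run the forked tasks in the caller's thread instead of waiting on the pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ForkGuardSketch {
    // Hypothetical stand-in for the SignalXCommand.class lock mentioned in the ticket.
    private static final Object LOCK = new Object();

    /**
     * Submits the tasks to the pool only when enough threads look idle;
     * otherwise runs them sequentially in the calling thread.
     */
    static <T> List<T> runForked(ThreadPoolExecutor pool, List<Callable<T>> tasks) throws Exception {
        List<Future<T>> futures = new ArrayList<>();
        synchronized (LOCK) {
            // getActiveCount() is approximate, so this check is best-effort only.
            int idle = pool.getMaximumPoolSize() - pool.getActiveCount();
            if (idle >= tasks.size()) {
                for (Callable<T> task : tasks) {
                    futures.add(pool.submit(task));
                }
            }
        }
        List<T> results = new ArrayList<>();
        if (futures.isEmpty()) {
            for (Callable<T> task : tasks) {
                results.add(task.call()); // fallback: no pool thread is needed
            }
        } else {
            for (Future<T> future : futures) {
                results.add(future.get()); // wait outside the lock
            }
        }
        return results;
    }

    /** Two parents on a two-thread pool, each forking two children; returns the sum of the child results. */
    static int demo() {
        ThreadPoolExecutor pool =
                new ThreadPoolExecutor(2, 2, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        try {
            List<Callable<Integer>> parents = new ArrayList<>();
            for (int p = 0; p < 2; p++) {
                parents.add(() -> {
                    List<Callable<Integer>> children = new ArrayList<>();
                    children.add(() -> 1);
                    children.add(() -> 1);
                    int sum = 0;
                    for (int r : runForked(pool, children)) {
                        sum += r;
                    }
                    return sum;
                });
            }
            int total = 0;
            for (Future<Integer> f : pool.invokeAll(parents)) {
                total += f.get();
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

In {{demo}}, both pool threads are occupied by the parents. If each parent called {{pool.invokeAll(children)}} directly, it would wait forever for children that can never be scheduled. With the guard, each parent sees fewer idle threads than children and runs them inline. Because the active-thread count is approximate, a guard like this may reduce the risk rather than rule it out entirely.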