[ https://issues.apache.org/jira/browse/NIFI-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336475#comment-16336475 ]
ASF GitHub Bot commented on NIFI-4772:
--------------------------------------

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2403#discussion_r163398903

    --- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/scheduling/StandardProcessScheduler.java ---
    @@ -24,6 +24,7 @@ import java.util.concurrent.CompletableFuture;
     import java.util.concurrent.ConcurrentHashMap;
     import java.util.concurrent.ConcurrentMap;
    +import java.util.concurrent.Executors;
    --- End diff --

    Unused import, will remove on merge

> If several processors do not return from their @OnScheduled method, NiFi will
> stop scheduling any Processors
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4772
>                 URL: https://issues.apache.org/jira/browse/NIFI-4772
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Critical
>
> If a Processor does not properly return from its @OnScheduled method and
> several instances of the processor are started, we can get into a state where
> no Processors can start. We start seeing log messages like the following:
> {code}
> 2018-01-10 10:16:31,433 WARN [StandardProcessScheduler Thread-1]
> o.a.n.controller.StandardProcessorNode Timed out while waiting for
> OnScheduled of 'UpdateAttribute' processor to finish. An attempt is made to
> cancel the task via Thread.interrupt(). However it does not guarantee that
> the task will be canceled since the code inside current OnScheduled operation
> may have been written to ignore interrupts which may result in a runaway
> thread. This could lead to more issues, eventually requiring NiFi to be
> restarted. This is usually a bug in the target Processor
> 'UpdateAttribute[id=95423ee6-e6a6-1220-83ad-af20577063bd]' that needs to be
> documented, reported and eventually fixed.
> 2018-01-10 10:16:42,937 WARN [StandardProcessScheduler Thread-2]
> o.a.n.controller.StandardProcessorNode Timed out while waiting for
> OnScheduled of 'PutHDFS' processor to finish. An attempt is made to cancel
> the task via Thread.interrupt(). However it does not guarantee that the task
> will be canceled since the code inside current OnScheduled operation may have
> been written to ignore interrupts which may result in a runaway thread. This
> could lead to more issues, eventually requiring NiFi to be restarted. This is
> usually a bug in the target Processor
> 'PutHDFS[id=25e531ec-d873-1dec-acc9-ea745e7869ed]' that needs to be
> documented, reported and eventually fixed.
> 2018-01-10 10:16:46,993 WARN [StandardProcessScheduler Thread-4]
> o.a.n.controller.StandardProcessorNode Timed out while waiting for
> OnScheduled of 'LogAttribute' processor to finish. An attempt is made to
> cancel the task via Thread.interrupt(). However it does not guarantee that
> the task will be canceled since the code inside current OnScheduled operation
> may have been written to ignore interrupts which may result in a runaway
> thread. This could lead to more issues, eventually requiring NiFi to be
> restarted. This is usually a bug in the target Processor
> 'LogAttribute[id=9a683a06-aa24-19b5-ffff-ffff944a0216]' that needs to be
> documented, reported and eventually fixed.
> {code}
> While we should avoid having misbehaving Processors to begin with, the
> framework must also be tolerant of this and should not allow one misbehaving
> Processor to affect other Processors.
> We can "approximate" this issue by following these steps:
> 1. Create 1 DebugFlow Processor. Auto-terminate its success & failure
> relationships. Set the "@OnScheduled Pause Time" property to "2 mins".
> 2. Copy & paste this DebugFlow Processor so that there are at least 8 of them.
> 3. Create a GenerateFlowFile Processor and an UpdateAttribute Processor. Send
> success of GenerateFlowFile to UpdateAttribute.
> 4. Start all of the DebugFlow Processors.
> 5. Start the GenerateFlowFile and UpdateAttribute Processors.
> In this scenario, we will not see the above log messages, because after 1
> minute the DebugFlow Processor is interrupted and the @OnScheduled method
> completes Exceptionally. However, we do see that GenerateFlowFile and
> UpdateAttribute do not start running until after the 2 minute time window has
> elapsed. If DebugFlow instead did not complete Exceptionally, then
> GenerateFlowFile and UpdateAttribute would never start running and we would
> see the above error messages in the log.
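
The starvation described in the reproduction steps can be illustrated outside NiFi. The following is a minimal sketch, not the framework's actual scheduling code: it assumes a fixed-size thread pool stands in for the scheduler's threads, that a sleeping task stands in for a slow @OnScheduled method, and that a trivial task stands in for a well-behaved Processor. The class name `StarvationSketch`, the method `waitBeforeStartMillis`, and the pool size of 8 (chosen only because the steps ask for "at least 8" DebugFlow Processors) are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this is not NiFi code.
public class StarvationSketch {

    // Submits `blockers` slow tasks to a pool of `poolSize` threads, then one
    // well-behaved task, and returns how many milliseconds the well-behaved
    // task waited before it actually started running.
    public static long waitBeforeStartMillis(int poolSize, int blockers, long blockMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < blockers; i++) {
                // Stand-in for an @OnScheduled method that does not return promptly.
                pool.submit(() -> {
                    try {
                        Thread.sleep(blockMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            CountDownLatch started = new CountDownLatch(1);
            long submittedAt = System.nanoTime();
            // Stand-in for a well-behaved Processor being started.
            pool.submit(started::countDown);
            try {
                started.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - submittedAt);
        } finally {
            // Interrupt any tasks that are still sleeping and release the threads.
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // One thread is still free, so the well-behaved task starts almost at once.
        System.out.println("7 blockers: " + waitBeforeStartMillis(8, 7, 500) + " ms");
        // Every thread is occupied, so the task waits until a blocker returns.
        System.out.println("8 blockers: " + waitBeforeStartMillis(8, 8, 500) + " ms");
    }
}
```

With every thread occupied, the well-behaved task sits in the queue until one blocker returns, which mirrors GenerateFlowFile and UpdateAttribute not starting until the pause elapses. If the blocking tasks never returned, the queued task would never run.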