[ 
https://issues.apache.org/jira/browse/NIFI-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336475#comment-16336475
 ] 

ASF GitHub Bot commented on NIFI-4772:
--------------------------------------

Github user mattyb149 commented on a diff in the pull request:

    https://github.com/apache/nifi/pull/2403#discussion_r163398903
  
    --- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/scheduling/StandardProcessScheduler.java ---
    @@ -24,6 +24,7 @@
     import java.util.concurrent.CompletableFuture;
     import java.util.concurrent.ConcurrentHashMap;
     import java.util.concurrent.ConcurrentMap;
    +import java.util.concurrent.Executors;
    --- End diff --
    
    Unused import, will remove on merge


> If several processors do not return from their @OnScheduled method, NiFi will 
> stop scheduling any Processors
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-4772
>                 URL: https://issues.apache.org/jira/browse/NIFI-4772
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Critical
>
> If a Processor does not properly return from its @OnScheduled method and 
> several instances of the processor are started, we can get into a state where 
> no Processors can start. We start seeing log messages like the following:
> {code}
> 2018-01-10 10:16:31,433 WARN [StandardProcessScheduler Thread-1] 
> o.a.n.controller.StandardProcessorNode Timed out while waiting for 
> OnScheduled of 'UpdateAttribute' processor to finish. An attempt is made to 
> cancel the task via Thread.interrupt(). However it does not guarantee that 
> the task will be canceled since the code inside current OnScheduled operation 
> may have been written to ignore interrupts which may result in a runaway 
> thread. This could lead to more issues, eventually requiring NiFi to be 
> restarted. This is usually a bug in the target Processor 
> 'UpdateAttribute[id=95423ee6-e6a6-1220-83ad-af20577063bd]' that needs to be 
> documented, reported and eventually fixed.
> 2018-01-10 10:16:42,937 WARN [StandardProcessScheduler Thread-2] 
> o.a.n.controller.StandardProcessorNode Timed out while waiting for 
> OnScheduled of 'PutHDFS' processor to finish. An attempt is made to cancel 
> the task via Thread.interrupt(). However it does not guarantee that the task 
> will be canceled since the code inside current OnScheduled operation may have 
> been written to ignore interrupts which may result in a runaway thread. This 
> could lead to more issues, eventually requiring NiFi to be restarted. This is 
> usually a bug in the target Processor 
> 'PutHDFS[id=25e531ec-d873-1dec-acc9-ea745e7869ed]' that needs to be 
> documented, reported and eventually fixed.
> 2018-01-10 10:16:46,993 WARN [StandardProcessScheduler Thread-4] 
> o.a.n.controller.StandardProcessorNode Timed out while waiting for 
> OnScheduled of 'LogAttribute' processor to finish. An attempt is made to 
> cancel the task via Thread.interrupt(). However it does not guarantee that 
> the task will be canceled since the code inside current OnScheduled operation 
> may have been written to ignore interrupts which may result in a runaway 
> thread. This could lead to more issues, eventually requiring NiFi to be 
> restarted. This is usually a bug in the target Processor 
> 'LogAttribute[id=9a683a06-aa24-19b5-ffff-ffff944a0216]' that needs to be 
> documented, reported and eventually fixed.
> {code}
> While we should avoid having misbehaving Processors to begin with, the 
> framework must also be tolerant of this and should not allow one misbehaving 
> Processor to affect other Processors.
> We can "approximate" this issue by following these steps:
> 1. Create 1 DebugFlow Processor. Auto-terminate its success & failure 
> relationships. Set the "@OnScheduled Pause Time" property to "2 mins"
> 2. Copy & paste this DebugFlow Processor so that there are at least 8 of them.
> 3. Create a GenerateFlowFile Processor and an UpdateAttribute Processor. Send 
> success of GenerateFlowFile to UpdateAttribute.
> 4. Start all of the DebugFlow Processors.
> 5. Start the GenerateFlowFile and UpdateAttribute Processors.
> In this scenario, we will not see the above log messages, because after 1 
> minute the DebugFlow Processor is interrupted and the @OnScheduled method 
> completes Exceptionally. However, we do see that GenerateFlowFile and 
> UpdateAttribute do not start running until after the 2 minute time window has 
> elapsed. If DebugFlow instead did not complete Exceptionally, then 
> GenerateFlowFile and UpdateAttribute would never start running and we would 
> see the above error messages in the log.
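
The starvation described above can be approximated outside NiFi with a plain
fixed-size thread pool. The following is a minimal sketch, not the NiFi
scheduler code: the class name, method name, pool sizes and sleep times are
all hypothetical, and Thread.sleep merely stands in for a lifecycle method
that does not return promptly. It measures how long a well-behaved task has
to wait before it starts once every pool thread is occupied.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in NiFi.
final class PoolStarvationSketch {

    /**
     * Submits `blockers` tasks that each sleep for `blockMillis`, waits until
     * as many of them as the pool can hold are running, then submits one
     * well-behaved task and returns how many milliseconds it waited to start.
     */
    static long millisUntilWellBehavedTaskStarts(int poolSize, int blockers, long blockMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            // Only min(poolSize, blockers) blockers can be running at once.
            CountDownLatch blockersRunning = new CountDownLatch(Math.min(poolSize, blockers));
            for (int i = 0; i < blockers; i++) {
                pool.execute(() -> {
                    blockersRunning.countDown();
                    try {
                        Thread.sleep(blockMillis); // stand-in for a slow lifecycle method
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            blockersRunning.await();

            long submitted = System.nanoTime();
            CountDownLatch started = new CountDownLatch(1);
            pool.execute(started::countDown); // the well-behaved task
            started.await();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - submitted);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        } finally {
            pool.shutdownNow(); // interrupts any blockers that are still sleeping
        }
    }

    public static void main(String[] args) {
        // Every thread is held by a blocker: the extra task waits for one to finish.
        System.out.println("saturated pool: "
                + millisUntilWellBehavedTaskStarts(2, 2, 300) + " ms");
        // One thread is left free: the extra task starts almost immediately.
        System.out.println("spare thread:   "
                + millisUntilWellBehavedTaskStarts(3, 2, 300) + " ms");
    }
}
```

With the pool saturated, the well-behaved task cannot start until a blocker
returns or is interrupted, which mirrors GenerateFlowFile and UpdateAttribute
not starting while the DebugFlow Processors are paused. If the blockers
ignored interrupts and never returned, the task would never start.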



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
