[
https://issues.apache.org/jira/browse/NIFI-4772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Matt Burgess updated NIFI-4772:
-------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
> If several processors do not return from their @OnScheduled method, NiFi will
> stop scheduling any Processors
> ------------------------------------------------------------------------------------------------------------
>
> Key: NIFI-4772
> URL: https://issues.apache.org/jira/browse/NIFI-4772
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: Mark Payne
> Assignee: Mark Payne
> Priority: Critical
> Fix For: 1.6.0
>
>
> If a Processor does not properly return from its @OnScheduled method and
> several instances of the processor are started, we can get into a state where
> no Processors can start. We start seeing log messages like the following:
> {code}
> 2018-01-10 10:16:31,433 WARN [StandardProcessScheduler Thread-1]
> o.a.n.controller.StandardProcessorNode Timed out while waiting for
> OnScheduled of 'UpdateAttribute' processor to finish. An attempt is made to
> cancel the task via Thread.interrupt(). However it does not guarantee that
> the task will be canceled since the code inside current OnScheduled operation
> may have been written to ignore interrupts which may result in a runaway
> thread. This could lead to more issues, eventually requiring NiFi to be
> restarted. This is usually a bug in the target Processor
> 'UpdateAttribute[id=95423ee6-e6a6-1220-83ad-af20577063bd]' that needs to be
> documented, reported and eventually fixed.
> 2018-01-10 10:16:42,937 WARN [StandardProcessScheduler Thread-2]
> o.a.n.controller.StandardProcessorNode Timed out while waiting for
> OnScheduled of 'PutHDFS' processor to finish. An attempt is made to cancel
> the task via Thread.interrupt(). However it does not guarantee that the task
> will be canceled since the code inside current OnScheduled operation may have
> been written to ignore interrupts which may result in a runaway thread. This
> could lead to more issues, eventually requiring NiFi to be restarted. This is
> usually a bug in the target Processor
> 'PutHDFS[id=25e531ec-d873-1dec-acc9-ea745e7869ed]' that needs to be
> documented, reported and eventually fixed.
> 2018-01-10 10:16:46,993 WARN [StandardProcessScheduler Thread-4]
> o.a.n.controller.StandardProcessorNode Timed out while waiting for
> OnScheduled of 'LogAttribute' processor to finish. An attempt is made to
> cancel the task via Thread.interrupt(). However it does not guarantee that
> the task will be canceled since the code inside current OnScheduled operation
> may have been written to ignore interrupts which may result in a runaway
> thread. This could lead to more issues, eventually requiring NiFi to be
> restarted. This is usually a bug in the target Processor
> 'LogAttribute[id=9a683a06-aa24-19b5-ffff-ffff944a0216]' that needs to be
> documented, reported and eventually fixed.
> {code}
> While we should avoid having misbehaving Processors to begin with, the
> framework must also be tolerant of this and should not allow one misbehaving
> Processor to affect other Processors.
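The warning quoted above describes a timed wait followed by an attempt to cancel via Thread.interrupt(). A minimal, standalone sketch of why that is not enough when the task swallows interrupts might look like the following. It uses only java.util.concurrent and is not NiFi's actual scheduler code; the class name, the single-thread pool, and the millisecond timeouts are assumptions chosen for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;

public class RunawayTaskSketch {

    // Hypothetical stand-in for an @OnScheduled method that swallows interrupts.
    static Runnable stubbornTask(AtomicBoolean release) {
        return () -> {
            while (!release.get()) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    // Interrupt ignored: the loop continues and the thread stays busy.
                }
            }
        };
    }

    // Returns true if a second task managed to run while the first was still stuck.
    static boolean secondTaskRanWhileFirstStuck() throws Exception {
        // Assumption: a pool of one thread, to make the effect easy to see.
        ExecutorService pool = Executors.newFixedThreadPool(1);
        AtomicBoolean release = new AtomicBoolean(false);
        try {
            Future<?> stuck = pool.submit(stubbornTask(release));
            try {
                stuck.get(100, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Marks the Future as cancelled and interrupts the worker thread,
                // but the task itself ignores the interrupt and keeps running.
                stuck.cancel(true);
            }
            Future<String> second = pool.submit(() -> "started");
            try {
                second.get(200, TimeUnit.MILLISECONDS);
                return true;
            } catch (TimeoutException e) {
                return false; // the only worker thread is still occupied
            }
        } finally {
            release.set(true);  // let the stubborn task exit so the JVM can stop
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("second task ran: " + secondTaskRanWhileFirstStuck());
    }
}
```

In this sketch the cancelled Future reports itself as cancelled, yet the worker thread is never returned to the pool, so the second task is expected to time out without starting.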
> We can "approximate" this issue by following these steps:
> 1. Create 1 DebugFlow Processor. Auto-terminate its success & failure
> relationships. Set the "@OnScheduled Pause Time" property to "2 mins"
> 2. Copy & paste this DebugFlow Processor so that there are at least 8 of them.
> 3. Create a GenerateFlowFile Processor and an UpdateAttribute Processor. Send
> success of GenerateFlowFile to UpdateAttribute.
> 4. Start all of the DebugFlow Processors.
> 5. Start the GenerateFlowFile and UpdateAttribute Processors.
> In this scenario, we will not see the above log messages, because after 1
> minute the DebugFlow Processor is interrupted and the @OnScheduled method
> completes exceptionally. However, we do see that GenerateFlowFile and
> UpdateAttribute do not start running until after the 2 minute time window has
> elapsed. If DebugFlow instead did not complete exceptionally, then
> GenerateFlowFile and UpdateAttribute would never start running and we would
> see the above error messages in the log.
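The reproduction steps above can be approximated outside NiFi with a plain fixed-size thread pool. The sketch below is only an analogy: the pool size of 8 is an assumption inferred from "at least 8" in step 2, the pause is scaled down from 2 minutes to a few hundred milliseconds, and the class and method names are hypothetical. It measures how long a quick task (standing in for GenerateFlowFile) waits to start when every pool thread is occupied by a pausing task (standing in for DebugFlow).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SchedulerStarvationSketch {

    // Returns how long (in ms) a quick task waits before it starts when
    // `pausingTasks` slow tasks are submitted ahead of it to a pool of `poolSize` threads.
    static long startDelayMillis(int poolSize, int pausingTasks, long pauseMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            for (int i = 0; i < pausingTasks; i++) {
                // Stand-in for a DebugFlow instance pausing in @OnScheduled.
                pool.execute(() -> {
                    try {
                        Thread.sleep(pauseMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // Stand-in for a well-behaved Processor that only needs a free thread.
            CountDownLatch started = new CountDownLatch(1);
            long submitted = System.nanoTime();
            pool.execute(started::countDown);
            started.await();
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - submitted);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("8 pausing tasks: " + startDelayMillis(8, 8, 500) + " ms");
        System.out.println("7 pausing tasks: " + startDelayMillis(8, 7, 500) + " ms");
    }
}
```

With all 8 threads occupied, the quick task should wait roughly the full pause before starting; with one thread left free it should start almost immediately. If the pausing tasks never returned, the quick task would never start, which matches the behavior described in the report.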