[ https://issues.apache.org/jira/browse/NIFI-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pierre Villard resolved NIFI-3564.
----------------------------------
    Resolution: Feedback Received

Apache NiFi 1.x is no longer maintained and no new release is planned on the 
1.x release line. Marking as resolved as part of a cleanup operation. Please 
open a new one with an updated description if this is still relevant for NiFi 
2.x.

> Deadlock on startup
> -------------------
>
>                 Key: NIFI-3564
>                 URL: https://issues.apache.org/jira/browse/NIFI-3564
>             Project: Apache NiFi
>          Issue Type: Bug
>    Affects Versions: 0.7.1, 1.1.1
>            Reporter: Brandon Rhys DeVries
>            Priority: Major
>
> We have uncovered an issue in the way that ControllerServices and Processors
> are started that can result in a deadlock. Basically, a ControllerService
> that the framework reports as ENABLING might not actually be enabling yet:
> its OnEnabled method may not have started running. This is because of how
> services are scheduled to start in StandardControllerServiceNode.enable() [1].
> That method changes the state from DISABLED to ENABLING, and *then*
> schedules the OnEnabled method to be called. However, the call is scheduled
> on a ScheduledExecutorService that is limited to 8 threads [2] and is *also
> used to start Processors* [3].
>
> The situation that exposed the bug was a Processor that, in its
> customValidate() method, waited for a ControllerService to become ENABLED.
> The ControllerService must be at least in the ENABLING state to pass
> framework validation, and since the ControllerService was necessary to
> perform the custom validation, waiting for it to become ENABLED seemed
> reasonable. However, there were several (more than 8) instances of this
> custom Processor on the graph, and the ControllerService being waited on was
> one of dozens of services being enabled. As a result, all 8 executor threads
> were held by our Processor's customValidate() method, waiting for a service
> that could never transition from ENABLING to ENABLED because doing so
> required one of those same 8 threads. This deadlocks the instance and
> prevents startup.
>
> My first thought for a fix was to not set the ENABLING state until the
> OnEnabled method is actually being called (as opposed to scheduled to be
> called). However, this could result in a Processor attempting to start while
> a ControllerService it depends on is still DISABLED (even though that
> ControllerService will eventually be ENABLED), which would cause the
> Processor not to start [4], rather than being retried as happens when
> OnScheduled throws an Exception. Ultimately, I think we will need to wait
> for all ControllerServices to be ENABLED before moving on to Processors,
> possibly using schedule(Callable) instead of execute(Runnable).
>
> [1] https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/service/StandardControllerServiceNode.java#L299-L304
> [2] https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/scheduling/StandardProcessScheduler.java#L83
> [3] https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/StandardProcessorNode.java#L1219-L1228
> [4] https://github.com/apache/nifi/blob/rel/nifi-0.7.1/nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/StandardProcessorNode.java#L1221-L1223
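
The thread-starvation deadlock in the description can be reproduced outside NiFi with a minimal Java sketch. It assumes a plain fixed-size pool as a stand-in for the framework's 8-thread ScheduledExecutorService, a CountDownLatch as a stand-in for the ENABLING to ENABLED transition, and a timeout on each wait so the program terminates instead of hanging. The class and method names (StarvationSketch, starves) are hypothetical and not part of NiFi, and the ordering is simplified so the waiting Processor tasks occupy every pool thread before the OnEnabled task is queued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

public class StarvationSketch {
    enum State { DISABLED, ENABLING, ENABLED }

    /**
     * Returns true if every simulated customValidate() gave up waiting
     * before the service reached ENABLED, i.e. the pool was starved.
     */
    static boolean starves(int poolSize, int waitingProcessors, long waitMillis) {
        // Stand-in for the framework's shared pool (8 threads in the report).
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicReference<State> state = new AtomicReference<>(State.DISABLED);
        CountDownLatch enabled = new CountDownLatch(1);
        try {
            // enable(): the service is reported as ENABLING before OnEnabled has run.
            state.set(State.ENABLING);

            // Each simulated customValidate() occupies a pool thread while it waits.
            // A timeout is used so the sketch terminates instead of hanging forever.
            List<Future<Boolean>> validations = new ArrayList<>();
            for (int i = 0; i < waitingProcessors; i++) {
                validations.add(pool.submit(
                        () -> enabled.await(waitMillis, TimeUnit.MILLISECONDS)));
            }

            // OnEnabled is queued on the same pool; it cannot run while every thread is busy.
            pool.execute(() -> {
                state.set(State.ENABLED);
                enabled.countDown();
            });

            boolean allGaveUp = true;
            for (Future<Boolean> validation : validations) {
                if (validation.get()) {
                    allGaveUp = false;
                }
            }
            return allGaveUp;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("pool=2, waiting=2, starved=" + starves(2, 2, 200));
        System.out.println("pool=2, waiting=1, starved=" + starves(2, 1, 2000));
    }
}
```

When the number of waiting tasks is at least the pool size, every wait times out before the queued OnEnabled task gets a thread; with one thread left free, the service is enabled almost immediately and the wait succeeds.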


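The fix suggested in the description (enable all ControllerServices first, then start Processors, using schedule(Callable) so the caller gets a Future back) can be sketched as a two-phase startup. This is a hypothetical illustration, not NiFi's startup code: the class TwoPhaseStartSketch, its start method, and the string log are invented for the example, and dependencies between services are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class TwoPhaseStartSketch {
    /** Enables every service, waits for all of them, then starts processors. Returns the event log. */
    static List<String> start(int poolSize, int services, int processors) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        try {
            // Phase 1: schedule each hypothetical OnEnabled step as a Callable to get a Future back.
            List<Future<String>> enabling = new ArrayList<>();
            for (int i = 0; i < services; i++) {
                String name = "service-" + i;
                enabling.add(pool.schedule(() -> {
                    log.add("enabled " + name);
                    return name;
                }, 0, TimeUnit.MILLISECONDS));
            }
            // The caller (not a pool thread) blocks until every service is ENABLED.
            for (Future<String> f : enabling) {
                f.get();
            }

            // Phase 2: start processors; none of them has to wait on a pool thread.
            List<Future<?>> starting = new ArrayList<>();
            for (int i = 0; i < processors; i++) {
                String name = "processor-" + i;
                starting.add(pool.submit(() -> log.add("started " + name)));
            }
            for (Future<?> f : starting) {
                f.get();
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> log = start(2, 12, 12);
        System.out.println(log.get(0) + " ... " + log.get(log.size() - 1));
    }
}
```

The key design choice is that the calling thread, not a pool thread, blocks on Future.get(), so no pool thread is ever parked waiting for work that itself needs a pool thread. Because phase 2 is submitted only after every phase-1 Future has completed, all "enabled" entries precede all "started" entries in the log, even with 12 services and 12 processors on a 2-thread pool.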

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
