[ https://issues.apache.org/jira/browse/NIFI-10070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Handermann resolved NIFI-10070.
-------------------------------------
    Fix Version/s: 1.17.0
       Resolution: Fixed

> NiFi fails to delete/update component because it's still running, immediately 
> after confirming that the component is stopped.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-10070
>                 URL: https://issues.apache.org/jira/browse/NIFI-10070
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Nathan Gough
>            Priority: Major
>              Labels: clustering, entity, merging, response
>             Fix For: 1.17.0
>
>          Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> This issue has been identified by analyzing the logs, code, etc., of the 
> system tests. Many of the system tests indicate that after each test (or 
> after a set of tests), the flow must be torn down. This will stop all 
> processors/reporting tasks and disable all controller services. It will then 
> wait for them to fully stop/disable, according to the REST API. It will then 
> purge any queues and delete all components.
> However, occasionally we see a failure in the step that deletes the 
> components. One node will indicate that the component cannot be deleted 
> because it's still running, so the REST API will send back a 409. However, 
> before making this request, we've already made a request to get all 
> components and checked that their state is STOPPED/DISABLED and that they 
> have no active threads.
> If we look at the code that is used to determine whether or not they are 
> STOPPED/DISABLED, it is using the "status" field in the Entity objects ( 
> {{reportingTaskEntity.getStatus().getRunStatus()}} for example).
> However, the DTO also has a state field: {{ReportingTaskDTO.getState()}}
> We have a similar situation with Processors, Reporting Tasks, and Controller 
> Services.
> In order to maintain backward compatibility, we need to leave both of these 
> fields. However, the issue we have appears to be in the 
> ReportingTaskEntityMerger, ProcessorEntityMerger, and 
> ControllerServiceEntityMerger.
> These mergers do not merge, or otherwise take into account, the status field 
> in the Entity. They take into account only the fields in the DTO. As a result, 
> one node can indicate that the status is STOPPED with 0 threads while another 
> node indicates STOPPED with 1 thread. The merging logic may choose the 
> response with STOPPED and 0 threads, which makes the component appear fully 
> stopped. At this point, a delete or update will fail because the component is 
> not in the desired state on all nodes.
> We need to update the 3 Entity Mergers to ensure that they properly merge the 
> status in the Entity objects as well.
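
The merge behavior described above can be illustrated with a minimal sketch.
This is not the NiFi implementation: NodeStatus, merge, and fullyStopped are
hypothetical names, run statuses are modeled as plain strings, and the policy
(sum the active threads, keep any non-quiescent run status) is an assumption
about what a conservative merge could look like. The point is only that the
merged result should report "stopped with 0 threads" when every node does.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; not the actual NiFi Entity/DTO classes.
final class StatusMergeSketch {

    // Stand-in for the per-node "status" portion of an Entity.
    static final class NodeStatus {
        final String runStatus;
        final int activeThreadCount;

        NodeStatus(String runStatus, int activeThreadCount) {
            this.runStatus = runStatus;
            this.activeThreadCount = activeThreadCount;
        }
    }

    // Assumption: STOPPED and DISABLED are the only quiescent run statuses.
    static boolean isQuiescent(String runStatus) {
        return "STOPPED".equals(runStatus) || "DISABLED".equals(runStatus);
    }

    // Conservative merge: sum active threads across nodes and keep the first
    // non-quiescent run status seen, so one busy node is never hidden.
    static NodeStatus merge(List<NodeStatus> perNode) {
        String mergedRunStatus = null;
        int totalThreads = 0;
        for (NodeStatus status : perNode) {
            totalThreads += status.activeThreadCount;
            boolean mergedIsQuiescent =
                    mergedRunStatus == null || isQuiescent(mergedRunStatus);
            if (mergedIsQuiescent) {
                // Either nothing chosen yet, or a busy node overrides a quiet one.
                if (mergedRunStatus == null || !isQuiescent(status.runStatus)) {
                    mergedRunStatus = status.runStatus;
                }
            }
        }
        return new NodeStatus(mergedRunStatus, totalThreads);
    }

    // The check a client would make before issuing a delete or update.
    static boolean fullyStopped(NodeStatus merged) {
        return isQuiescent(merged.runStatus) && merged.activeThreadCount == 0;
    }

    public static void main(String[] args) {
        // The scenario from the description: both nodes STOPPED, one with a thread.
        NodeStatus merged = merge(Arrays.asList(
                new NodeStatus("STOPPED", 0),
                new NodeStatus("STOPPED", 1)));
        System.out.println(merged.runStatus
                + " threads=" + merged.activeThreadCount
                + " fullyStopped=" + fullyStopped(merged));
        // prints: STOPPED threads=1 fullyStopped=false
    }
}
```

With a merge along these lines, the teardown check would keep waiting until
the remaining thread finishes on every node, instead of proceeding to a delete
that one node then rejects with a 409.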



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
