[
https://issues.apache.org/jira/browse/NIFI-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
AK updated NIFI-15901:
----------------------
Description:
This is a similar issue to https://issues.apache.org/jira/browse/NIFI-15801.
When the proposed flow adds a controller service at a child group level with
the same identifier as an existing ENABLED service at the root level,
{{synchronizeControllerServices}} finds the root service via ancestor lookup
and calls {{setProperties}} on it without first disabling it. This causes an
{{IllegalStateException}} from {{verifyModifiable()}} and the node enters an
infinite disconnect/reconnect loop. This is the controller-service equivalent
of the processor bug fixed in NIFI-15801 and the same fix applies: disable
before update, re-enable in a finally block.
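
The disable-before-update pattern described above can be sketched with a small stand-alone model. This is only an illustration, not the actual NiFi code: {{ModelService}} and {{updateSafely}} are hypothetical names, and enabling or disabling a real controller service is asynchronous and also involves referencing components, which the sketch ignores.

```java
import java.util.HashMap;
import java.util.Map;

class DisableBeforeUpdateSketch {

    /** Hypothetical stand-in for a controller service; not the NiFi class. */
    static class ModelService {
        private boolean enabled;
        private final Map<String, String> properties = new HashMap<>();

        ModelService(boolean enabled) {
            this.enabled = enabled;
        }

        boolean isEnabled() { return enabled; }
        void enable() { enabled = true; }
        void disable() { enabled = false; }
        Map<String, String> getProperties() { return properties; }

        // Models the behaviour in the report: an ENABLED service rejects modification.
        void verifyModifiable() {
            if (enabled) {
                throw new IllegalStateException("Cannot modify a service while it is ENABLED");
            }
        }

        void setProperties(Map<String, String> proposed) {
            verifyModifiable();
            properties.putAll(proposed);
        }
    }

    /** Disable if needed, apply the update, and restore the enabled state in a finally block. */
    static void updateSafely(ModelService service, Map<String, String> proposed) {
        final boolean wasEnabled = service.isEnabled();
        if (wasEnabled) {
            service.disable();
        }
        try {
            service.setProperties(proposed);
        } finally {
            // Runs even if setProperties throws, so the service is not left disabled.
            if (wasEnabled) {
                service.enable();
            }
        }
    }

    public static void main(String[] args) {
        ModelService root = new ModelService(true);

        // Updating directly while ENABLED fails, as in the reported bug.
        try {
            root.setProperties(Map.of("key", "value"));
        } catch (IllegalStateException e) {
            System.out.println("Direct update rejected: " + e.getMessage());
        }

        // Wrapping the update succeeds and leaves the service enabled again.
        updateSafely(root, Map.of("key", "value"));
        System.out.println("enabled=" + root.isEnabled() + ", properties=" + root.getProperties());
    }
}
```

The {{finally}} block is the important part of the sketch: it restores the previous state whether or not the update succeeds.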
*Steps to reproduce*
# Deploy a 2-node NiFi cluster with a flow where the same CS identifier
appears at root group level and child group level.
# Wait for the cluster to be healthy (both nodes CONNECTED, both CSes ENABLED).
# Disconnect the non-coordinator node (via REST API or UI Cluster view) - the
node stays running with CSes ENABLED.
# On the disconnected node, disable and delete the child-group CS (via REST
API or UI).
# Reconnect the node (via REST API or UI Cluster view).
*Expected*
The node rejoins the cluster and the original flow is applied to the reconnecting node.
*Actual*
The coordinator proposes the original flow (CS at both levels). The
reconnecting node's sync finds the root CS via ancestor lookup, marks it for
update, and calls {{setProperties}} on it while it is still ENABLED. The
resulting {{IllegalStateException}} sends the node into an infinite
disconnect/reconnect loop.
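
The ancestor lookup in the scenario above can be modelled roughly as follows. This is a sketch under stated assumptions, not NiFi's implementation: {{ModelGroup}}, {{findOwner}}, and the identifier {{cs-1}} are hypothetical, and the lookup is assumed to walk from the child group up to the root until it finds a matching identifier.

```java
import java.util.HashMap;
import java.util.Map;

class AncestorLookupSketch {

    /** Hypothetical stand-in for a process group; not the NiFi class. */
    static class ModelGroup {
        final String name;
        final ModelGroup parent;
        // Service identifier -> state, for services defined directly in this group.
        final Map<String, String> services = new HashMap<>();

        ModelGroup(String name, ModelGroup parent) {
            this.name = name;
            this.parent = parent;
        }

        /** Returns the name of the nearest group (this one or an ancestor) owning the id, or null. */
        String findOwner(String serviceId) {
            for (ModelGroup group = this; group != null; group = group.parent) {
                if (group.services.containsKey(serviceId)) {
                    return group.name;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        ModelGroup root = new ModelGroup("root", null);
        ModelGroup child = new ModelGroup("child", root);

        // State on the reconnecting node: the child-group copy was deleted,
        // so only the root-level service with this identifier remains.
        root.services.put("cs-1", "ENABLED");

        // The proposed flow has "cs-1" in the child group; the lookup from the
        // child group falls through to the root-level service.
        String owner = child.findOwner("cs-1");
        System.out.println("Lookup from child resolves to group: " + owner);
        System.out.println("State of resolved service: " + root.services.get("cs-1"));
    }
}
```

In this model the lookup returns the root-level service, which is still ENABLED, so an update applied to it without disabling it first would be rejected.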
> StandardVersionedComponentSynchronizer.synchronizeControllerServices fails
> when controller services are enabled
> ---------------------------------------------------------------------------------------------------------------
>
> Key: NIFI-15901
> URL: https://issues.apache.org/jira/browse/NIFI-15901
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Reporter: AK
> Priority: Minor
> Fix For: 2.10.0
>
>
> This is a similar issue to https://issues.apache.org/jira/browse/NIFI-15801.
> When the proposed flow adds a controller service at a child group level with
> the same identifier as an existing ENABLED service at the root level,
> {{synchronizeControllerServices}} finds the root service via ancestor lookup
> and calls {{setProperties}} on it without first disabling it. This causes an
> {{IllegalStateException}} from {{verifyModifiable()}} and the node enters an
> infinite disconnect/reconnect loop. This is the controller-service equivalent
> of the processor bug fixed in NIFI-15801 and the same fix applies: disable
> before update, re-enable in a finally block.
> *Steps to reproduce*
> # Deploy a 2-node NiFi cluster with a flow where the same CS identifier
> appears at root group level and child group level.
> # Wait for the cluster to be healthy (both nodes CONNECTED, both CSes
> ENABLED).
> # Disconnect the non-coordinator node (via REST API or UI Cluster view) -
> the node stays running with CSes ENABLED.
> # On the disconnected node, disable and delete the child-group CS (via REST
> API or UI).
> # Reconnect the node (via REST API or UI Cluster view).
> *Expected*
> The node rejoins the cluster and the original flow is applied to the reconnecting node.
> *Actual*
> The coordinator proposes the original flow (CS at both levels). The
> reconnecting node's sync finds the root CS via ancestor lookup, marks it for
> update, and calls {{setProperties}} on it while it is still ENABLED. The
> resulting {{IllegalStateException}} sends the node into an infinite
> disconnect/reconnect loop.