[
https://issues.apache.org/jira/browse/NIFI-5585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16622634#comment-16622634
]
ASF GitHub Bot commented on NIFI-5585:
--------------------------------------
Github user markap14 commented on a diff in the pull request:
https://github.com/apache/nifi/pull/3010#discussion_r219295502
--- Diff: nifi-nar-bundles/nifi-framework-bundle/nifi-framework/nifi-framework-core/src/main/java/org/apache/nifi/controller/StandardFlowService.java ---
@@ -662,6 +682,39 @@ private void handleReconnectionRequest(final ReconnectionRequestMessage request)
         }
     }
+    private void handleDecommissionRequest(final DecommissionMessage request) throws InterruptedException {
+        logger.info("Received decommission request message from manager with explanation: " + request.getExplanation());
+        decommission(request.getExplanation());
+    }
+
+    private void decommission(final String explanation) throws InterruptedException {
+        writeLock.lock();
+        try {
+
+            logger.info("Decommissioning node due to " + explanation);
+
+            // mark node as decommissioning
+            controller.setConnectionStatus(new NodeConnectionStatus(nodeId, NodeConnectionState.DECOMMISSIONING, DecommissionCode.DECOMMISSIONED, explanation));
+            // request to stop all processors on node
+            controller.stopAllProcessors();
--- End diff --
In addition to calling stopAllProcessors(), I think we should terminate all
active processors as well. We can do this by getting the Root Process Group
from the FlowController, then getting all of its Processors and, for each one,
calling ProcessGroup.terminateProcessor(ProcessorNode) if
processor.getScheduledState() == ScheduledState.STOPPED (this check is needed
in case a processor is disabled), and then recursing through all child groups.
This will ensure that even if a Processor has hold of a FlowFile and makes
no progress we can still push the FlowFiles out to other nodes in the cluster
in order to decommission.
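A minimal sketch of the traversal described above. It does not use the real
NiFi classes: ScheduledState, ProcessorNode and ProcessGroup here are
simplified, hypothetical stand-ins so the example compiles on its own, and
terminateStopped is an illustrative helper name, not an existing framework
method. In the framework, the starting group would be the Root Process Group
obtained from the FlowController.

```java
import java.util.ArrayList;
import java.util.List;

class TerminateSketch {

    // Hypothetical stand-in for NiFi's ScheduledState.
    enum ScheduledState { DISABLED, STOPPED, RUNNING }

    // Hypothetical stand-in for NiFi's ProcessorNode.
    static class ProcessorNode {
        final String name;
        final ScheduledState state;

        ProcessorNode(String name, ScheduledState state) {
            this.name = name;
            this.state = state;
        }

        ScheduledState getScheduledState() {
            return state;
        }
    }

    // Hypothetical stand-in for NiFi's ProcessGroup.
    static class ProcessGroup {
        final List<ProcessorNode> processors = new ArrayList<>();
        final List<ProcessGroup> children = new ArrayList<>();
        final List<String> terminated = new ArrayList<>();

        // Only records the call; the real method interrupts active threads.
        void terminateProcessor(ProcessorNode processor) {
            terminated.add(processor.name);
        }
    }

    // Terminates every STOPPED processor in the group, then recurses into
    // all child groups. Disabled processors are skipped by the state check.
    // Returns the number of processors terminated.
    static int terminateStopped(ProcessGroup group) {
        int count = 0;
        for (ProcessorNode processor : group.processors) {
            if (processor.getScheduledState() == ScheduledState.STOPPED) {
                group.terminateProcessor(processor);
                count++;
            }
        }
        for (ProcessGroup child : group.children) {
            count += terminateStopped(child);
        }
        return count;
    }

    public static void main(String[] args) {
        ProcessGroup root = new ProcessGroup();
        root.processors.add(new ProcessorNode("a", ScheduledState.STOPPED));
        root.processors.add(new ProcessorNode("b", ScheduledState.DISABLED));
        ProcessGroup child = new ProcessGroup();
        child.processors.add(new ProcessorNode("c", ScheduledState.STOPPED));
        root.children.add(child);
        System.out.println("terminated: " + terminateStopped(root));
    }
}
```

In decommission(), a call like this would go right after
controller.stopAllProcessors(), so that every processor is already in the
STOPPED state (or disabled) by the time the traversal runs.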
> Decommission Nodes from Cluster
> ------------------------------
>
> Key: NIFI-5585
> URL: https://issues.apache.org/jira/browse/NIFI-5585
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Affects Versions: 1.7.1
> Reporter: Jeff Storck
> Assignee: Jeff Storck
> Priority: Major
>
> Allow a node in the cluster to be decommissioned, rebalancing flowfiles on
> the node to be decommissioned to the other active nodes. This work depends
> on NIFI-5516.
> Similar to the client sending a DISCONNECTING message as a PUT request to
> cluster/nodes/{id}, a DECOMMISSIONING message can be sent as a PUT request
> to the same URI to initiate a DECOMMISSION for a DISCONNECTED node. The
> DECOMMISSIONING request will be idempotent.
> The steps to decommission a node and remove it from the cluster are:
> # Send request to disconnect the node
> # Once disconnect completes, send request to decommission the node.
> # Once decommission completes, send request to delete node.
> When an error occurs and the node cannot complete decommissioning, the user
> can:
> # Send request to delete the node from the cluster
> # Diagnose why the node had issues with the decommission (out of memory, no
> network connection, etc.) and address the issue
> # Restart NiFi on the node so that it will reconnect to the cluster
> # Go through the steps to decommission and remove a node
> Toolkit CLI commands for retrieving a list of nodes and
> disconnecting/decommissioning/deleting nodes have been added.
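The ordering of the three requests in the issue description can be sketched
as a small lookup from node state to the next request to send. This is only
an illustration of the sequence: the enum and method names below are
hypothetical and are not NiFi's own types, and the sketch does not cover the
error-recovery path or the actual REST payloads.

```java
class DecommissionSteps {

    // Hypothetical node states, as seen by the client driving the removal.
    enum NodeState { CONNECTED, DISCONNECTED, DECOMMISSIONED }

    // Hypothetical labels for the three requests listed in the issue.
    enum Request { DISCONNECT, DECOMMISSION, DELETE }

    // Returns the next request to send for a node in the given state:
    // disconnect first, then decommission, then delete.
    static Request nextRequest(NodeState state) {
        switch (state) {
            case CONNECTED:
                return Request.DISCONNECT;
            case DISCONNECTED:
                return Request.DECOMMISSION;
            case DECOMMISSIONED:
                return Request.DELETE;
            default:
                throw new IllegalArgumentException("unknown state: " + state);
        }
    }

    public static void main(String[] args) {
        for (NodeState state : NodeState.values()) {
            System.out.println(state + " -> " + nextRequest(state));
        }
    }
}
```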
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)