Resetting counters whilst clustered disconnects nodes

[email protected] Thu, 03 Sep 2015 05:15:00 -0700

Hi,

I have a three-machine setup (1 NCM + 2 Nodes) running 0.2.0-incubating and
observed the following:

1. Resetting counters can result in the NCM disconnecting a node

2. The node that is disconnected begins processing FlowFiles

Description:

My clustered NiFi is running a single pipeline containing 3 processors. While
the pipeline is running, resetting counters causes any node that is not
processing anything (i.e. is not contributing to the count) to be disconnected.
The node can then be reconnected via the UI. Looking at the stats, it appears
that the pipeline then begins running on the disconnected node as well as on
the single remaining connected node. This has been tested using custom
processors as well as standard processors.

Steps to Replicate:

1. Create a cluster with 2 nodes + 1 NCM (2 nodes for processing are needed
or the problem won't appear)

2. Add GenerateFlowFile processor:

a. Scheduling: Change Scheduling strategy to 'On primary node'

b. Properties: Change File Size to '10B' (say)

3. Add HashAttribute processor:

a. Properties: Change Key to 'hash.value'

4. Add DetectDuplicate processor:

a. Properties: Under Distributed Cache Service add a
'DistributedMapCacheClientService'

i. For the Client Service, set Server name to 'localhost' under properties

ii. Enable the Client Service

iii. Add a DistributedMapCacheServer under the Controller Services

iv. Enable the Cache Server

v. Exit NiFi Flow Settings

5. Connect all 3 processors on success

6. Auto-terminate all options for DetectDuplicate

7. Run all processors and wait for ~10 seconds or so

8. Open counters tab and refresh to make sure counters > 0

9. Reset one of the counters

Note: I'm specifically using the DetectDuplicate processor in this example
because it contains a custom counter.

This should then disconnect the node that was not active (the node that was
not selected to be the primary). Even though the GenerateFlowFile processor is
scheduled to run on the primary node, the disconnected node begins to emit
FlowFiles.

The following warning was pulled from the NCM's logs:

2015-09-02 10:40:16,750 WARN [NiFi Web Server-149]
o.a.n.c.manager.impl.WebClusterManager One or more nodes failed to process URI
'http://localhost:8082/nifi-api/controller/counters/2207ea22-0d4a-389d-b746-82e568c6228d'.
Requesting each node to disconnect from cluster.
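
For anyone triaging a similar disconnect, entries like the one above can be
picked out of the NCM log with a short script. This is only a minimal sketch:
it assumes the warning text and URI layout shown above, and the names
failed_counter_resets and SAMPLE are made up for illustration and are not part
of NiFi.

```python
import re

# Assumes the WARN message wording shown above; adjust if your log layout differs.
FAILED_URI = re.compile(
    r"One or more nodes failed to process URI\s+'(?P<uri>[^']+)'"
)
# Assumes the counter ID is a 36-character UUID at the end of the URI path.
COUNTER_ID = re.compile(r"/controller/counters/(?P<id>[0-9a-fA-F-]{36})$")

# Sample entry copied from the warning above (wrapped across lines as in the mail).
SAMPLE = """2015-09-02 10:40:16,750 WARN [NiFi Web Server-149]
o.a.n.c.manager.impl.WebClusterManager One or more nodes failed to process URI
'http://localhost:8082/nifi-api/controller/counters/2207ea22-0d4a-389d-b746-82e568c6228d'.
Requesting each node to disconnect from cluster."""


def failed_counter_resets(log_text):
    """Return (uri, counter_id) pairs for failed counter requests in log_text."""
    # Entries may wrap across lines, so collapse all whitespace runs first.
    flat = " ".join(log_text.split())
    results = []
    for match in FAILED_URI.finditer(flat):
        uri = match.group("uri")
        counter = COUNTER_ID.search(uri)
        if counter:  # skip failed URIs that are not counter requests
            results.append((uri, counter.group("id")))
    return results


if __name__ == "__main__":
    for uri, counter_id in failed_counter_resets(SAMPLE):
        print(counter_id, uri)
```

Running it against the full NCM log should show whether every disconnect
lines up with a counter reset request.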

I'm interested in knowing whether this is expected behaviour or whether I
should open a JIRA ticket (or perhaps two).

Thanks,
Tommy