Tommy,

Thanks for the great write-up! I've reproduced the node disconnect using the steps you provided and have created a JIRA for it [1]. As for the other concern, that is currently working as designed. The 'On primary node' scheduling strategy only applies while a node is part of a cluster. If a node is disconnected from the cluster and a processor is configured with that scheduling strategy, the processor runs as though it were timer driven.
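To make that scheduling rule concrete, here is a minimal Python sketch. It is a hypothetical model of the behavior as described in this thread, not NiFi's actual scheduler code; `SchedulingStrategy` and `should_schedule` are illustrative names, not NiFi APIs.

```python
from enum import Enum


class SchedulingStrategy(Enum):
    # Hypothetical subset of strategies, for illustration only.
    TIMER_DRIVEN = "timer driven"
    PRIMARY_NODE_ONLY = "on primary node"


def should_schedule(strategy, connected_to_cluster, is_primary):
    """Return True if this node should trigger the processor.

    Hypothetical model: the primary-node-only restriction is enforced
    only while the node is connected to a cluster. A disconnected (or
    standalone) node falls back to timer-driven behavior.
    """
    if strategy is SchedulingStrategy.TIMER_DRIVEN:
        # Timer-driven processors run on every node.
        return True
    if connected_to_cluster:
        # Connected cluster member: only the primary node runs it.
        return is_primary
    # Disconnected node: treated as timer driven, so it runs.
    return True


# The non-primary node from the report, after it was disconnected:
print(should_schedule(SchedulingStrategy.PRIMARY_NODE_ONLY,
                      connected_to_cluster=False, is_primary=False))
```

Under this model, the non-primary node in your report starts running GenerateFlowFile as soon as it is disconnected, which matches the FlowFiles you saw it emit.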
We should have the counters issue addressed for the upcoming 0.3.0 release.

Thanks!
Matt

[1] https://issues.apache.org/jira/browse/NIFI-926

> On Thu, Sep 3, 2015 at 5:14 AM, [email protected]
> <[email protected]> wrote:
>
> Hi,
>
> I have a three-machine setup (1 NCM + 2 nodes) running 0.2.0-incubating and
> observed the following:
>
> 1. Resetting counters can result in the NCM disconnecting a node.
> 2. The node that is disconnected begins processing FlowFiles.
>
> Description:
>
> My clustered NiFi is running a single pipeline containing 3 processors. While
> the pipeline is running, resetting counters causes any node that is not
> processing anything (i.e. is not contributing to the count) to disconnect.
> The node can then be reconnected via the UI. Looking at the stats, it appears
> the pipeline then began running on the disconnected node as well as on the
> single remaining connected node. This has been tested using custom processors
> as well as standard processors.
>
> Steps to Replicate:
>
> 1. Create a cluster with 2 nodes + 1 NCM (2 processing nodes are needed or
>    the problem won't appear).
> 2. Add a GenerateFlowFile processor:
>    a. Scheduling: change the Scheduling Strategy to 'On primary node'.
>    b. Properties: change File Size to '10B' (say).
> 3. Add a HashAttribute processor:
>    a. Properties: change Key to 'hash.value'.
> 4. Add a DetectDuplicate processor:
>    a. Properties: under Distributed Cache Service, add a
>       'DistributedMapCacheClientService':
>       i.   For the client service, set the server name to 'localhost'
>            under properties.
>       ii.  Enable the client service.
>       iii. Add a DistributedMapCacheServer under the Controller Services.
>       iv.  Enable the cache server.
>       v.   Exit NiFi Flow Settings.
> 5. Connect all 3 processors on success.
> 6. Auto-terminate all relationships for DetectDuplicate.
> 7. Run all processors and wait for ~10 seconds or so.
> 8. Open the Counters tab and refresh to make sure the counters are > 0.
> 9. Reset one of the counters.
>
> Note: I'm specifically using the DetectDuplicate processor in this example
> because it contains a custom counter.
>
> This should then disconnect the node that was not active (the node that was
> not selected to be the primary). Even though the GenerateFlowFile processor
> is scheduled to run on the primary node, the disconnected node begins to
> emit FlowFiles.
>
> The following warning was pulled from the NCM's logs:
>
> 2015-09-02 10:40:16,750 WARN [NiFi Web Server-149]
> o.a.n.c.manager.impl.WebClusterManager One or more nodes failed to process
> URI
> 'http://localhost:8082/nifi-api/controller/counters/2207ea22-0d4a-389d-b746-82e568c6228d'.
> Requesting each node to disconnect from cluster.
>
> I'm interested in knowing whether this is expected behaviour or whether I
> should open a JIRA ticket (2 perhaps).
>
> Thanks,
> Tommy
