Re: Resetting counters whilst clustered disconnects nodes

Matt Gilman Fri, 04 Sep 2015 05:21:24 -0700

Tommy,

Thanks for the great write-up! I've replicated the node disconnection issue 
using the steps you provided and have created a JIRA for it [1]. As for the 
other concern, that is how it's currently designed to work. The 'On primary 
node' scheduling strategy only applies when a node is part of a cluster. If a 
node is disconnected from the cluster and a processor is configured with that 
scheduling strategy, the processor will run as though it were timer driven.
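The scheduling rule described above can be modelled as a small decision function. This is a hypothetical sketch for illustration only: `Strategy` and `should_run` are made-up names, not NiFi's API, and the model considers nothing but the configured strategy, whether the node is connected to a cluster, and whether it is the primary node.

```python
from enum import Enum


class Strategy(Enum):
    # Hypothetical stand-ins for the two scheduling strategies discussed here.
    TIMER_DRIVEN = "timer driven"
    PRIMARY_NODE_ONLY = "primary node only"


def should_run(strategy: Strategy, connected_to_cluster: bool, is_primary: bool) -> bool:
    """Hypothetical model: does this node schedule the processor?"""
    if strategy is Strategy.PRIMARY_NODE_ONLY and connected_to_cluster:
        # While connected, only the primary node runs a primary-only processor.
        return is_primary
    # Timer-driven processors always run; a primary-only processor on a node
    # that is not connected to a cluster behaves as if it were timer driven.
    return True
```

Under this model, a disconnected non-primary node returns `True` for a primary-only processor, which matches the FlowFiles Tommy saw being emitted by the disconnected node.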


We should have the counters issue addressed for the upcoming 0.3.0 release.

Thanks!

Matt

[1] https://issues.apache.org/jira/browse/NIFI-926

> On Thu, Sep 3, 2015 at 5:14 AM, [email protected] 
> <[email protected]> wrote:
> Hi,
> 
> I have a three machine setup (1 NCM + 2 Nodes) running 0.2.0-incubating and 
> observed the following:
> 
> 
> 1.       Resetting counters can result in the NCM disconnecting a node
> 
> 2.       The node that is disconnected begins processing FlowFiles
> 
> Description:
> 
> My clustered NiFi is running a single pipeline containing 3 processors. While 
> the pipeline is running, resetting counters causes any node that is not 
> processing anything (i.e. not contributing to the count) to disconnect. The 
> node can then be reconnected via the UI. Looking at the stats, it appears 
> that the pipeline then begins running on the disconnected node as well as on 
> the single remaining connected node. This has been tested using custom 
> processors as well as standard processors.
> 
> Steps to Replicate:
> 
> 
> 1.       Create a cluster with 2 nodes + 1 NCM (2 processing nodes are 
> needed or the problem won't appear)
> 
> 2.       Add GenerateFlowFile processor:
> 
> a.       Scheduling: Change Scheduling strategy to 'On primary node'
> 
> b.      Properties: Change File Size to '10B' (for example)
> 
> 3.       Add HashAttribute processor:
> 
> a.       Properties: Change Key to 'hash.value'
> 
> 4.       Add DetectDuplicate processor:
> 
> a.       Properties: Under Distributed Cache Service, add a 
> 'DistributedMapCacheClientService'
> 
>          i.      For the client service, set the server name to 'localhost' 
>                  under Properties
> 
>          ii.     Enable the client service
> 
>          iii.    Add a DistributedMapCacheServer under Controller Services
> 
>          iv.     Enable the cache server
> 
>          v.      Exit NiFi Flow Settings
> 
> 5.       Connect all 3 processors on success
> 
> 6.       Auto-terminate all relationships for DetectDuplicate
> 
> 7.       Run all processors and wait about 10 seconds
> 
> 8.       Open the Counters tab and refresh to make sure the counters are > 0
> 
> 9.       Reset one of the counters
> 
> Note: I'm specifically using the DetectDuplicate processor in this example 
> because it contains a custom counter.
> 
> This should then disconnect the node that was not active (the node that was 
> not selected as the primary). Even though the GenerateFlowFile processor is 
> scheduled to run on the primary node only, the disconnected node begins to 
> emit FlowFiles.
> 
> The following warning was pulled from the NCM's logs:
> 
> 2015-09-02 10:40:16,750 WARN [NiFi Web Server-149] 
> o.a.n.c.manager.impl.WebClusterManager One or more nodes failed to process 
> URI 
> 'http://localhost:8082/nifi-api/controller/counters/2207ea22-0d4a-389d-b746-82e568c6228d'.
>   Requesting each node to disconnect from cluster.
> 
> I'm interested in knowing whether this is expected behaviour or whether I 
> should open a JIRA ticket (or perhaps two).
> 
> Thanks,
> Tommy
