[ https://issues.apache.org/jira/browse/NIFI-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491154#comment-15491154 ]
Arpit Gupta commented on NIFI-2776:
-----------------------------------

Around this time we see the following in the nifi-app.log:
{code}
2016-09-14 06:24:34,346 INFO [StandardProcessScheduler Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GetFile[id=275e48e3-0157-1000-0000-00000ba0cd25] to run with 1 threads
2016-09-14 06:24:34,559 INFO [StandardProcessScheduler Thread-2] org.elasticsearch.plugins [Titania] loaded [], sites []
2016-09-14 06:24:34,714 INFO [Process Cluster Protocol Request-6] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 1858ae64-a654-41ae-bcc8-6e10dd5fdaad (type=DISCONNECTION_REQUEST, length=596 bytes) from host:port in 10 millis
2016-09-14 06:24:34,718 INFO [Disconnect from Cluster] o.a.nifi.controller.StandardFlowService Received disconnection request message from manager with explanation: User anonymous requested that node be disconnected from cluster
2016-09-14 06:24:34,718 INFO [Disconnect from Cluster] o.a.nifi.controller.StandardFlowService Disconnecting node.
2016-09-14 06:24:34,719 INFO [Disconnect from Cluster] o.apache.nifi.controller.FlowController Cluster State changed from Clustered to Not Clustered
2016-09-14 06:24:34,725 INFO [Disconnect from Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager This node is no longer registered to be elected as the Leader for Role 'Primary Node'
2016-09-14 06:24:34,732 INFO [Disconnect from Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager This node is no longer registered to be elected as the Leader for Role 'Cluster Coordinator'
2016-09-14 06:24:34,733 INFO [Disconnect from Cluster] o.a.nifi.controller.StandardFlowService Node disconnected.
2016-09-14 06:24:34,747 ERROR [Leader Election Notification Thread-2] o.a.c.f.recipes.leader.LeaderSelector The leader threw an exception
java.lang.IllegalMonitorStateException: You do not own the lock: /nifi/leaders/Cluster Coordinator
    at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:140) ~[curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:425) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239) [curator-recipes-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
2016-09-14 06:24:34,747 ERROR [Leader Election Notification Thread-1] o.a.c.f.recipes.leader.LeaderSelector The leader threw an exception
java.lang.IllegalMonitorStateException: You do not own the lock: /nifi/leaders/Primary Node
    at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:140) ~[curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:425) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239) [curator-recipes-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
2016-09-14 06:24:34,790 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@dd83cf0 // Another save pending = false
2016-09-14 06:24:34,854 INFO [NiFi Web Server-20] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.elasticsearch.PutElasticsearch
2016-09-14 06:24:34,948 INFO [Process Cluster Protocol Request-7] o.a.n.c.c.node.NodeClusterCoordinator Status of host:port changed from NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=22] to NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26]
2016-09-14 06:24:34,959 INFO [Process Cluster Protocol Request-7] o.a.c.f.imps.CuratorFrameworkImpl Starting
2016-09-14 06:24:34,976 INFO [Process Cluster Protocol Request-7-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2016-09-14 06:24:35,002 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2016-09-14 06:24:35,006 INFO [Process Cluster Protocol Request-8] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 752f264c-10d4-4055-83f4-238d2c661923 (type=RECONNECTION_REQUEST, length=12285 bytes) from host:port in 37 millis
2016-09-14 06:24:35,006 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from manager.
2016-09-14 06:24:35,006 INFO [Reconnect to Cluster] o.a.n.c.c.node.NodeClusterCoordinator Resetting cluster node statuses from {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=21], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]} to {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=21], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]}
2016-09-14 06:24:35,021 INFO [Process Cluster Protocol Request-7] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 5e079f32-6280-437d-b254-9f5abb4388b0 (type=NODE_STATUS_CHANGE, length=1052 bytes) from nat-r6-pvzu-nifi-5.openstacklocal in 75 millis
2016-09-14 06:24:35,146 WARN [Reconnect to Cluster] o.a.n.controller.StandardProcessorNode Can not start 'PutElasticsearch' since it's already in the process of being started or it is DISABLED - STOPPING
2016-09-14 06:24:35,147 WARN [Reconnect to Cluster] o.a.n.p.elasticsearch.PutElasticsearch PutElasticsearch[id=275e4abc-0157-1000-0000-00005740dd0c] Can not start 'PutElasticsearch' since it's already in the process of being started or it is DISABLED - STOPPING
2016-09-14 06:24:35,177 INFO [Reconnect to Cluster] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2016-09-14 06:24:35,193 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Setting Flow Controller's Node ID: host:port
2016-09-14 06:24:35,197 INFO [Reconnect to Cluster] o.a.n.c.c.node.NodeClusterCoordinator This node is now connected to the cluster. Will no longer require election of DataFlow.
2016-09-14 06:24:35,202 INFO [Reconnect to Cluster] o.apache.nifi.controller.FlowController Cluster State changed from Not Clustered to Clustered
2016-09-14 06:24:35,203 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Primary Node; this node is an active participant in the election.
2016-09-14 06:24:35,206 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is an active participant in the election.
2016-09-14 06:24:35,206 INFO [Reconnect to Cluster] o.apache.nifi.controller.FlowController Starting 0 processors/ports/funnels
2016-09-14 06:24:35,206 INFO [Reconnect to Cluster] o.apache.nifi.controller.FlowController Started 0 Remote Group Ports transmitting
2016-09-14 06:24:35,207 INFO [Reconnect to Cluster] o.a.n.c.c.node.NodeClusterCoordinator Resetting cluster node statuses from {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=21], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]} to {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=21], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]}
2016-09-14 06:24:35,207 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Node reconnected.
2016-09-14 06:24:35,225 INFO [Clustering Tasks Thread-2] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2016-09-14 06:24:35,210 and sent to host:port at 2016-09-14 06:24:35,225; send took 15 millis
2016-09-14 06:24:35,235 INFO [Clustering Tasks Thread-1] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2016-09-14 06:24:35,212 and sent to host:port at 2016-09-14 06:24:35,235; send took 9 millis
{code}

> When a node connects back to the cluster, intermittently the processor does not
> return to the same state as it is in the cluster
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-2776
>                 URL: https://issues.apache.org/jira/browse/NIFI-2776
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.1.0
>            Reporter: Arpit Gupta
>             Fix For: 1.1.0
>
>
> Here is the scenario:
> 1. Create a flow and start a processor.
> 2. Disconnect a node.
> 3. On the disconnected node, stop the above processor.
> 4. Connect the above node back to the cluster.
> 5. Wait 30s.
> 6. Check whether the processor started on the node that was connected in #4.
> Very intermittently we see that the processor does not get into running state.
> When we query the processor status on the node we get the following bulletin:
> {code}
> "bulletins": [{
>     "id": 0,
>     "groupId": "275e45f8-0157-1000-ffff-fffff191c079",
>     "sourceId": "275e4abc-0157-1000-0000-00005740dd0c",
>     "timestamp": "06:24:35 UTC",
>     "nodeAddress": "host:port",
>     "canRead": true,
>     "bulletin": {
>         "id": 0,
>         "nodeAddress": "host:port",
>         "category": "Log Message",
>         "groupId": "275e45f8-0157-1000-ffff-fffff191c079",
>         "sourceId": "275e4abc-0157-1000-0000-00005740dd0c",
>         "sourceName": "putES",
>         "level": "WARNING",
>         "message": "PutElasticsearch[id=275e4abc-0157-1000-0000-00005740dd0c] Can not start 'PutElasticsearch' since it's already in the process of being started or it is DISABLED - STOPPING",
>         "timestamp": "06:24:35 UTC"
>     }
> }],
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
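The IllegalMonitorStateException stack traces in the log come from Curator's InterProcessMutex.release being called by a thread that no longer owns the lock, which is consistent with the leader selectors being torn down and immediately re-registered during the disconnect/reconnect window. The JDK's ReentrantLock fails the same way on an unowned unlock; a minimal standalone sketch of that failure mode (plain JDK code only, no NiFi, Curator, or ZooKeeper involved):

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnownedRelease {
    // Releases a lock the current thread never acquired and reports what is
    // thrown, mirroring the "You do not own the lock" failure in the logs.
    static String releaseWithoutOwning() {
        ReentrantLock lock = new ReentrantLock();
        try {
            lock.unlock(); // this thread never called lock()
            return "no exception";
        } catch (IllegalMonitorStateException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(releaseWithoutOwning()); // IllegalMonitorStateException
    }
}
```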
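The repeated WARN ("Can not start 'PutElasticsearch' ... DISABLED - STOPPING") suggests the reconnect path asked the processor to start while it was still transitioning out of the stop issued in step 3 of the scenario. A toy version of the guard that warning implies (hypothetical names and states; NiFi's actual StandardProcessorNode logic is more involved):

```java
public class StartGuard {
    // Hypothetical mirror of the processor lifecycle states visible in the logs.
    enum State { RUNNING, STARTING, STOPPED, STOPPING, DISABLED }

    // Only a fully stopped processor may be started; anything mid-transition
    // or disabled is rejected, which is what the WARN bulletin reports.
    static boolean canStart(State s) {
        return s == State.STOPPED;
    }

    public static void main(String[] args) {
        System.out.println(canStart(State.STOPPED));  // start permitted
        System.out.println(canStart(State.STOPPING)); // rejected: the bulletin's case
    }
}
```

A fix presumably needs the reconnect path to wait for the STOPPING-to-STOPPED transition to complete (or to re-apply the cluster's desired state afterwards) rather than attempting the start once and giving up.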