[
https://issues.apache.org/jira/browse/NIFI-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491154#comment-15491154
]
Arpit Gupta commented on NIFI-2776:
-----------------------------------
Around this time we see the following in nifi-app.log:
{code}
2016-09-14 06:24:34,346 INFO [StandardProcessScheduler Thread-1]
o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled
GetFile[id=275e48e3-0157-1000-0000-00000ba0cd25] to run with 1 threads
2016-09-14 06:24:34,559 INFO [StandardProcessScheduler Thread-2]
org.elasticsearch.plugins [Titania] loaded [], sites []
2016-09-14 06:24:34,714 INFO [Process Cluster Protocol Request-6]
o.a.n.c.p.impl.SocketProtocolListener Finished processing request
1858ae64-a654-41ae-bcc8-6e10dd5fdaad (type=DISCONNECTION_REQUEST, length=596
bytes) from host:port in 10 millis
2016-09-14 06:24:34,718 INFO [Disconnect from Cluster]
o.a.nifi.controller.StandardFlowService Received disconnection request message
from manager with explanation: User anonymous requested that node be
disconnected from cluster
2016-09-14 06:24:34,718 INFO [Disconnect from Cluster]
o.a.nifi.controller.StandardFlowService Disconnecting node.
2016-09-14 06:24:34,719 INFO [Disconnect from Cluster]
o.apache.nifi.controller.FlowController Cluster State changed from Clustered to
Not Clustered
2016-09-14 06:24:34,725 INFO [Disconnect from Cluster]
o.a.n.c.l.e.CuratorLeaderElectionManager This node is no longer registered to
be elected as the Leader for Role 'Primary Node'
2016-09-14 06:24:34,732 INFO [Disconnect from Cluster]
o.a.n.c.l.e.CuratorLeaderElectionManager This node is no longer registered to
be elected as the Leader for Role 'Cluster Coordinator'
2016-09-14 06:24:34,733 INFO [Disconnect from Cluster]
o.a.nifi.controller.StandardFlowService Node disconnected.
2016-09-14 06:24:34,747 ERROR [Leader Election Notification Thread-2]
o.a.c.f.recipes.leader.LeaderSelector The leader threw an exception
java.lang.IllegalMonitorStateException: You do not own the lock:
/nifi/leaders/Cluster Coordinator
at
org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:140)
~[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:425)
[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441)
[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64)
[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245)
[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239)
[curator-recipes-2.11.0.jar:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[na:1.8.0_101]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_101]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[na:1.8.0_101]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_101]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[na:1.8.0_101]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[na:1.8.0_101]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[na:1.8.0_101]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_101]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
2016-09-14 06:24:34,747 ERROR [Leader Election Notification Thread-1]
o.a.c.f.recipes.leader.LeaderSelector The leader threw an exception
java.lang.IllegalMonitorStateException: You do not own the lock:
/nifi/leaders/Primary Node
at
org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:140)
~[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:425)
[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441)
[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64)
[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245)
[curator-recipes-2.11.0.jar:na]
at
org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239)
[curator-recipes-2.11.0.jar:na]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[na:1.8.0_101]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_101]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[na:1.8.0_101]
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
[na:1.8.0_101]
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[na:1.8.0_101]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
[na:1.8.0_101]
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
[na:1.8.0_101]
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
[na:1.8.0_101]
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_101]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
2016-09-14 06:24:34,790 INFO [Flow Service Tasks Thread-2]
o.a.nifi.controller.StandardFlowService Saved flow controller
org.apache.nifi.controller.FlowController@dd83cf0 // Another save pending =
false
2016-09-14 06:24:34,854 INFO [NiFi Web Server-20]
o.a.n.controller.StandardProcessorNode Stopping processor: class
org.apache.nifi.processors.elasticsearch.PutElasticsearch
2016-09-14 06:24:34,948 INFO [Process Cluster Protocol Request-7]
o.a.n.c.c.node.NodeClusterCoordinator Status of host:port changed from
NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=22] to
NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26]
2016-09-14 06:24:34,959 INFO [Process Cluster Protocol Request-7]
o.a.c.f.imps.CuratorFrameworkImpl Starting
2016-09-14 06:24:34,976 INFO [Process Cluster Protocol Request-7-EventThread]
o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2016-09-14 06:24:35,002 INFO [Curator-Framework-0]
o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2016-09-14 06:24:35,006 INFO [Process Cluster Protocol Request-8]
o.a.n.c.p.impl.SocketProtocolListener Finished processing request
752f264c-10d4-4055-83f4-238d2c661923 (type=RECONNECTION_REQUEST, length=12285
bytes) from host:port in 37 millis
2016-09-14 06:24:35,006 INFO [Reconnect to Cluster]
o.a.nifi.controller.StandardFlowService Processing reconnection request from
manager.
2016-09-14 06:24:35,006 INFO [Reconnect to Cluster]
o.a.n.c.c.node.NodeClusterCoordinator Resetting cluster node statuses from
{host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED,
updateId=21], host:port=NodeConnectionStatus[nodeId=host:port,
state=CONNECTING, updateId=26],
host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23],
host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]}
to {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED,
updateId=21], host:port=NodeConnectionStatus[nodeId=host:port,
state=CONNECTING, updateId=26],
host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23],
host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]}
2016-09-14 06:24:35,021 INFO [Process Cluster Protocol Request-7]
o.a.n.c.p.impl.SocketProtocolListener Finished processing request
5e079f32-6280-437d-b254-9f5abb4388b0 (type=NODE_STATUS_CHANGE, length=1052
bytes) from nat-r6-pvzu-nifi-5.openstacklocal in 75 millis
2016-09-14 06:24:35,146 WARN [Reconnect to Cluster]
o.a.n.controller.StandardProcessorNode Can not start 'PutElasticsearch' since
it's already in the process of being started or it is DISABLED - STOPPING
2016-09-14 06:24:35,147 WARN [Reconnect to Cluster]
o.a.n.p.elasticsearch.PutElasticsearch
PutElasticsearch[id=275e4abc-0157-1000-0000-00005740dd0c] Can not start
'PutElasticsearch' since it's already in the process of being started or it is
DISABLED - STOPPING
2016-09-14 06:24:35,177 INFO [Reconnect to Cluster]
o.apache.nifi.controller.FlowController Successfully synchronized controller
with proposed flow
2016-09-14 06:24:35,193 INFO [Reconnect to Cluster]
o.a.nifi.controller.StandardFlowService Setting Flow Controller's Node ID:
host:port
2016-09-14 06:24:35,197 INFO [Reconnect to Cluster]
o.a.n.c.c.node.NodeClusterCoordinator This node is now connected to the
cluster. Will no longer require election of DataFlow.
2016-09-14 06:24:35,202 INFO [Reconnect to Cluster]
o.apache.nifi.controller.FlowController Cluster State changed from Not
Clustered to Clustered
2016-09-14 06:24:35,203 INFO [Reconnect to Cluster]
o.a.n.c.l.e.CuratorLeaderElectionManager
CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for
role Primary Node; this node is an active participant in the election.
2016-09-14 06:24:35,206 INFO [Reconnect to Cluster]
o.a.n.c.l.e.CuratorLeaderElectionManager
CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for
role Cluster Coordinator; this node is an active participant in the election.
2016-09-14 06:24:35,206 INFO [Reconnect to Cluster]
o.apache.nifi.controller.FlowController Starting 0 processors/ports/funnels
2016-09-14 06:24:35,206 INFO [Reconnect to Cluster]
o.apache.nifi.controller.FlowController Started 0 Remote Group Ports
transmitting
2016-09-14 06:24:35,207 INFO [Reconnect to Cluster]
o.a.n.c.c.node.NodeClusterCoordinator Resetting cluster node statuses from
{host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED,
updateId=21], host:port=NodeConnectionStatus[nodeId=host:port,
state=CONNECTING, updateId=26],
host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23],
host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]}
to {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED,
updateId=21], host:port=NodeConnectionStatus[nodeId=host:port,
state=CONNECTING, updateId=26],
host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23],
host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]}
2016-09-14 06:24:35,207 INFO [Reconnect to Cluster]
o.a.nifi.controller.StandardFlowService Node reconnected.
2016-09-14 06:24:35,225 INFO [Clustering Tasks Thread-2]
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2016-09-14
06:24:35,210 and sent to host:port at 2016-09-14 06:24:35,225; send took 15
millis
2016-09-14 06:24:35,235 INFO [Clustering Tasks Thread-1]
o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2016-09-14
06:24:35,212 and sent to host:port at 2016-09-14 06:24:35,235; send took 9
millis
{code}
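A note on the stack traces above (my reading, not stated in the log): Curator's InterProcessMutex.release throws IllegalMonitorStateException("You do not own the lock: ...") when the releasing thread no longer holds the ZooKeeper lock, which is likely what happens here when the disconnect unregisters the leader selectors while the election threads are still mid-loop. A minimal single-process analogy, with threading.Lock standing in for the distributed mutex:

```python
import threading

# Illustrative analogy only: releasing a lock the caller does not hold
# fails, just as InterProcessMutex.release fails in the traces above.
lock = threading.Lock()
try:
    lock.release()              # never acquired -> RuntimeError
    release_succeeded = True
except RuntimeError:
    release_succeeded = False   # mirrors IllegalMonitorStateException
```

The analogy is loose (Python raises RuntimeError, and there is no ZooKeeper session involved), but the invariant is the same: release is only legal for the current owner.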
> When a node reconnects to the cluster, the processor intermittently does not
> return to the same state it is in on the cluster
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: NIFI-2776
> URL: https://issues.apache.org/jira/browse/NIFI-2776
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.1.0
> Reporter: Arpit Gupta
> Fix For: 1.1.0
>
>
> Here is the scenario:
> 1. Create a flow and start a processor.
> 2. Disconnect a node.
> 3. On the disconnected node, stop the above processor.
> 4. Reconnect the above node to the cluster.
> 5. Wait 30s.
> 6. Check whether the processor started on the node reconnected in step 4.
> Very intermittently, the processor does not return to the RUNNING state.
> When we query the processor status on that node, we get the following bulletin:
> {code}
> "bulletins": [{
> "id": 0,
> "groupId": "275e45f8-0157-1000-ffff-fffff191c079",
> "sourceId": "275e4abc-0157-1000-0000-00005740dd0c",
> "timestamp": "06:24:35 UTC",
> "nodeAddress": "host:port",
> "canRead": true,
> "bulletin": {
> "id": 0,
> "nodeAddress": "host:port",
> "category": "Log Message",
> "groupId": "275e45f8-0157-1000-ffff-fffff191c079",
> "sourceId": "275e4abc-0157-1000-0000-00005740dd0c",
> "sourceName": "putES",
> "level": "WARNING",
> "message":
> "PutElasticsearch[id=275e4abc-0157-1000-0000-00005740dd0c] Can not start
> 'PutElasticsearch' since it's already in the process of being started or it
> is DISABLED - STOPPING",
> "timestamp": "06:24:35 UTC"
> }
> }],
> {code}
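The wait-and-check in steps 5-6 can be sketched as a small poll helper. This is illustrative only: the status getter that would wrap NiFi's processor-status REST call is assumed, not part of this report.

```python
import time

def wait_for_state(get_state, want="RUNNING", timeout=30.0, interval=1.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll get_state() until it returns `want` or `timeout` seconds elapse.

    Returns True if the desired state was observed, False on timeout.
    In a real check, get_state would wrap a hypothetical HTTP call to the
    node's processor-status endpoint; clock/sleep are injectable for tests.
    """
    deadline = clock() + timeout
    while True:
        if get_state() == want:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With the bug described here, such a check reports False even after the 30s wait, because the node reconnects while the processor is still in the STOPPING transition.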