[ https://issues.apache.org/jira/browse/NIFI-2776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491154#comment-15491154 ]
Arpit Gupta commented on NIFI-2776:
-----------------------------------

Around this time we see the following in the nifi-app.log:
{code}
2016-09-14 06:24:34,346 INFO [StandardProcessScheduler Thread-1] o.a.n.c.s.TimerDrivenSchedulingAgent Scheduled GetFile[id=275e48e3-0157-1000-0000-00000ba0cd25] to run with 1 threads
2016-09-14 06:24:34,559 INFO [StandardProcessScheduler Thread-2] org.elasticsearch.plugins [Titania] loaded [], sites []
2016-09-14 06:24:34,714 INFO [Process Cluster Protocol Request-6] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 1858ae64-a654-41ae-bcc8-6e10dd5fdaad (type=DISCONNECTION_REQUEST, length=596 bytes) from host:port in 10 millis
2016-09-14 06:24:34,718 INFO [Disconnect from Cluster] o.a.nifi.controller.StandardFlowService Received disconnection request message from manager with explanation: User anonymous requested that node be disconnected from cluster
2016-09-14 06:24:34,718 INFO [Disconnect from Cluster] o.a.nifi.controller.StandardFlowService Disconnecting node.
2016-09-14 06:24:34,719 INFO [Disconnect from Cluster] o.apache.nifi.controller.FlowController Cluster State changed from Clustered to Not Clustered
2016-09-14 06:24:34,725 INFO [Disconnect from Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager This node is no longer registered to be elected as the Leader for Role 'Primary Node'
2016-09-14 06:24:34,732 INFO [Disconnect from Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager This node is no longer registered to be elected as the Leader for Role 'Cluster Coordinator'
2016-09-14 06:24:34,733 INFO [Disconnect from Cluster] o.a.nifi.controller.StandardFlowService Node disconnected.
2016-09-14 06:24:34,747 ERROR [Leader Election Notification Thread-2] o.a.c.f.recipes.leader.LeaderSelector The leader threw an exception
java.lang.IllegalMonitorStateException: You do not own the lock: /nifi/leaders/Cluster Coordinator
    at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:140) ~[curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:425) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239) [curator-recipes-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
2016-09-14 06:24:34,747 ERROR [Leader Election Notification Thread-1] o.a.c.f.recipes.leader.LeaderSelector The leader threw an exception
java.lang.IllegalMonitorStateException: You do not own the lock: /nifi/leaders/Primary Node
    at org.apache.curator.framework.recipes.locks.InterProcessMutex.release(InterProcessMutex.java:140) ~[curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWork(LeaderSelector.java:425) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.doWorkLoop(LeaderSelector.java:441) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector.access$100(LeaderSelector.java:64) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:245) [curator-recipes-2.11.0.jar:na]
    at org.apache.curator.framework.recipes.leader.LeaderSelector$2.call(LeaderSelector.java:239) [curator-recipes-2.11.0.jar:na]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_101]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_101]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_101]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
2016-09-14 06:24:34,790 INFO [Flow Service Tasks Thread-2] o.a.nifi.controller.StandardFlowService Saved flow controller org.apache.nifi.controller.FlowController@dd83cf0 // Another save pending = false
2016-09-14 06:24:34,854 INFO [NiFi Web Server-20] o.a.n.controller.StandardProcessorNode Stopping processor: class org.apache.nifi.processors.elasticsearch.PutElasticsearch
2016-09-14 06:24:34,948 INFO [Process Cluster Protocol Request-7] o.a.n.c.c.node.NodeClusterCoordinator Status of host:port changed from NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=22] to NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26]
2016-09-14 06:24:34,959 INFO [Process Cluster Protocol Request-7] o.a.c.f.imps.CuratorFrameworkImpl Starting
2016-09-14 06:24:34,976 INFO [Process Cluster Protocol Request-7-EventThread] o.a.c.f.state.ConnectionStateManager State change: CONNECTED
2016-09-14 06:24:35,002 INFO [Curator-Framework-0] o.a.c.f.imps.CuratorFrameworkImpl backgroundOperationsLoop exiting
2016-09-14 06:24:35,006 INFO [Process Cluster Protocol Request-8] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 752f264c-10d4-4055-83f4-238d2c661923 (type=RECONNECTION_REQUEST, length=12285 bytes) from host:port in 37 millis
2016-09-14 06:24:35,006 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Processing reconnection request from manager.
2016-09-14 06:24:35,006 INFO [Reconnect to Cluster] o.a.n.c.c.node.NodeClusterCoordinator Resetting cluster node statuses from {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=21], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]} to {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=21], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]}
2016-09-14 06:24:35,021 INFO [Process Cluster Protocol Request-7] o.a.n.c.p.impl.SocketProtocolListener Finished processing request 5e079f32-6280-437d-b254-9f5abb4388b0 (type=NODE_STATUS_CHANGE, length=1052 bytes) from nat-r6-pvzu-nifi-5.openstacklocal in 75 millis
2016-09-14 06:24:35,146 WARN [Reconnect to Cluster] o.a.n.controller.StandardProcessorNode Can not start 'PutElasticsearch' since it's already in the process of being started or it is DISABLED - STOPPING
2016-09-14 06:24:35,147 WARN [Reconnect to Cluster] o.a.n.p.elasticsearch.PutElasticsearch PutElasticsearch[id=275e4abc-0157-1000-0000-00005740dd0c] Can not start 'PutElasticsearch' since it's already in the process of being started or it is DISABLED - STOPPING
2016-09-14 06:24:35,177 INFO [Reconnect to Cluster] o.apache.nifi.controller.FlowController Successfully synchronized controller with proposed flow
2016-09-14 06:24:35,193 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Setting Flow Controller's Node ID: host:port
2016-09-14 06:24:35,197 INFO [Reconnect to Cluster] o.a.n.c.c.node.NodeClusterCoordinator This node is now connected to the cluster. Will no longer require election of DataFlow.
2016-09-14 06:24:35,202 INFO [Reconnect to Cluster] o.apache.nifi.controller.FlowController Cluster State changed from Not Clustered to Clustered
2016-09-14 06:24:35,203 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Primary Node; this node is an active participant in the election.
2016-09-14 06:24:35,206 INFO [Reconnect to Cluster] o.a.n.c.l.e.CuratorLeaderElectionManager CuratorLeaderElectionManager[stopped=false] Registered new Leader Selector for role Cluster Coordinator; this node is an active participant in the election.
2016-09-14 06:24:35,206 INFO [Reconnect to Cluster] o.apache.nifi.controller.FlowController Starting 0 processors/ports/funnels
2016-09-14 06:24:35,206 INFO [Reconnect to Cluster] o.apache.nifi.controller.FlowController Started 0 Remote Group Ports transmitting
2016-09-14 06:24:35,207 INFO [Reconnect to Cluster] o.a.n.c.c.node.NodeClusterCoordinator Resetting cluster node statuses from {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=21], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]} to {host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=21], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTING, updateId=26], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=23], host:port=NodeConnectionStatus[nodeId=host:port, state=CONNECTED, updateId=24]}
2016-09-14 06:24:35,207 INFO [Reconnect to Cluster] o.a.nifi.controller.StandardFlowService Node reconnected.
2016-09-14 06:24:35,225 INFO [Clustering Tasks Thread-2] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2016-09-14 06:24:35,210 and sent to host:port at 2016-09-14 06:24:35,225; send took 15 millis
2016-09-14 06:24:35,235 INFO [Clustering Tasks Thread-1] o.a.n.c.c.ClusterProtocolHeartbeater Heartbeat created at 2016-09-14 06:24:35,212 and sent to host:port at 2016-09-14 06:24:35,235; send took 9 millis
{code}

> When a node connects back to the cluster, intermittently the processor does not
> return to the same state as it is in the cluster
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-2776
>                 URL: https://issues.apache.org/jira/browse/NIFI-2776
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.1.0
>            Reporter: Arpit Gupta
>             Fix For: 1.1.0
>
>
> Here is the scenario:
> 1. Create a flow and start a processor.
> 2. Disconnect a node.
> 3. On the disconnected node, stop the above processor.
> 4. Connect the above node back to the cluster.
> 5. Wait 30s.
> 6. Check whether the processor started on the node that was connected in #4.
> Very intermittently we see that the processor does not get into running state.
> When we query the processor status on the node we get the following bulletin:
> {code}
> "bulletins": [{
>     "id": 0,
>     "groupId": "275e45f8-0157-1000-ffff-fffff191c079",
>     "sourceId": "275e4abc-0157-1000-0000-00005740dd0c",
>     "timestamp": "06:24:35 UTC",
>     "nodeAddress": "host:port",
>     "canRead": true,
>     "bulletin": {
>         "id": 0,
>         "nodeAddress": "host:port",
>         "category": "Log Message",
>         "groupId": "275e45f8-0157-1000-ffff-fffff191c079",
>         "sourceId": "275e4abc-0157-1000-0000-00005740dd0c",
>         "sourceName": "putES",
>         "level": "WARNING",
>         "message": "PutElasticsearch[id=275e4abc-0157-1000-0000-00005740dd0c] Can not start 'PutElasticsearch' since it's already in the process of being started or it is DISABLED - STOPPING",
>         "timestamp": "06:24:35 UTC"
>     }
> }],
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
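The IllegalMonitorStateException stack traces in the log come from Curator's InterProcessMutex.release being called by a thread that no longer owns the lock, which is consistent with the leader selectors being torn down and immediately re-registered during the disconnect/reconnect window. The JDK's ReentrantLock fails the same way on an unowned unlock; a minimal standalone sketch of that failure mode (plain JDK code only, no NiFi, Curator, or ZooKeeper involved):

```java
import java.util.concurrent.locks.ReentrantLock;

public class UnownedRelease {
    // Releases a lock the current thread never acquired and reports what is
    // thrown, mirroring the "You do not own the lock" failure in the logs.
    static String releaseWithoutOwning() {
        ReentrantLock lock = new ReentrantLock();
        try {
            lock.unlock(); // this thread never called lock()
            return "no exception";
        } catch (IllegalMonitorStateException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(releaseWithoutOwning()); // IllegalMonitorStateException
    }
}
```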
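The repeated WARN ("Can not start 'PutElasticsearch' ... DISABLED - STOPPING") suggests the reconnect path asked the processor to start while it was still transitioning out of the stop issued in step 3 of the scenario. A toy version of the guard that warning implies (hypothetical names and states; NiFi's actual StandardProcessorNode logic is more involved):

```java
public class StartGuard {
    // Hypothetical mirror of the processor lifecycle states visible in the logs.
    enum State { RUNNING, STARTING, STOPPED, STOPPING, DISABLED }

    // Only a fully stopped processor may be started; anything mid-transition
    // or disabled is rejected, which is what the WARN bulletin reports.
    static boolean canStart(State s) {
        return s == State.STOPPED;
    }

    public static void main(String[] args) {
        System.out.println(canStart(State.STOPPED));  // start permitted
        System.out.println(canStart(State.STOPPING)); // rejected: the bulletin's case
    }
}
```

A fix presumably needs the reconnect path to wait for the STOPPING-to-STOPPED transition to complete (or to re-apply the cluster's desired state afterwards) rather than attempting the start once and giving up.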