[ https://issues.apache.org/jira/browse/NIFI-6517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898295#comment-16898295 ]

ASF subversion and git services commented on NIFI-6517:
-------------------------------------------------------

Commit a9a4b765b179ceb573b87c17a8a1ab495b80e84a in nifi's branch 
refs/heads/master from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=a9a4b76 ]

NIFI-6517: Ensure that if an IOException is thrown from LoadBalanceSession, we
properly catch the Exception, mark the session as complete, and then re-throw.
There was one condition in which this was not happening. This commit addresses
that situation.

This closes #3626.

Signed-off-by: Bryan Bende <[email protected]>
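The commit message describes a catch, mark-complete, re-throw pattern. The following is a minimal Java sketch of that pattern only; `Session`, `communicate()` and `markComplete()` are hypothetical stand-ins, not NiFi's actual `LoadBalanceSession` API, and the specific code path the commit fixed is not shown here.

```java
import java.io.IOException;

public class LoadBalanceSessionSketch {

    // Hypothetical stand-in for a load-balance session; NOT NiFi's real class.
    static class Session {
        private boolean complete = false;
        private final boolean failOnSend;

        Session(boolean failOnSend) {
            this.failOnSend = failOnSend;
        }

        boolean isComplete() {
            return complete;
        }

        void markComplete() {
            complete = true;
        }

        // Simulates one round of I/O with the peer node; may fail with an IOException.
        boolean communicate() throws IOException {
            if (failOnSend) {
                throw new IOException("Connection reset by peer");
            }
            complete = true;
            return true;
        }
    }

    // Catch the IOException, mark the session complete, then re-throw the original exception.
    static boolean communicate(Session session) throws IOException {
        try {
            return session.communicate();
        } catch (IOException e) {
            session.markComplete(); // no other code path should treat this session as still in flight
            throw e;                // the caller still sees the original failure
        }
    }

    public static void main(String[] args) {
        Session failing = new Session(true);
        try {
            communicate(failing);
        } catch (IOException e) {
            System.out.println("Caught: " + e.getMessage() + ", complete=" + failing.isComplete());
        }
    }
}
```

Running `main` prints `Caught: Connection reset by peer, complete=true`: the failure still propagates, but the session is no longer left in an in-progress state.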


> Load Balanced Connections can show counts that are inaccurate, resulting in 
> data not moving through connection
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: NIFI-6517
>                 URL: https://issues.apache.org/jira/browse/NIFI-6517
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>            Reporter: Mark Payne
>            Assignee: Mark Payne
>            Priority: Critical
>             Fix For: 1.10.0
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I've encountered an issue where data that is load balanced using the Round
> Robin strategy shows FlowFiles in the queue, but the data cannot be processed
> by the follow-on processor. List Queue shows no FlowFiles, and Empty Queue
> likewise indicates no FlowFiles.
> An error in the logs indicates a bug in maintaining the correct size of the
> FlowFile Queue:
> {code:java}
> 2019-08-01 11:39:08,422 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.h.AbstractHeartbeatMonitor Finished processing 2 heartbeats in 32480 nanos
> 2019-08-01 11:39:08,422 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator localhost:8482 requested disconnection from cluster due to Have not received a heartbeat from node in 40 seconds
> 2019-08-01 11:39:08,422 INFO [Heartbeat Monitor Thread-1] o.a.n.c.c.node.NodeClusterCoordinator Status of localhost:8482 changed from NodeConnectionStatus[nodeId=localhost:8482, state=CONNECTED, updateId=30] to NodeConnectionStatus[nodeId=localhost:8482, state=DISCONNECTED, Disconnect Code=Lack of Heartbeat, Disconnect Reason=Have not received a heartbeat from node in 40 seconds, updateId=31]
> 2019-08-01 11:39:08,441 ERROR [Load-Balanced Client Thread-2] o.a.n.c.queue.SwappablePriorityQueue Updated Size of Queue Unacknowledged from FlowFile Queue Size[ ActiveQueue=[500, 2560000 Bytes], Swap Queue=[4845, 24806400 Bytes], Swap Files=[0], Unacknowledged=[0, 0 Bytes] ] to FlowFile Queue Size[ ActiveQueue=[500, 2560000 Bytes], Swap Queue=[4845, 24806400 Bytes], Swap Files=[0], Unacknowledged=[-945, -4838400 Bytes] ]
> java.lang.RuntimeException: Cannot create negative queue size
> at org.apache.nifi.controller.queue.SwappablePriorityQueue.logIfNegative(SwappablePriorityQueue.java:945)
> at org.apache.nifi.controller.queue.SwappablePriorityQueue.incrementUnacknowledgedQueueSize(SwappablePriorityQueue.java:935)
> at org.apache.nifi.controller.queue.SwappablePriorityQueue.acknowledge(SwappablePriorityQueue.java:426)
> at org.apache.nifi.controller.queue.clustered.partition.RemoteQueuePartition$1.onTransactionFailed(RemoteQueuePartition.java:160)
> at org.apache.nifi.controller.queue.clustered.client.async.TransactionFailureCallback.onTransactionFailed(TransactionFailureCallback.java:26)
> at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClient.nodeDisconnected(NioAsyncLoadBalanceClient.java:295)
> at org.apache.nifi.controller.queue.clustered.client.async.nio.NioAsyncLoadBalanceClientTask.run(NioAsyncLoadBalanceClientTask.java:71)
> at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Note that this occurs immediately after the status of one of the other nodes 
> in the cluster changes to DISCONNECTED.
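In the log above, `acknowledge` moves the Unacknowledged count from 0 to -945 FlowFiles (-4838400 bytes, i.e. 945 x 5120 bytes). The sketch below is a hypothetical, simplified size tracker (`UnackQueueSizeSketch` and all of its methods are invented for illustration; NiFi's `SwappablePriorityQueue` is considerably more involved). It shows how acknowledging the same FlowFiles twice drives the counter negative and trips a `logIfNegative`-style check. The duplicate acknowledgment on node disconnect is an assumption consistent with the stack trace and the commit message, not something the ticket states outright.

```java
import java.util.ArrayList;
import java.util.List;

public class UnackQueueSizeSketch {
    // Hypothetical, simplified size tracker; NOT NiFi's SwappablePriorityQueue.
    private int unackCount = 0;
    private long unackBytes = 0L;
    private final List<String> errors = new ArrayList<>();

    // FlowFiles handed to a client (e.g. a load-balance transaction) become unacknowledged.
    void poll(int count, long bytes) {
        incrementUnacknowledged(count, bytes);
    }

    // Acknowledging FlowFiles (sent, or returned on failure) removes them from the unacknowledged total.
    void acknowledge(int count, long bytes) {
        incrementUnacknowledged(-count, -bytes);
    }

    private void incrementUnacknowledged(int deltaCount, long deltaBytes) {
        int beforeCount = unackCount;
        long beforeBytes = unackBytes;
        unackCount += deltaCount;
        unackBytes += deltaBytes;
        logIfNegative(beforeCount, beforeBytes);
    }

    // Records (but does not clamp) a negative size, in the spirit of the ERROR shown in the log.
    private void logIfNegative(int beforeCount, long beforeBytes) {
        if (unackCount < 0 || unackBytes < 0) {
            RuntimeException e = new RuntimeException("Cannot create negative queue size");
            errors.add("Updated Size of Queue Unacknowledged from [" + beforeCount + ", " + beforeBytes
                    + " Bytes] to [" + unackCount + ", " + unackBytes + " Bytes]: " + e.getMessage());
        }
    }

    int unacknowledgedCount() { return unackCount; }

    long unacknowledgedBytes() { return unackBytes; }

    List<String> errors() { return errors; }

    public static void main(String[] args) {
        UnackQueueSizeSketch queue = new UnackQueueSizeSketch();
        int flowFiles = 945;
        long bytes = flowFiles * 5120L;

        queue.poll(flowFiles, bytes);        // a transaction takes 945 FlowFiles
        queue.acknowledge(flowFiles, bytes); // the transaction fails; FlowFiles are returned and acknowledged
        queue.acknowledge(flowFiles, bytes); // ASSUMED duplicate acknowledgment when the peer node disconnects

        System.out.println(queue.unacknowledgedCount() + " " + queue.unacknowledgedBytes());
        queue.errors().forEach(System.out::println);
    }
}
```

Running `main` prints `-945 -4838400` followed by one recorded error, matching the Unacknowledged=[-945, -4838400 Bytes] value in the log. Because the counter is not clamped, the queue keeps reporting a size that no longer corresponds to any FlowFiles, which would fit the symptom of a queue that shows data that List Queue cannot find.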



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
