[ https://issues.apache.org/jira/browse/NIFI-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17026208#comment-17026208 ]
ASF subversion and git services commented on NIFI-6787:
-------------------------------------------------------
Commit a78cecd95f1a690614c17f38aa003f3d78e468cb in nifi's branch
refs/heads/support/nifi-1.11.x from Joe Witt
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=a78cecd ]
Revert "NIFI-6787 - Before: When checking if a load balanced connection queue
is full, we compare the totalSize.get() and getMaxQueueSize()."
This reverts commit 773160958209a8a173b4f741517c4bda63a28e82.
> Load balanced connection can hold up for whole cluster when one node slows
> down
> -------------------------------------------------------------------------------
>
> Key: NIFI-6787
> URL: https://issues.apache.org/jira/browse/NIFI-6787
> Project: Apache NiFi
> Issue Type: Improvement
> Components: Core Framework
> Reporter: Tamas Palfy
> Assignee: Tamas Palfy
> Priority: Major
> Fix For: 1.11.0
>
> Attachments: template.change_hostname_in_groovy_script.xml
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> A slow processor on one node in a cluster can slow down the same processor on
> the other nodes when a load-balanced incoming connection is used.
> When a processor decides whether it can pass along a flowfile, it checks
> whether the outgoing connection is full (if it is full, the processor yields).
> Similarly, when the receiving processor is about to be scheduled, the framework
> checks whether the incoming connection is empty (if it is empty, there is no
> reason to call {{onTrigger}}).
> The problem is that the full check compares the _total_ size (which spans the
> whole cluster) to the max (which is scoped to the current node).
> The empty check is (correctly) done on the local partition only.
> This can lead to the case where the slow node fills up its queue while the
> faster ones empty theirs.
> Once the slow node has a full queue, the fast ones stop receiving new input
> and thus stop working once their queues are drained.
> The likely cause is that {{SocketLoadBalancedFlowFileQueue.isFull}} delegates
> to {{AbstractFlowFileQueue.isFull}}, which checks {{size()}}, and {{size()}}
> returns the _total_ size. (The empty check looks fine; for reference, it is
> done via {{SocketLoadBalancedFlowFileQueue.isActiveQueueEmpty}}.)
> {code:title=AbstractFlowFileQueue.java|borderStyle=solid}
> ...
>     private MaxQueueSize getMaxQueueSize() {
>         return maxQueueSize.get();
>     }
>
>     @Override
>     public boolean isFull() {
>         return isFull(size());
>     }
>
>     protected boolean isFull(final QueueSize queueSize) {
>         final MaxQueueSize maxSize = getMaxQueueSize();
>
>         // Check if max size is set
>         if (maxSize.getMaxBytes() <= 0 && maxSize.getMaxCount() <= 0) {
>             return false;
>         }
>
>         if (maxSize.getMaxCount() > 0 && queueSize.getObjectCount() >= maxSize.getMaxCount()) {
>             return true;
>         }
>
>         if (maxSize.getMaxBytes() > 0 && queueSize.getByteCount() >= maxSize.getMaxBytes()) {
>             return true;
>         }
>
>         return false;
>     }
> ...
> {code}
> {code:title=SocketLoadBalancedFlowFileQueue.java|borderStyle=solid}
> ...
>     @Override
>     public QueueSize size() {
>         return totalSize.get();
>     }
> ...
>     @Override
>     public boolean isActiveQueueEmpty() {
>         return localPartition.isActiveQueueEmpty();
>     }
> ...
> {code}
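
The mismatch described above can be illustrated with a small standalone model. This is only a sketch, not NiFi code: the class `QueueFullCheckSketch`, its methods, and the per-partition count array are all hypothetical, and only the count-based limit is modelled (the byte-based limit is left out). It follows the description's wording, comparing a total summed over all partitions against a max that is meant for a single node, and contrasts that with a check scoped to the local partition.

```java
// Hypothetical, simplified model of the two "is full" checks.
// None of these names are NiFi classes or methods.
class QueueFullCheckSketch {

    // Count-based part of the isFull(QueueSize) logic quoted in the issue:
    // a non-positive max means no limit is configured.
    public static boolean isFull(int objectCount, int maxCount) {
        return maxCount > 0 && objectCount >= maxCount;
    }

    // Reported behaviour: the total over all partitions is compared
    // against a max that is scoped to one node.
    public static boolean isFullUsingTotal(int[] partitionCounts, int maxCountPerNode) {
        int total = 0;
        for (int count : partitionCounts) {
            total += count;
        }
        return isFull(total, maxCountPerNode);
    }

    // Assumed alternative: only the local partition is compared against the max,
    // mirroring how the empty check is described as working.
    public static boolean isFullUsingLocal(int[] partitionCounts, int localIndex, int maxCountPerNode) {
        return isFull(partitionCounts[localIndex], maxCountPerNode);
    }

    public static void main(String[] args) {
        int maxCountPerNode = 10_000;
        // Node 0 is slow and has filled its partition; nodes 1 and 2 have drained theirs.
        int[] partitionCounts = {10_000, 0, 0};

        for (int node = 0; node < partitionCounts.length; node++) {
            System.out.println("node " + node
                    + ": full by total = " + isFullUsingTotal(partitionCounts, maxCountPerNode)
                    + ", full by local partition = "
                    + isFullUsingLocal(partitionCounts, node, maxCountPerNode));
        }
    }
}
```

In this model the total-based check reports "full" for every node as soon as the slow node's partition reaches the limit, even though the other partitions are empty, which matches the stall described in the issue. Whether a partition-local check is the right remedy in NiFi itself is not established by this message, which records a revert of an earlier change on the support/nifi-1.11.x branch.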
--
This message was sent by Atlassian Jira
(v8.3.4#803005)