[ 
https://issues.apache.org/jira/browse/NIFI-5663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639938#comment-16639938
 ] 

ASF GitHub Bot commented on NIFI-5663:
--------------------------------------

GitHub user markap14 opened a pull request:

    https://github.com/apache/nifi/pull/3047

    NIFI-5663: Ensure that when sorting Node Identifiers we use both the 
node's API Address and API Port, in case 2 nodes are running on the same 
host. Also ensure that when the Local Node ID is determined we update all Load 
Balancing Partitions, if necessary


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/markap14/nifi NIFI-5663

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/nifi/pull/3047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3047
    
----
commit 619f1ffe8fbbca61bc5545f13920190a77006e08
Author: Mark Payne <markap14@...>
Date:   2018-06-14T15:57:21Z

    NIFI-5516: Implement Load-Balanced Connections
    Refactoring StandardFlowFileQueue to have an AbstractFlowFileQueue
    Refactored more into AbstractFlowFileQueue
    Added documentation, cleaned up code some
    Refactored FlowFileQueue so that there is SwappablePriorityQueue
    Several unit tests written
    Added REST API Endpoint to allow PUT to update connection to use load 
balancing or not. When enabling load balancing, though, I saw the queue size go 
from 9 to 18. Then was only able to process 9 FlowFiles.
    Bug fixes
    Code refactoring
    Added integration tests, bug fixes
    Refactored clients to use NIO
    Bug fixes. Appears to finally be working with NIO Client!!!!!
    NIFI-5516: Refactored some code from NioAsyncLoadBalanceClient to 
LoadBalanceSession
    Bug fixes and allowed load balancing socket connections to be reused
    Implemented ability to compress Nothing, Attributes, or Content + 
Attributes when performing load-balancing
    Added flag to ConnectionDTO to indicate Load Balance Status
    Updated Diagnostics DTO for connections
    Store state about cluster topology in NodeClusterCoordinator so that the 
state is known upon restart
    Code cleanup
    Fixed checkstyle and unit tests
    NIFI-5516: Updating logic for Cluster Node Firewall so that the node's 
identity comes from its certificate, not from whatever it says it is.
    NIFI-5516: Fixed missing License headers
    NIFI-5516: Some minor code cleanup
    NIFI-5516: Addressed review feedback; Bug fixes; some code cleanup. 
Changed dependency on nifi-registry from SNAPSHOT to official 0.3.0 release
    NIFI-5516: Take backpressure configuration into account
    NIFI-5516: Fixed ConnectionDiagnosticsSnapshot to include node identifier
    NIFI-5516: Addressed review feedback
    
    This closes #2947

commit 47bbe20f9ee3c348378adfa86965f34a319057bb
Author: Mark Payne <markap14@...>
Date:   2018-10-05T15:12:49Z

    NIFI-5663: Ensure that when sorting Node Identifiers we use both the 
node's API Address and API Port, in case 2 nodes are running on the same 
host. Also ensure that when the Local Node ID is determined we update all Load 
Balancing Partitions, if necessary

----
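
The sorting change described in the commit message above can be illustrated with a minimal sketch. It assumes a hypothetical NodeId class (a stand-in, not NiFi's actual node identifier class) and hypothetical comparator names; it is not the code from the pull request. It shows why ordering by API address alone is ambiguous when two nodes share a host: a stable sort leaves such nodes in arrival order, so two nodes that learned the topology in a different order may end up with different orderings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class NodeOrderSketch {
    // Hypothetical stand-in for a node identifier; not NiFi's own class.
    static final class NodeId {
        private final String apiAddress;
        private final int apiPort;

        NodeId(String apiAddress, int apiPort) {
            this.apiAddress = apiAddress;
            this.apiPort = apiPort;
        }

        String getApiAddress() { return apiAddress; }
        int getApiPort() { return apiPort; }

        @Override
        public String toString() { return apiAddress + ":" + apiPort; }
    }

    // Ambiguous when two nodes run on the same host: they compare as equal.
    static final Comparator<NodeId> BY_ADDRESS_ONLY =
            Comparator.comparing(NodeId::getApiAddress);

    // Breaks the tie with the port, so every node derives the same order.
    static final Comparator<NodeId> BY_ADDRESS_AND_PORT =
            Comparator.comparing(NodeId::getApiAddress)
                      .thenComparingInt(NodeId::getApiPort);

    // Sorts a copy of the list and returns its string form for easy comparison.
    static String sortedView(List<NodeId> nodes, Comparator<NodeId> order) {
        List<NodeId> copy = new ArrayList<>(nodes);
        copy.sort(order); // List.sort is stable: equal elements keep input order
        return copy.toString();
    }

    public static void main(String[] args) {
        NodeId a = new NodeId("localhost", 8080);
        NodeId b = new NodeId("localhost", 8081);
        List<NodeId> seenByFirstNode = Arrays.asList(a, b);
        List<NodeId> seenBySecondNode = Arrays.asList(b, a);

        System.out.println(sortedView(seenByFirstNode, BY_ADDRESS_ONLY));
        System.out.println(sortedView(seenBySecondNode, BY_ADDRESS_ONLY));
        System.out.println(sortedView(seenByFirstNode, BY_ADDRESS_AND_PORT));
        System.out.println(sortedView(seenBySecondNode, BY_ADDRESS_AND_PORT));
    }
}
```

With the address-only comparator the two printed orderings differ; with the address-and-port comparator they match regardless of the order in which the nodes were seen.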


> FlowFile load balancing keeps re-partitioning
> ---------------------------------------------
>
>                 Key: NIFI-5663
>                 URL: https://issues.apache.org/jira/browse/NIFI-5663
>             Project: Apache NiFi
>          Issue Type: Bug
>          Components: Core Framework
>    Affects Versions: 1.8.0
>            Reporter: Koji Kawamura
>            Assignee: Mark Payne
>            Priority: Critical
>
> Scenario
>  # Start a brand-new cluster with only 1 node (nifi0). Using an existing 
> multi-node cluster doesn't reproduce the issue.
>  # Create GenerateFlowFile -> LogAttribute
>  # Then set the 'Partition by attribute' LB strategy on the connection
>  # Add 2nd node, nifi1
>  # Generate some FlowFiles. Then load-balance activity never finishes.
> With a 2-node cluster, for some reason, each node ended up with a different 
> queuePartitions order in SocketLoadBalancedFlowFileQueue. By adding debug 
> logs, I found that each node has the following:
>  * nifi0
>  ** queuePartitions[0] = 
> RemoteQueuePartition[queueId=14ac9634-0166-1000-ffff-ffffd9ae7f4b, 
> nodeId=nifi1.example.com:8080]
>  ** queuePartitions[1] = 
> SwappablePriorityQueueLocalPartition[queueId=14ac9634-0166-1000-ffff-ffffd9ae7f4b]
>  * nifi1
>  ** queuePartitions[0] = 
> RemoteQueuePartition[queueId=14ac9634-0166-1000-ffff-ffffd9ae7f4b, 
> nodeId=nifi0.example.com:8080]
>  ** queuePartitions[1] = 
> SwappablePriorityQueueLocalPartition[queueId=14ac9634-0166-1000-ffff-ffffd9ae7f4b]
> Because of this, the 'Partition by attribute' LB strategy keeps sending 
> received FlowFiles back and forth between the two nodes whenever the hash of 
> the attribute value points to queuePartitions[0]. The following log message 
> is written endlessly:
> {code:java}
> 2018-10-05 07:09:32,372 DEBUG [Load Balance Server Thread-3] 
> o.a.n.c.q.c.SocketLoadBalancedFlowFileQueue Received the following FlowFiles 
> from Peer: ...offset=7452, 
> length=180],offset=162,name=10653317458635,size=18]]. Will re-partition 
> FlowFiles to ensure proper balancing across the cluster.
> {code}
> SocketLoadBalancedFlowFileQueue maintains queuePartitions by listening for 
> cluster topology changes using ClusterTopologyEventListener. The 
> SocketLoadBalancedFlowFileQueueClusterEventListener.onNodeAdded debug log 
> shows that the array was empty when the 2nd node (nifi1) was added:
> {code:java}
> ClusterEventListener.onNodeAdded. 2018-10-05 07:06:42,883 DEBUG [Process 
> Cluster Protocol Request-10] o.a.n.c.q.c.SocketLoadBalancedFlowFileQueue Node 
> Identifier nifi1.example.com:8080 added to cluster. Node ID's changing from 
> [] to [nifi1.example.com:8080]
> {code}
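
The re-partitioning loop described in the issue above can be reproduced with a small model. This is only a sketch under stated assumptions: the class RepartitionSketch and its methods are hypothetical, node identifiers are plain strings, and the "hash points to index N" step is reduced to a fixed partition index. It is not NiFi's SocketLoadBalancedFlowFileQueue logic.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RepartitionSketch {
    static final String NIFI0 = "nifi0.example.com:8080";
    static final String NIFI1 = "nifi1.example.com:8080";

    // Partition order as reported in the issue: on each node the remote
    // partition is at index 0 and the node's own local partition at index 1.
    static Map<String, List<String>> reportedViews() {
        Map<String, List<String>> views = new HashMap<>();
        views.put(NIFI0, Arrays.asList(NIFI1, NIFI0));
        views.put(NIFI1, Arrays.asList(NIFI0, NIFI1));
        return views;
    }

    // Assumed fixed state: both nodes hold the same, sorted partition order.
    static Map<String, List<String>> consistentViews() {
        Map<String, List<String>> views = new HashMap<>();
        views.put(NIFI0, Arrays.asList(NIFI0, NIFI1));
        views.put(NIFI1, Arrays.asList(NIFI0, NIFI1));
        return views;
    }

    // Follows a FlowFile whose attribute hash selects partitionIndex.
    // Returns the number of transfers until the current node owns that
    // partition, or -1 if it is still bouncing after maxTransfers.
    static int transfersUntilSettled(Map<String, List<String>> views,
                                     String startNode,
                                     int partitionIndex,
                                     int maxTransfers) {
        String current = startNode;
        for (int transfers = 0; transfers <= maxTransfers; transfers++) {
            String owner = views.get(current).get(partitionIndex);
            if (owner.equals(current)) {
                return transfers;
            }
            current = owner; // hand the FlowFile to the node this view names
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(transfersUntilSettled(reportedViews(), NIFI0, 0, 100));
        System.out.println(transfersUntilSettled(consistentViews(), NIFI1, 0, 100));
    }
}
```

In this model, index 0 never settles under the reported views because each node believes the other one owns it, while under a shared ordering the FlowFile comes to rest after at most one transfer.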



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
