[
https://issues.apache.org/jira/browse/NIFI-14236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17925669#comment-17925669
]
ASF subversion and git services commented on NIFI-14236:
--------------------------------------------------------
Commit 9b84dd475ee6561b2fdf2b0c95c2c182ed3f0fb9 in nifi's branch
refs/heads/main from David Handermann
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=9b84dd475e ]
NIFI-14236 Implement consistent hash for Attribute Partitioner (#9709)
* NIFI-14236 Implemented consistent hash for Attribute Partitioner
- Used implementation from com.google.common.hash.Hashing.consistentHash method
> Load Balance using Partition by Attribute results in poor distribution
> ----------------------------------------------------------------------
>
> Key: NIFI-14236
> URL: https://issues.apache.org/jira/browse/NIFI-14236
> Project: Apache NiFi
> Issue Type: Bug
> Components: Core Framework
> Affects Versions: 1.16.0, 2.2.0
> Reporter: Mark Payne
> Assignee: David Handermann
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> NIFI-9638 aimed to eliminate unnecessary dependencies on the Google Guava
> library. However, in doing so, it changed the hashing algorithm from
> Consistent Hashing to a much simpler hashing algorithm that results in poor
> distribution of data.
> Consistent Hashing was specifically chosen for this use case because it
> distributes data evenly across a given set of bins, and because it is
> designed so that if the number of bins (in this case the number of NiFi
> nodes) changes, the number of elements that need to be redistributed (in
> this case the number of FlowFiles) is minimal.
> We need to revert to Consistent Hashing in order to provide better
> distribution of data across the cluster.
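
For readers unfamiliar with the technique, the following is a minimal sketch of jump consistent hashing, the style of algorithm that Guava's Hashing.consistentHash is understood to follow. It is not the NiFi code from the commit above. The class name ConsistentHashSketch, the partition helper, and the use of String.hashCode() to turn an attribute value into a long are all assumptions made for illustration.

```java
/**
 * Illustrative sketch of jump consistent hashing.
 * Not the NiFi implementation; all names here are hypothetical.
 */
final class ConsistentHashSketch {

    // Multiplier of the 64-bit linear congruential generator that drives the jumps
    private static final long LCG_MULTIPLIER = 2862933555777941757L;

    private ConsistentHashSketch() {
    }

    /** Maps the input to a bucket index in the range [0, buckets). */
    static int consistentHash(final long input, final int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive: " + buckets);
        }
        long state = input;
        int candidate = 0;
        while (true) {
            state = LCG_MULTIPLIER * state + 1;
            // Pseudo-random value in (0, 1] taken from the upper 31 bits of the state
            final double random = ((state >>> 33) + 1) / 0x1.0p31;
            // Bucket index this key jumps to next; always greater than candidate
            final int next = (int) ((candidate + 1) / random);
            if (next < buckets) {
                candidate = next;
            } else {
                return candidate;
            }
        }
    }

    /**
     * Hypothetical helper: selects a node index for an attribute value.
     * Assumes String.hashCode() is an acceptable source of the input value.
     */
    static int partition(final String attributeValue, final int nodeCount) {
        return consistentHash(attributeValue.hashCode(), nodeCount);
    }
}
```

In this sketch, growing a cluster from 4 to 5 nodes leaves each key either on its current node index or moves it to the new index 4. No key moves between the existing nodes, which is the minimal-redistribution property described in the issue.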
--
This message was sent by Atlassian Jira
(v8.20.10#820010)