lorcanj commented on PR #19670: URL: https://github.com/apache/kafka/pull/19670#issuecomment-2869110421
The assignment logic for active tasks now primarily uses the number of input partitions as a proxy for each task's weight. The trigger for rebalancing active tasks considers the client's current weight plus the weight of the task to be added. An average-weight buffer has been introduced so that minor weight imbalances are less likely to break stickiness, aiming to reduce unnecessary task movements while still correcting significant imbalances.

Standby task assignment continues to use the traditional task-count-based logic. I left it unchanged because I was unsure whether input partitions should be considered for these assignments. This divergence between the active and standby logic has introduced some awkwardness in the codebase, particularly around function signatures (e.g., findBestClientForTask needing different evaluation criteria).

The findLeastLoadedClient method has been refactored to remove an unnecessary loop: the required information is now computed within the initial loop using additional variables.

Some issues I’m aware of:

• Standby assignment strategy: The current change keeps standby client selection on the existing task-count-based metrics. Is this approach acceptable (it is what led to the awkward passing of separate functions), or should standby assignment use the input-partition-based approach, or something else?
• Unassigned task sorting: Should the remaining unassigned active tasks be sorted by descending weight rather than by TaskId, to potentially further improve weight balance?
• Should some input partitions be excluded from the calculation of a task’s weight?
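For readers following the discussion, the weight-plus-buffer idea can be illustrated with a minimal Java sketch. This is not the code in the PR. The class name, `BUFFER_FRACTION` and its 20% value, the threshold formula, and all method signatures are hypothetical. Only the name findLeastLoadedClient is taken from the comment, and its body here is an assumption about what a single-pass version might look like.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; names and the threshold formula are assumptions, not the PR's code.
final class WeightedAssignmentSketch {

    // Hypothetical tolerance: a client may exceed the average weight by 20%
    // before a sticky task is moved away from it.
    static final double BUFFER_FRACTION = 0.2;

    // True if adding the task would push the client past the average weight plus the buffer.
    static boolean exceedsBuffer(int clientWeight, int taskWeight, double averageWeight) {
        return clientWeight + taskWeight > averageWeight * (1.0 + BUFFER_FRACTION);
    }

    // Single pass over the clients, tracking the lightest one seen so far.
    static String findLeastLoadedClient(Map<String, Integer> clientWeights) {
        String best = null;
        int bestWeight = Integer.MAX_VALUE;
        for (Map.Entry<String, Integer> entry : clientWeights.entrySet()) {
            if (entry.getValue() < bestWeight) {
                best = entry.getKey();
                bestWeight = entry.getValue();
            }
        }
        return best;
    }

    // Keep the task on its previous client unless that would exceed the buffer;
    // otherwise fall back to the least loaded client. Updates clientWeights in place.
    static String assign(String previousClient, int taskWeight,
                         Map<String, Integer> clientWeights, double averageWeight) {
        String target = previousClient;
        if (previousClient == null
                || !clientWeights.containsKey(previousClient)
                || exceedsBuffer(clientWeights.get(previousClient), taskWeight, averageWeight)) {
            target = findLeastLoadedClient(clientWeights);
        }
        clientWeights.merge(target, taskWeight, Integer::sum);
        return target;
    }

    public static void main(String[] args) {
        // Weights are input-partition counts already assigned to each client.
        Map<String, Integer> weights = new LinkedHashMap<>();
        weights.put("client-a", 6);
        weights.put("client-b", 2);
        double average = 6.0; // total weight 12 (8 assigned + tasks of weight 1 and 3) over 2 clients

        // 6 + 1 = 7 stays within the buffered threshold, so the task remains sticky.
        System.out.println(assign("client-a", 1, weights, average)); // prints "client-a"
        // 7 + 3 = 10 exceeds the threshold, so the task moves to the least loaded client.
        System.out.println(assign("client-a", 3, weights, average)); // prints "client-b"
    }
}
```

In this sketch a small task stays on its previous client even though that client is already at the average, while a larger task is moved. That is the trade-off the buffer is meant to capture: tolerate minor imbalance to preserve stickiness, and correct only significant imbalance.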