squah-confluent commented on PR #20000: URL: https://github.com/apache/kafka/pull/20000#issuecomment-3707344977
@FrankYang0529 Thanks for collecting the new benchmark results.

The non-rack-aware numbers look good. The rack-aware numbers are better but still a little slow. It's not ideal to block the group coordinator thread for 150 ms. Maybe this won't be too bad in practice, since
1. most groups won't be as large
2. I'm working on a KIP to reduce the impact of slow assignors

If we really want to, I think it's possible to improve performance further by redesigning the `SubscribedTopicDescriber.racksForPartition` interface, but maybe it's best left to a separate PR. `jmh-benchmarks/README.md` has instructions for running the benchmarks with libasyncProfiler, which will generate a flame graph of the assignor run.

Separately, I have some concerns about stickiness when static members are replaced. The group coordinator assigns the new static member a new member id and keeps the previous assignment, so the order of member ids is not stable (I'm aware the existing range assignors also have this problem). How expensive would it be to track the previous owner of partitions in `maybeRevokePartitions` and maybe add a new pass in between `assignRackAwarenessRemainingPartitions` and `assignRemainingPartitions` to restore those partitions to their preferred sticky members?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
