shibd commented on code in PR #23583: URL: https://github.com/apache/pulsar/pull/23583#discussion_r1843778405
##########
pip/pip-392.md:
##########
@@ -0,0 +1,97 @@
+# PIP-392: Add configuration to enable consistent hashing to select active consumer for partitioned topic
+
+# Background knowledge
+
+After [#19502](https://github.com/apache/pulsar/pull/19502) will use consistent hashing to select active consumer for non-partitioned topic
+
+# Motivation
+
+Currently, for partitioned topics, the active consumer is selected using the formula [partitionedIndex % consumerSize](https://github.com/apache/pulsar/blob/137df29f85798b00de75460a1acb91c7bc25453f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractDispatcherSingleActiveConsumer.java#L129-L130).
+This method can lead to uneven distribution of active consumers.
+
+Consider a scenario with 100 topics named `public/default/topic-{0~100}`, each having `one partition`.
+If 10 consumers are created using a `regex` subscription with the `Failover type`, all topic will be assigned to the same consumer(the first connected consumer). This results in an imbalanced distribution of consumers.
+
+# Goals
+
+## In Scope
+- Address the issue of imbalance for `failover` subscription type consumers in single-partition or few-partition topics.
+
+## Out of Scope
+- Excluding the `exclusive` subscription type.
+
+It's important to note that both the `modulo algorithm` and the `consistent hashing algorithm` can cause the consumer to be transferred.
+This might result in messages being delivered multiple times to consumers, which is a known issue and has been mentioned in the documentation.
+https://pulsar.apache.org/docs/4.0.x/concepts-messaging/#failover
+
+# High Level Design
+The solution involves adding a configuration setting that allows users to enable consistent hashing for partitioned topics.
+When enabled, the consumer selection process will use consistent hashing instead of the modulo operation.
+
+The algorithm already exists through [#19502](https://github.com/apache/pulsar/pull/19502)
+
+In simple terms, the hash algorithm includes the following steps:
+
+1. Hash Ring Creation: Traverse all consumers and use `consumer name` to calculate a hash ring with 100 virtual nodes.

Review Comment:
When building the hash ring, we include the index, so even if several consumers of a partitioned topic share the same consumer name, they end up on different hash segments.

https://github.com/apache/pulsar/blob/1b1bd4b610dd768a6908964ef841a6790bb0f4f0/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractDispatcherSingleActiveConsumer.java#L156
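To illustrate the point about the index, here is a minimal sketch of how a ring keyed on name plus index keeps identically named consumers apart. It is not the code at the link above: the class `HashRingSketch`, the `:` separator in the key, and the use of `CRC32` as the hash function are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch; names, key layout and hash function are assumptions.
class HashRingSketch {

    // Stand-in hash function (assumption): CRC32 of the UTF-8 bytes.
    static int hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return (int) crc.getValue();
    }

    // Maps each virtual-node hash to the index of the consumer that owns it.
    // The consumer index is part of the hashed key, so two consumers with
    // the same name still produce different keys.
    static NavigableMap<Integer, Integer> buildRing(List<String> consumerNames, int replicas) {
        NavigableMap<Integer, Integer> ring = new TreeMap<>();
        for (int i = 0; i < consumerNames.size(); i++) {
            for (int j = 0; j < replicas; j++) {
                ring.put(hash(consumerNames.get(i) + ":" + i + ":" + j), i);
            }
        }
        return ring;
    }

    // Picks the first virtual node at or after the topic's hash,
    // wrapping around to the start of the ring if there is none.
    static int select(NavigableMap<Integer, Integer> ring, String topicName) {
        Map.Entry<Integer, Integer> entry = ring.ceilingEntry(hash(topicName));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        // Two consumers that share one name, 100 virtual nodes each.
        NavigableMap<Integer, Integer> ring = buildRing(List.of("consumer", "consumer"), 100);
        System.out.println("consumer 0 on ring: " + ring.containsValue(0));
        System.out.println("consumer 1 on ring: " + ring.containsValue(1));
        System.out.println("selected index: " + select(ring, "public/default/topic-1"));
    }
}
```

If the key were built from the consumer name and the virtual-node number alone, both consumers in `main` would hash to identical positions and the later one would overwrite the earlier one, leaving only a single consumer on the ring.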
