shibd commented on code in PR #23583:
URL: https://github.com/apache/pulsar/pull/23583#discussion_r1837428163


##########
pip/pip-392.md:
##########
@@ -0,0 +1,94 @@
+# PIP-392: Add configuration to enable consistent hashing to select active 
consumer for partitioned topic
+
+# Background knowledge
+
+After [#19502](https://github.com/apache/pulsar/pull/19502) will use 
consistent hashing to select active consumer for non-partitioned topic
+
+# Motivation
+
+Currently, for partitioned topics, the active consumer is selected using the 
formula [partitionedIndex % 
consumerSize](https://github.com/apache/pulsar/blob/137df29f85798b00de75460a1acb91c7bc25453f/pulsar-broker/src/main/java/org/apache/pulsar/broker/service/AbstractDispatcherSingleActiveConsumer.java#L129-L130).
 
+This method can lead to uneven distribution of active consumers.
+
+Consider a scenario with 100 topics named `public/default/topic-{0~100}`, each 
having `one partition`. 
+If 10 consumers are created using a `regex` subscription with the `Failover 
type`, all topic will be assigned to the same consumer(the first connected 
consumer). This results in an imbalanced distribution of consumers.
+
+# Goals
+
+## In Scope
+- Address the issue of imbalance for `failover` subscription type consumers in 
single-partition or few-partition topics.

Review Comment:
   Hi @lhotari, thanks for bringing up this topic.
   
   Here's my conclusion: **Pulsar currently cannot avoid duplicate messages on 
the client side during broker ownership transfer or active consumer transfer, 
and this PIP doesn't introduce new issues because both the consistent hashing 
and modulo algorithms cause active consumer transfer.**
   
   Please help review my explanation.
   
   
   ### For broker ownership transfer case: 
https://github.com/apache/pulsar/issues/19864
   This is a known case, so let's not discuss it too much here.
   
   
   ### For active consumer transfer case (Failover subscription)
   
   #### 1. Use consistent hashing algorithm
   
   Before this PIP, we only used the consistent hashing algorithm for 
non-partitioned topics.
   
   Since the consistent hashing algorithm creates a hash ring based on the 
consumer's name, when a new consumer joins, new nodes are added to the hash 
ring, causing topics to move to this consumer. Thus, topics are transferred.
   
   In the image below, `topic-3` transfers from `c2` to `c4`.
   
   <img width="1677" alt="image" 
src="https://github.com/user-attachments/assets/07f50743-f573-49e3-8e4a-f0570247b485">
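   
   To make the hash ring behavior concrete, here is a minimal sketch. It is not the broker code: the class `HashRingSketch`, the stand-in `hash` function, and the value of `POINTS_PER_CONSUMER` are all assumptions made for illustration. It only shows the general property that when a consumer joins, a topic either keeps its active consumer or moves to the new one.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names and hash function are hypothetical,
// not taken from the Pulsar broker.
class HashRingSketch {
    // Assumed number of ring points per consumer.
    private static final int POINTS_PER_CONSUMER = 100;
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    // Stand-in hash that maps a string to a non-negative int.
    static int hash(String key) {
        int h = key.hashCode();
        h ^= (h >>> 16);
        h *= 0x85ebca6b;
        h ^= (h >>> 13);
        return h & Integer.MAX_VALUE;
    }

    // Place several points on the ring for the given consumer name.
    void addConsumer(String name) {
        for (int i = 0; i < POINTS_PER_CONSUMER; i++) {
            ring.put(hash(name + "#" + i), name);
        }
    }

    // Pick the first ring point at or after the topic's hash, wrapping around.
    // Requires at least one consumer on the ring.
    String select(String topic) {
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash(topic));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        HashRingSketch ring = new HashRingSketch();
        ring.addConsumer("c1");
        ring.addConsumer("c2");
        ring.addConsumer("c3");

        String[] before = new String[10];
        for (int i = 0; i < before.length; i++) {
            before[i] = ring.select("topic-" + i);
        }

        ring.addConsumer("c4"); // a new consumer joins

        // Print every topic whose active consumer changed (possibly none).
        for (int i = 0; i < before.length; i++) {
            String after = ring.select("topic-" + i);
            if (!after.equals(before[i])) {
                System.out.println("topic-" + i + ": " + before[i] + " -> " + after);
            }
        }
    }
}
```

   Which topics move depends on the hash values, so the sketch may print zero or more lines, but any topic that moves can only move to `c4`.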
   
   This issue exists with non-partitioned topics because they use consistent 
hashing. I agree with your point that after this PIP, it will also occur with 
partitioned topics.
   
   However, I want to emphasize that this issue can occur for partitioned 
topics even without consistent hashing. Please see the explanation below.
   
   
   #### 2. Use modulo (partitionedIndex % consumerSize) - **current partitioned 
topic algorithm**
   
   Assume a topic has 2 partitions: **partitionedIndex [P0, P1]**
   
   1. When the first consumer is added (**consumer-0**):
     - **P0** belongs to **consumer-0**: (partitionedIndex:0) % (consumerSize:1) = 0
     - **P1** belongs to **consumer-0**: (partitionedIndex:1) % (consumerSize:1) = 0
   2. When the second consumer is added (**consumer-1**):
     - **P0** belongs to **consumer-0**: (partitionedIndex:0) % (consumerSize:2) = 0
     - **P1** belongs to **consumer-1**: (partitionedIndex:1) % (consumerSize:2) = 1
   
   The active consumer of P1 transfers from consumer-0 to consumer-1.
   
   So, the modulo algorithm also causes active consumer transfer.
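   
   The same arithmetic as a small sketch. The class `ModuloSelectionSketch` and the helper `pickActiveConsumer` are hypothetical names, and the sketch assumes the consumer list is simply ordered `consumer-0`, `consumer-1`; it only reproduces the `partitionedIndex % consumerSize` formula, not the dispatcher itself.

```java
import java.util.List;

// Illustrative sketch only; names are hypothetical, not the broker's API.
class ModuloSelectionSketch {
    // Applies the formula partitionedIndex % consumerSize to pick a consumer.
    static String pickActiveConsumer(int partitionedIndex, List<String> consumers) {
        return consumers.get(partitionedIndex % consumers.size());
    }

    public static void main(String[] args) {
        List<String> oneConsumer = List.of("consumer-0");
        List<String> twoConsumers = List.of("consumer-0", "consumer-1");

        // Compare the active consumer of each partition before and after
        // the second consumer is added.
        for (int p = 0; p < 2; p++) {
            System.out.printf("P%d: %s -> %s%n", p,
                    pickActiveConsumer(p, oneConsumer),
                    pickActiveConsumer(p, twoConsumers));
        }
        // prints:
        // P0: consumer-0 -> consumer-0
        // P1: consumer-0 -> consumer-1
    }
}
```

   Note that index 0 always maps to the first consumer in the list under this formula, which is why single-partition topics all end up on the same consumer.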
   
   ### Going back to this comment
   > After PIP-392 changes, a similar problem would also appear for partitioned 
   > topics.
   > I guess addressing https://github.com/apache/pulsar/issues/15189 could 
   > happen in a separate PIP, but I'd suggest documenting this detail also in the 
   > PIP-392 document that it's a known consequence of the change.
   
   I don't think it's necessary to add more explanation for this, for two reasons:
   
   1. #15189 does not have a clear conclusion proving that the consistent hash 
algorithm for non-partitioned topics has issues.
   2. Regarding potential duplicate messages and out-of-order behavior in 
failover, we have already explained this in the documentation: 
https://pulsar.apache.org/docs/4.0.x/concepts-messaging/#failover
   


