asafm commented on code in PR #18878:
URL: https://github.com/apache/pulsar/pull/18878#discussion_r1047160400


##########
site2/docs/concepts-messaging.md:
##########
@@ -599,10 +599,157 @@ In the diagram below, **Consumer A**, **Consumer B** and 
**Consumer C** are all
 
 #### Key_Shared
 
-In the *Key_Shared* type, multiple consumers can attach to the same 
subscription. Messages are delivered in distribution across consumers and 
messages with the same key or same ordering key are delivered to only one 
consumer. No matter how many times the message is re-delivered, it is delivered 
to the same consumer. When a consumer connects or disconnects, it causes the 
served consumer to change some message keys.
+In the *Key_Shared* type, multiple consumers can attach to the same 
subscription. Messages are delivered in distribution across consumers and 
messages with the same key or same ordering key are delivered to only one 
consumer. No matter how many times the message is re-delivered, it is delivered 
to the same consumer. 
 
 ![Key_Shared subscriptions](/assets/pulsar-key-shared-subscriptions.svg)
 
+There are three types of mapping algorithms dictating how to select a consumer 
for a given message key (or ordering key): Sticky, Auto-split Hash Range, and 
Auto-split Consistent Hashing. The steps for all algorithms are:
+1. The message key (or ordering key) is passed to a hash function (e.g., 
Murmur3 32-bit), yielding a 32-bit integer hash. 
+2. That hash number is fed to the algorithm to select a consumer from the 
existing connected consumers.
+
+```
+                      +--------------+                              
+-----------+
+Message Key ----->  / Hash Function / ----- hash (32-bit) -------> / Algorithm 
/ ----> Consumer   
+                   +---------------+                               +----------+
+```
+
+
+When a new consumer is connected and thus added to the list of connected 
consumers, the algorithm re-adjusts the mapping such that some keys currently 
mapped to existing consumers will be mapped to the newly added consumer. When a 
consumer is disconnected, thus removed from the list of connected consumers, 
keys mapped to it will be mapped to other consumers. The sections below will 
explain how a consumer is selected given the message hash and how the mapping 
is adjusted given a new consumer is connected or an existing consumer 
disconnects for each algorithm.
+
+##### Auto-split Hash Range
+
+The algorithm assumes there is a range of numbers between 0 to 2^16 (65,536). 
Each consumer is mapped into a single region in this range, so all mapped 
regions cover the entire range, and no regions overlap. A consumer is selected 
for a given key by running a modulo operation on the message hash by the range 
size (65,536). The number received ( 0 <= i < 65,536) is contained within a 
single region. The consumer mapped to that region is the one selected.
+
+Example:
+
+Suppose we have 4 consumers (C1, C2, C3 and C4), then:
+
+```
+ 0               16,384            32,768           49,152             65,536
+ |------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+Given a message key `Order-3459134`, its hash would be 
`murmur32("Order-3459134") = 3112179635`, and its index in the range would be 
`3112179635 mod 65536 = 6067`. That index is contained within region `[0, 
16384)` thus consumer C1 will be mapped to this message key.
+
+When a new consumer is connected, the largest region is chosen and is then 
split in half - the lower half will be mapped to the newly added consumer and 
upper half will be mapped to the consumer owning that region. Here is how it 
looks like from 1 to 4 consumers:
+
+```
+C1 connected:
+|---------------------------------- C1 ---------------------------------|
+
+C2 connected:
+|--------------- C2 ----------------|---------------- C1 ---------------|
+
+C3 connected:
+|------- C3 ------|------- C2 ------|---------------- C1 ---------------|
+
+C4 connected:
+|------- C3 ------|------- C2 ------|------- C1 ------|------- C4 ------|
+```
+
+When a consumer is disconnected its region will be merged into the region on 
its right. Examples:
+
+C4 is disconnected:
+
+```
+|------- C3 ------|------- C2 ------|---------------- C1 ---------------|
+```
+
+C1 is disconnected:
+
+```
+|------- C3 ------|-------------------------- C2 -----------------------|
+```
+
+The advantages of this algorithm is that it affects only a single existing 
consumer upon add/delete consumer, at the expense of regions not evenly sized. 
Thi means some consumers gets more keys that others. The next algorithm does 
the other way around. 
+
+##### Auto-split Consistent Hashing
+
+This algorithm uses a Hash Ring. It's a range of number from 0 to MAX_INT 
(32-bit) in which if you traverse the range, when reaching MAX_INT, the next 
number would be zero. It is as if you took a line starting from 0 ending at 
MAX_INT and bent into a circle such that the end glues to the start:
+
+```
+ MAX_INT -----++--------- 0
+              ||
+         , - ~ ~ ~ - ,
+     , '               ' ,
+   ,                       ,
+  ,                         ,
+ ,                           ,
+ ,                           ,
+ ,                           ,
+  ,                         ,
+   ,                       ,
+     ,                  , '
+       ' - , _ _ _ ,  '
+```
+
+When adding a consumer, we mark 100 points on that circle and associate them 
to the newly added consumer. For each number between 1 and 100, we concatenate 
the consumer name to that number and run the hash function on it to get the 
location of the point on the circle that will be marked. For Example, if the 
consumer name is "orders-aggregator-pod-2345-consumer" then we would mark 100 
points on that circle:
+```
+    murmur32("orders-aggregator-pod-2345-consumer1") = 1003084738
+    murmur32("orders-aggregator-pod-2345-consumer2") = 373317202
+    ...
+    murmur32("orders-aggregator-pod-2345-consumer100") = 320276078
+```
+
+Since the hash function has the uniform distribution attribute, those points 
would be uniformly distributed across the circle.
+
+```
+    C1-100                 
+         , - ~ ~ ~ - ,   C1-1
+     , '               ' ,
+   ,                       ,
+  ,                         , C1-2
+ ,                           ,
+ ,                           ,
+ ,                           ,
+  ,                         ,  C1-3
+   ,                       ,
+     ,                  , '
+       ' - , _ _ _ ,  '      ...
+ 
+```
+
+A consumer is selected for a given message key by putting its hash on the 
circle then continue clock-wise on the circle until you reach a marking point. 
The point might have more than one consumer on it (hash function might have 
collisions) there for, we run the following operation to get a position within 
the list of consumers for that point, then we take the consumer in that 
position: `hash % consumer_list_size = index`.
+
+When a consumer is added, we add 100 marking points to the circle as explained 
before. Due to the uniform distribution of the hash function, those 100 points 
act as if the new consumer takes a small slice of keys out of each existing 
consumer. It maintains the even distribution, on the trade-off that it impacts 
all existing consumers. [This 
video](https://www.youtube.com/watch?v=zaRkONvyGr8) explains the concept of 
Consistent Hashing quite well (the only difference is that in Pulsar's case we 
used K points instead of K hash functions as noted in the comments)
+
+##### Sticky

Review Comment:
   <img width="1046" alt="image" 
src="https://user-images.githubusercontent.com/989425/207335189-6d5fc449-26a0-4985-a4c3-221beab3037d.png";>
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to