merlimat opened a new pull request #6791:
URL: https://github.com/apache/pulsar/pull/6791
### Motivation
The current implementation of KeyShared subscriptions uses a mechanism to
divide they hash space across the available consumers. This is based on
dividing the currently assigned hash ranges when a new consumer joins or
leaves.
There are few problems with the current approach:
1. When adding a new consumer, the bigger range is split to make space for
the new consumer. That means that when adding 3 consumers, 1 of them will "own"
a hash range that is double in size compared to the other 2 consumers and
consequently it will receive twice the traffic. This is not terrible, but not
ideal either.
2. When removing consumers, the range for the removed consumer will always
be assigned to the next consumer. The new hash distribution really depends on
the sequence upon which the consumers are removed. If one is unlucky, the
traffic will be very heavily skewed having situations where 1 consumer is
getting >90% of the traffic.
This is an example of removing consumers in sequence, with attached the size
of their respective hash ranges:
```
Removed consumer from rangeMap: {c1=8192, c10=4096, c3=4096, c4=8192,
c5=4096, c6=8192, c7=16384, c8=8192, c9=4096}
Removed consumer from rangeMap: {c1=8192, c10=4096, c4=8192, c5=4096,
c6=12288, c7=16384, c8=8192, c9=4096}
Removed consumer from rangeMap: {c1=8192, c10=4096, c5=4096, c6=12288,
c7=16384, c8=16384, c9=4096}
Removed consumer from rangeMap: {c1=8192, c10=8192, c6=12288, c7=16384,
c8=16384, c9=4096}
Removed consumer from rangeMap: {c1=24576, c10=8192, c6=12288, c7=16384,
c9=4096}
Removed consumer from rangeMap: {c1=24576, c10=8192, c7=28672, c9=4096}
Removed consumer from rangeMap: {c1=53248, c10=8192, c9=4096}
```
As you can see, `c1` will take most of the traffic.
Most likely it will not be able to process all the messages and the backlog
builds up.
### Modifications
* No functional difference from user perspective
* Use consistent hashing mechanism to assign keys to consumers. This will
ensure even distribution without the degradation in the corner cases.
* Number of points in the ring is configurable, default=100.
* Refactored current unit test. The test are currently duplicating the
logic of the implementation and checking the a messages is placed on the bucket
for one consumer. Of course it works, since it's the same code executed on both
sides. But, instead, the test should focus on the contract of the feature:
message should arrive in order, there should be "decent" sharing of load across
consumers.
* @codelipenghui I've removed the `selectByIndex()`. In my opinion there's
absolutely no difference in efficiency/performance as I've also explained on
https://github.com/apache/pulsar/pull/6647#issuecomment-617497271. I'm happy to
discuss more about it.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]