### Motivation
Pulsar function assignment has scalability and performance issues.

Right now, 
[SchedulerManager](https://github.com/apache/incubator-pulsar/blob/master/pulsar-functions/worker/src/main/java/org/apache/pulsar/functions/worker/SchedulerManager.java#L154)
 publishes all assignments in a single Pulsar message. Each assignment 
generates about 700 bytes of payload, and Pulsar limits the message size to 
around 5MB. If each function runs with 3 instances, it needs about 2KB of 
payload, so the cluster can only support around 2500 functions.
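
The estimate above can be checked with a few lines of arithmetic. This is only a sketch: the class and method names are hypothetical, the 700-byte figure is the approximation quoted above, and "5MB" is assumed to mean 5 * 1024 * 1024 bytes.

```java
final class AssignmentCapacity {
    // Approximate per-assignment payload quoted in this description.
    static final long BYTES_PER_ASSIGNMENT = 700;
    // Assumption: the "around 5MB" message-size limit is 5 MiB.
    static final long MAX_MESSAGE_BYTES = 5L * 1024 * 1024;

    /** Number of functions whose assignments fit in one message. */
    static long maxFunctions(int instancesPerFunction) {
        return MAX_MESSAGE_BYTES / (BYTES_PER_ASSIGNMENT * instancesPerFunction);
    }

    /** True if a single message can carry this many assignments. */
    static boolean fitsInOneMessage(long assignments) {
        return assignments * BYTES_PER_ASSIGNMENT <= MAX_MESSAGE_BYTES;
    }

    public static void main(String[] args) {
        // 3 instances per function: roughly 2500 functions fit.
        System.out.println(maxFunctions(3));
        // 12000 assignments need about 8.4MB, which exceeds the limit.
        System.out.println(fitsInOneMessage(12000));
    }
}
```

Under these assumptions the limit works out to just under 2500 functions at 3 instances each, and a single function with 12000 instances already exceeds one message.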

Also, assignment events happen frequently: any assignment change or worker 
restart can trigger one. Over time, a large number of assignment messages 
accumulates across many ledgers, and every time a worker restarts it has to 
read all of those old ledgers from BookKeeper (BK), which we would like to 
avoid.

**Note:** 
This is easy to reproduce: registering a function with parallelism=12000 
fails to publish the assignment message. 

### Modification

1. Publish multiple messages, each with a limited number of assignments, so 
that together they carry all assignments and any number of function 
assignments can be supported.
2. Ack the messages for old versions of assignments (this requires a separate 
namespace for assignments that does not have infinite retention configured).
3. The broker deletes the ledgers for old assignments, so the 
assignment-reader doesn't have to read them.
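
Step 1 can be sketched as a simple partitioning helper. This is a minimal illustration, not the code in this PR: `AssignmentBatcher`, `partition`, and the batch size of 5000 are hypothetical, and assignments are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

final class AssignmentBatcher {
    /** Splits assignments into batches of at most maxPerMessage entries, preserving order. */
    static <T> List<List<T>> partition(List<T> assignments, int maxPerMessage) {
        if (maxPerMessage <= 0) {
            throw new IllegalArgumentException("maxPerMessage must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < assignments.size(); start += maxPerMessage) {
            int end = Math.min(start + maxPerMessage, assignments.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(assignments.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> assignments = new ArrayList<>();
        for (int i = 0; i < 12000; i++) {
            assignments.add("function-instance-" + i);
        }
        // Illustrative limit: each batch would become one published message.
        List<List<String>> batches = partition(assignments, 5000);
        System.out.println(batches.size() + " messages");
    }
}
```

Each batch would then be published as its own message, so no single message has to hold every assignment.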

### Result
1. Pulsar Functions can support any number of functions in the system.
2. The assignment-manager doesn't read old assignments, so the broker and 
bookies avoid unnecessary reads and dispatching.


[ Full content available at: 
https://github.com/apache/incubator-pulsar/pull/2438 ]