### Motivation

Pulsar function assignment has scalability and performance issues.
Right now, [SchedulerManager](https://github.com/apache/incubator-pulsar/blob/master/pulsar-functions/worker/src/main/java/org/apache/pulsar/functions/worker/SchedulerManager.java#L154) publishes all assignments in a single Pulsar message. Each assignment generates about 700 bytes of payload, and Pulsar limits message size to around 5MB. If each function runs with 3 instances, it needs about 2KB of payload (3 x 700 bytes), so a cluster can support only around 2,500 functions (5MB / 2KB).

Assignment events are also frequent: any assignment change or worker restart can trigger one. Over time, we can expect a large number of assignment messages stored across many ledgers, and on every restart a worker has to read all of those old ledgers from BookKeeper (BK), which we would like to avoid.

**Note:** This is easy to reproduce by registering a function with parallelism=12000, which fails to publish the assignment message.

### Modification
1. Publish multiple messages, each carrying a limited number of assignments, so that all assignments are included and any number of function assignments can be supported.
2. Ack the messages for old versions of assignments. This requires a separate namespace for assignments that does not have infinite retention configured.
3. The broker deletes ledgers for old assignments, so the assignment reader doesn't have to read them.

### Result
1. Pulsar Functions can support any number of functions in the system.
2. The assignment manager doesn't read old assignments, so the broker and bookies avoid unnecessary reads and dispatching.

[ Full content available at: https://github.com/apache/incubator-pulsar/pull/2438 ]
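To illustrate the first modification (splitting assignments across several messages), here is a minimal sketch of the batching arithmetic. It is not the code from this PR: `AssignmentBatcher`, `maxAssignmentsPerMessage`, and `partition` are hypothetical names. The 700-byte and 5MB figures come from the description above and are treated as rough estimates.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Pulsar): splits assignments into per-message batches. */
final class AssignmentBatcher {

    // Estimates taken from the PR description; real payload sizes vary.
    static final int APPROX_BYTES_PER_ASSIGNMENT = 700;
    static final int MAX_MESSAGE_BYTES = 5 * 1024 * 1024;

    private AssignmentBatcher() {}

    /** Upper bound on how many assignments fit in one message of the given size. */
    static int maxAssignmentsPerMessage(int maxMessageBytes, int bytesPerAssignment) {
        if (maxMessageBytes <= 0 || bytesPerAssignment <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Always allow at least one assignment per message.
        return Math.max(1, maxMessageBytes / bytesPerAssignment);
    }

    /** Splits the list into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> assignments, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < assignments.size(); start += batchSize) {
            int end = Math.min(start + batchSize, assignments.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(assignments.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        // Mirrors the reproduction case: one function with parallelism=12000.
        List<String> assignments = new ArrayList<>();
        for (int i = 0; i < 12000; i++) {
            assignments.add("assignment-" + i);
        }
        int perMessage = maxAssignmentsPerMessage(MAX_MESSAGE_BYTES, APPROX_BYTES_PER_ASSIGNMENT);
        List<List<String>> batches = partition(assignments, perMessage);
        System.out.println(perMessage + " assignments per message, " + batches.size() + " messages");
    }
}
```

With these estimates, the 12,000 assignments that previously failed to publish as one message fit into two. A real implementation would likely leave headroom below the size limit for message metadata, or cap each batch by a fixed assignment count instead.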
