Re: The ability to drain Pulsar Function workers

Ivan Kelly Wed, 01 Sep 2021 00:25:05 -0700

> When the leader receives a request to drain a worker, it must first mark
> the worker as in the process to be drained i.e. blacklist the worker so
> that no new assignments can be assigned to it. We can perhaps just save the
> blacklist in memory. The worker should then create a new scheduling in
> which the assignments of the worker to be drained are moved to other
> workers perhaps in a round robin distribution.  Afterwards, the leader
> should mark the drain of the worker to be complete.
>
> There are some caveats to this approach.  If the leader fails before
> completing the drain request.  The drain request will not be fulfilled.
> However, if the client frequently checks the status of the drain, it should
> notice that the drain is not running and can re-submit a request.


A couple of questions/comments.

When a leader fails, does the new leader automatically create a new
assignment, or does it continue with the assignment from the previous
leader?

Is the drain request a new concept in the model? I would suggest it
would be better for the drain command to mark the worker as
unschedulable (persisted). Then the check for whether draining is
complete is whether the worker is doing any work (i.e. whether it has
seen and processed the schedule that it's no longer a part of). This
way there's no "drain" request to track as such. There's only the
marking of the worker as unschedulable, which is idempotent.
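
To make the suggestion concrete, here is a minimal sketch of the idea,
not actual Pulsar code. ClusterState, mark_unschedulable and is_drained
are hypothetical names, and the in-memory set and dict stand in for
whatever persisted metadata the workers would really use.

```python
class ClusterState:
    """Hypothetical stand-in for persisted cluster metadata."""

    def __init__(self):
        # Worker ids that must not receive new assignments (assumed persisted).
        self.unschedulable = set()
        # Worker id -> set of function instances the worker reports running.
        self.running = {}

    def mark_unschedulable(self, worker_id):
        # Idempotent: marking the same worker twice leaves the state unchanged,
        # so a client can safely retry after a leader failure.
        self.unschedulable.add(worker_id)

    def is_drained(self, worker_id):
        # Draining is complete once the worker is marked and runs nothing.
        return worker_id in self.unschedulable and not self.running.get(worker_id)
```

A client would call mark_unschedulable once (or several times, it makes
no difference) and then poll is_drained until it returns True.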

The leader should work in a declarative rather than imperative
fashion, i.e. it should generate the desired schedule, and the workers
should work to match this schedule. This should avoid the
leader-failure issue.
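
As a rough illustration of the declarative approach, and again not
actual Pulsar code: desired_schedule and reconcile are hypothetical
names, and the round-robin placement is just the simplest policy to
show (a real scheduler would likely try to minimise moved instances).
Because the schedule is a pure function of the inputs, a new leader
that recomputes it after a failover arrives at the same result.

```python
def desired_schedule(instances, workers, unschedulable):
    """Leader side: map each function instance to a schedulable worker."""
    eligible = sorted(w for w in workers if w not in unschedulable)
    if not eligible:
        raise ValueError("no schedulable workers")
    # Deterministic round robin over sorted inputs.
    return {
        inst: eligible[i % len(eligible)]
        for i, inst in enumerate(sorted(instances))
    }


def reconcile(worker_id, running, schedule):
    """Worker side: diff what is running against the desired schedule."""
    wanted = {inst for inst, w in schedule.items() if w == worker_id}
    to_start = wanted - running
    to_stop = running - wanted
    return to_start, to_stop
```

A worker marked unschedulable never appears in the schedule, so its
reconcile step tells it to stop everything it is running, and it is
drained once that is done.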

-Ivan
