New Scheduler Architecture Users: Breaking Serialization Change Incoming

Brendan Doyle Mon, 01 May 2023 20:32:25 -0700

Hi all,

If you are not running the new fpc scheduler in a cluster that cannot
tolerate downtime, you can stop reading here.


I have two PRs, linked below, that I will be merging later this week.
They get the max instances per action within a namespace feature into
master and allow future changes to the CreateQueue message without
another breaking change like this one. That said, the change does not
prevent an in-place rolling cluster upgrade, but the upgrade must be done
in a specific order. So you have two options for staying on the latest
master commit before we do an official release of the fpc scheduler:

1. Perform a blue/green deployment if you are able to, which the new
scheduler makes simple with the etcd cluster prefix config.
2. Perform an in-place rolling upgrade in the following order:
- First upgrade half of the schedulers. At this point a controller will not
be able to send a CreateQueue message, for an action that has no queue
running yet, to a scheduler that has been upgraded. However, the controller
should handle this gracefully and try to create the queue on all schedulers
before failing the activation.
- Once half of the schedulers have been upgraded, upgrade all of the
controllers. Any action that already has a queue on a scheduler will work
fine. Any new action can now have its queue created on any scheduler that
has been upgraded, but not on one that has yet to be upgraded.
- Finally, upgrade the remaining schedulers.
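
To make the fallback in the first step concrete, here is a rough Python
sketch of a controller trying every scheduler before giving up. It is not
the OpenWhisk code: Scheduler, IncompatibleMessage, send_create_queue and
create_queue are all hypothetical names, and the "RPC" is a stand-in that
simply assumes a message is only understood when sender and receiver are on
the same side of the upgrade.

```python
from dataclasses import dataclass


@dataclass
class Scheduler:
    name: str
    upgraded: bool  # True once this scheduler runs the new serialization


class IncompatibleMessage(Exception):
    """Raised when a scheduler cannot read the message it was sent."""


def send_create_queue(scheduler, action, sender_upgraded):
    # Hypothetical stand-in for the real call. Assumption: the message is
    # only understood when both sides use the same serialization format.
    if scheduler.upgraded != sender_upgraded:
        raise IncompatibleMessage(scheduler.name)
    return scheduler.name


def create_queue(schedulers, action, sender_upgraded):
    """Try each scheduler in turn; fail only after all have been tried."""
    tried = []
    for scheduler in schedulers:
        try:
            return send_create_queue(scheduler, action, sender_upgraded)
        except IncompatibleMessage:
            tried.append(scheduler.name)
    raise RuntimeError(f"no scheduler accepted CreateQueue for {action}: {tried}")
```

With half of the schedulers upgraded, a controller on either version still
finds a compatible scheduler under these assumptions, which is why the
order works without downtime.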

To summarize the rolling upgrade order: upgrade half the
schedulers, then all of the controllers, then finish the remaining
schedulers.
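
If you script your deployments, the ordering can be expressed as a small
helper like the Python sketch below. The function name and the host names
are made up for illustration, and rounding "half" up for an odd number of
schedulers is my assumption, not something the PRs require.

```python
def rolling_upgrade_plan(schedulers, controllers):
    """Return (role, name) pairs in the order they should be upgraded."""
    # Assumption: with an odd number of schedulers, round the first half up.
    half = (len(schedulers) + 1) // 2
    first_half, second_half = schedulers[:half], schedulers[half:]

    plan = [("scheduler", name) for name in first_half]
    plan += [("controller", name) for name in controllers]
    plan += [("scheduler", name) for name in second_half]
    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    for role, name in rolling_upgrade_plan(["s0", "s1", "s2", "s3"], ["c0", "c1"]):
        print(f"upgrade {role} {name}")
```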

If you object to these changes, please respond by Thursday of this week,
which is when I plan to merge.

https://github.com/apache/openwhisk/pull/5389

https://github.com/apache/openwhisk/pull/5287

Thanks,
Brendan
