Hi,

We’ve been discussing how to handle Mesos framework HA in the Invoker, and I’ve written up a proposal on the wiki for discussion:
https://cwiki.apache.org/confluence/display/OPENWHISK/Clustered+Singleton+Invoker+for+HA+on+Mesos

In general, the idea is to run a single, cluster-wide ContainerPool while providing reasonable failover behavior if it dies unexpectedly. To accomplish this, the proposal is to replicate parts of the ContainerPool (freePool and prewarmPool) to other (passive) invoker instances, and to let ContainerFactories use the replicated container metadata to resurrect containers when a failure occurs.

This accomplishes a couple of things: it removes the notion of resource scheduling from the Controller (since there is only ever one invoker), and it lets the ContainerPool operate with a holistic view of the cluster, which is useful for whole-cluster ContainerFactory implementations such as MesosContainerFactory.

I’m curious whether the Kubernetes folks will also find this useful.

A PR (WIP) is forthcoming.

Thanks,
Tyson
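
P.S. To make the replication and failover idea a bit more concrete, here is a rough sketch in Python. It is only an illustration of the concept, not the proposed implementation (the OpenWhisk code is Scala, and the PR may look quite different). All names here (ContainerMeta, PoolReplica, ActiveInvoker, resurrect) are hypothetical, and replication is modeled as a direct in-process call rather than any real cluster mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class ContainerMeta:
    """Hypothetical replicated record describing one container."""
    container_id: str
    address: str
    kind: str


@dataclass
class PoolReplica:
    """Copy of the freePool/prewarmPool metadata held by an invoker."""
    free_pool: Dict[str, ContainerMeta] = field(default_factory=dict)
    prewarm_pool: Dict[str, ContainerMeta] = field(default_factory=dict)

    def apply(self, pool: str, op: str, meta: ContainerMeta) -> None:
        """Apply one replicated add/remove event to the named pool."""
        if pool not in ("free", "prewarm"):
            raise ValueError(f"unknown pool: {pool}")
        target = self.free_pool if pool == "free" else self.prewarm_pool
        if op == "add":
            target[meta.container_id] = meta
        elif op == "remove":
            target.pop(meta.container_id, None)
        else:
            raise ValueError(f"unknown op: {op}")


class ActiveInvoker:
    """The single active invoker; it pushes pool changes to passive replicas."""

    def __init__(self, replicas: List[PoolReplica]) -> None:
        self.state = PoolReplica()
        self.replicas = replicas

    def update(self, pool: str, op: str, meta: ContainerMeta) -> None:
        # Update local state first, then replicate (assumed synchronous here).
        self.state.apply(pool, op, meta)
        for replica in self.replicas:
            replica.apply(pool, op, meta)


def resurrect(replica: PoolReplica) -> List[str]:
    """Stand-in for a ContainerFactory re-attaching to known containers.

    Returns the sorted ids of the containers a newly active invoker
    could reuse after failover.
    """
    return sorted(list(replica.free_pool) + list(replica.prewarm_pool))


# The active invoker records two containers, then drops one.
passive = PoolReplica()
active = ActiveInvoker([passive])
c1 = ContainerMeta("c1", "10.0.0.5:8080", "nodejs")
c2 = ContainerMeta("c2", "10.0.0.6:8080", "python")
active.update("prewarm", "add", c1)
active.update("free", "add", c2)
active.update("free", "remove", c2)

# If the active invoker died now, the passive one would recover from its copy.
recovered = resurrect(passive)
```

The point of the sketch is that the passive instance never schedules anything itself; it only keeps enough metadata to pick up the existing containers when it becomes active.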
