This is an automated email from the ASF dual-hosted git repository.

rabbah pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/openwhisk.git


The following commit(s) were added to refs/heads/master by this push:
     new ef33823  Add a proposal for a new scheduler (#4921)
ef33823 is described below

commit ef33823a1d22179133999f7cd628202cd0498a5a
Author: Dominic Kim <[email protected]>
AuthorDate: Thu Jul 23 00:18:12 2020 +0900

    Add a proposal for a new scheduler (#4921)
---
 .../POEM-1-proposal-for-openwhisk-enhancements.md  |   2 +-
 .../POEM-2-function-pulling-container-scheduler.md | 126 +++++++++++++++++++++
 2 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/proposals/POEM-1-proposal-for-openwhisk-enhancements.md b/proposals/POEM-1-proposal-for-openwhisk-enhancements.md
index cfed8c3..9bbb0d5 100644
--- a/proposals/POEM-1-proposal-for-openwhisk-enhancements.md
+++ b/proposals/POEM-1-proposal-for-openwhisk-enhancements.md
@@ -21,7 +21,7 @@
 Process for introducing an OpenWhisk Enhancement (POEM)
 
 ## Status
-* Current state: Draft
+* Current state: Completed
 * Author: @style95
 
 ## Summary
diff --git a/proposals/POEM-2-function-pulling-container-scheduler.md b/proposals/POEM-2-function-pulling-container-scheduler.md
new file mode 100644
index 0000000..d5c7754
--- /dev/null
+++ b/proposals/POEM-2-function-pulling-container-scheduler.md
@@ -0,0 +1,126 @@
+<!--
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+-->
+
+# Title
+Function Pulling Container Scheduler (FPCScheduler)
+
+## Status
+* Current state: In-progress
+* Author(s): @style95, @ningyougang, @keonhee, @jiangpengcheng, @upgle
+
+## Summary and Motivation
+
+This POEM proposes a new scheduler for OpenWhisk, FPCScheduler.
+Previously we revealed [performance issues](https://cwiki.apache.org/confluence/display/OPENWHISK/Autonomous+Container+Scheduling+v1) in OpenWhisk.
+There are many reasons for them, but we can summarize them into the following problems.
+
+1. Multiple actions share the same `homeInvoker`.
+2. The current scheduler does not consider the time taken for container operations (create/delete, pause/unpause).
+3. Resources are evenly divided by the number of controllers.
+
+First, multiple actions share the same `homeInvoker` because the `homeInvoker` is statically decided by a hash function.
+The resource status on the invoker side is not considered during scheduling, which creates busy hotspot invokers while the others sit idle.
+Sharing a `homeInvoker` is generally a good way to increase the probability of container reuse, but it leads to performance degradation because container operations are slow.
+When all resources in an invoker are taken by one action (busy/warm), an activation for another action will remove one of the existing containers and create a new one for it.
+If actions run for a long time, this is a reasonable approach in terms of bin-packing. But when actions run for a short time (as in most serverless cases), the heuristic severely increases response time, because container operations take one to two orders of magnitude longer than the invocation itself, and this results in poor performance.
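+For illustration only, a minimal sketch of how a statically hashed `homeInvoker` could be derived; the naming and modulo scheme are assumptions, not the actual OpenWhisk code:
+
+```scala
+// Conceptual sketch: a static hash maps an action to a "home" invoker.
+// The invokers' current load never enters the decision.
+def homeInvoker(fullyQualifiedName: String, numInvokers: Int): Int =
+  (fullyQualifiedName.hashCode & Int.MaxValue) % numInvokers
+
+// Every activation of "ns/pkg/actionA" lands on the same invoker,
+// no matter how busy that invoker currently is.
+val target = homeInvoker("ns/pkg/actionA", numInvokers = 10)
+```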
+
+Second, since controllers schedule activations to invokers in a distributed environment, the invokers in the system are partitioned and assigned to specific controllers. This means that any one controller is assigned only a fraction of the available resources in the system.
+It is not feasible for every controller to collect the status of all invokers, so scheduling decisions lack a global view.
+As a result, controllers may throttle activations even if cluster-wide invocations are below the limits and there are available resources and system capacity.
+
+We propose FPCScheduler to address the above issues.
+The scheduler differs from the existing scheduler (`ShardingPoolBalancer`) in the following ways:
+- It schedules containers rather than function requests. Each container continuously pulls activation requests from its action queue (see the sketch after this list).
+- Whenever one execution is over, a container fetches the next activation request and invokes it. In this way, container reuse is maximized.
+- An added benefit is that schedulers do not need to track or consider the location of existing containers; they only decide where and when to create more containers.
+- The scheduler can create more containers on any invoker with enough resources. This enables multiple schedulers to schedule over all invoker resources in a distributed way.
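+A minimal sketch of this pull model from a container's point of view; the `ActivationService` trait and its method are illustrative assumptions, not the actual gRPC interface:
+
+```scala
+import scala.concurrent.{Await, Future}
+import scala.concurrent.duration._
+
+final case class Activation(id: String, payload: String)
+
+// Hypothetical client-side view of the scheduler's activation service.
+trait ActivationService {
+  // Long-poll: completes when an activation is available or the poll times out.
+  def fetchActivation(action: String): Future[Option[Activation]]
+}
+
+// The container keeps pulling work for its own action and runs it,
+// so a warm container is reused for as long as requests keep arriving.
+def runLoop(service: ActivationService, action: String, invoke: Activation => Unit): Unit =
+  while (true) {
+    Await.result(service.fetchActivation(action), 60.seconds) match {
+      case Some(activation) => invoke(activation) // execute, then immediately pull again
+      case None             => ()                 // poll timed out; just pull again
+    }
+  }
+```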
+
+Controllers no longer schedule activation requests. Instead, controllers create action queues and forward activation requests to these queues.
+Each action queue is dedicated to a single action and is dynamically created/deleted by the FPCSchedulers. Since each action has its own queue, there is no interference among actions through a shared queue.
+
+[ETCD](https://github.com/etcd-io/etcd), a distributed and reliable key-value store, is used for transactions, health checks, cluster information sharing, and so on.
+Each scheduler performs a transaction via ETCD when scheduling. The health status of each component (scheduler, invoker) is managed by a [lease](https://help.compose.com/docs/etcd-using-etcd3-features#leases) in ETCD.
+Whenever a component fails, it no longer sends keepalive requests, and its health data is removed when the lease times out.
+Cluster-wide information such as scheduler endpoints, queue endpoints, containers in a namespace, and throttling is stored in ETCD and referenced by the corresponding components.
+Controllers throttle namespaces based on the throttling data in ETCD, so all controllers share the same view of the resources and manage them in the same way.
+
+One more benefit of having our own routing component, rather than relying on open-source components such as Kafka, is that we can extend it and implement any routing logic.
+We will likely want various routing policies at some point. For example, if multiple versions of an action are in flight at the same time, we might want to control the traffic ratio between the two versions; we could route activations only to invokers with a specific resource; or we might want dedicated invokers for some namespaces. This POEM could be a baseline for such extensions.
+
+## Proposed changes: Architecture Diagram (optional) and Design
+The design documents, along with architecture diagrams, are already shared on the [OpenWhisk Wiki](https://cwiki.apache.org/confluence/display/OPENWHISK/Apache+OpenWhisk+Project+Wiki?src=sidebar):
+
+* [Design Consideration](https://cwiki.apache.org/confluence/display/OPENWHISK/Design+consideration?src=contextnavpagetreemode)
+* [Architecture](https://cwiki.apache.org/confluence/display/OPENWHISK/System+Architecture)
+* [Component Design](https://cwiki.apache.org/confluence/display/OPENWHISK/Component+Design)
+
+### Implementation details
+
+For the record, we (NAVER) are already operating OpenWhisk with this scheduler in a production environment.
+We want to contribute the scheduler and hope it evolves with the community.
+
+There are many [new components](https://cwiki.apache.org/confluence/display/OPENWHISK/Component+Design).
+
+#### Common components
+Since we store data in ETCD, there are several relevant components, such as `EtcdClient`, `LeaseKeepAliveService`, `WatcherService`, and `DataManagementService`.
+
+* `EtcdClient`: An ETCD client offering basic CRUD operations.
+* `DataManagementService`: In charge of storing/recovering data in ETCD. It internally utilizes `LeaseKeepAliveService` and `WatcherService`.
+* `LeaseKeepAliveService`: In charge of keeping the given lease alive.
+* `WatcherService`: Watches the given keys and receives events for them (see the sketch below).
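+A minimal sketch of the watch pattern with the jetcd client; the watched key is an assumption for illustration:
+
+```scala
+import java.nio.charset.StandardCharsets.UTF_8
+import io.etcd.jetcd.{ByteSequence, Client, Watch}
+import io.etcd.jetcd.watch.WatchResponse
+
+val client = Client.builder().endpoints("http://etcd:2379").build()
+val key    = ByteSequence.from("whisk/invokers/invoker0/health", UTF_8)
+
+// React to changes of the watched key, e.g. an invoker's health key
+// disappearing because its lease expired.
+val watcher = client.getWatchClient.watch(key, new Watch.Listener {
+  override def onNext(response: WatchResponse): Unit =
+    response.getEvents.forEach(event => println(s"${event.getEventType}: ${event.getKeyValue.getKey}"))
+  override def onError(t: Throwable): Unit = t.printStackTrace()
+  override def onCompleted(): Unit = ()
+})
+```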
+
+#### Scheduler components
+
+In an abstract view, schedulers provide queueing, container scheduling, and activation routing.
+
+* `QueueManager`: The main entry point for queue creation requests. It holds references to all queues.
+* `MemoryQueue`: Dynamically created/deleted for each action. It watches incoming/outgoing requests and triggers container creation.
+* `ContainerManager`: Schedules container creation requests to appropriate invokers.
+* `ActivationServiceImpl`: Provides an API for containers to fetch activations via Akka gRPC. It works in a long-poll fashion to avoid busy-waiting (see the sketch below).
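+A minimal sketch of the long-poll idea behind `ActivationServiceImpl`, using a plain blocking queue instead of the actual Akka gRPC service; all names here are illustrative assumptions:
+
+```scala
+import java.util.concurrent.{LinkedBlockingQueue, TimeUnit}
+
+// One in-memory queue per action (a stand-in for MemoryQueue).
+val queue = new LinkedBlockingQueue[String]()
+
+// A fetch does not spin: it blocks until an activation arrives or the
+// poll times out, and the container then simply issues the next fetch.
+def fetch(timeoutSeconds: Long): Option[String] =
+  Option(queue.poll(timeoutSeconds, TimeUnit.SECONDS))
+
+queue.put("""{"activationId": "abc123"}""")
+fetch(timeoutSeconds = 30) // returns the activation without busy-waiting
+```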
+
+#### Controller components
+* `FPCPoolBalancer`: Creates queues if they do not exist, and forwards messages to them.
+* `FPCEntitlementProvider`: Throttles activations based on the throttling information in ETCD.
+
+#### Invoker components
+
+* `FunctionPullingContainerPool`: A container pool for function-pulling containers. It handles container creation requests.
+* `FunctionPullingContainerProxy`: A proxy for a container. It repeatedly fetches activations and invokes them.
+* `ActivationClientProxy`: The Akka gRPC client. It communicates with `ActivationServiceImpl` in the schedulers.
+* `InvokerHealthManager`: Manages the health and resource information of an invoker. The data is stored in ETCD. If an invoker becomes unhealthy, it invokes health activations.
+
+## Future work
+
+#### Persistence
+
+We concluded that persistence of activation requests is not a mandatory requirement, given the at-most-once nature and circuit-breaking of OpenWhisk.
+If it is desired, we can implement a persistent queue rather than an in-memory queue.
+
+#### Multiple partitions
+
+Currently, a queue only has one partition. Since there is a separate queue for each action, cluster-wide RPS (requests per second) increases linearly with the number of queues.
+If high RPS for a single action is required, we can implement partitioning in queues to introduce parallel processing, similar to what Kafka does.
+We have already confirmed 100K RPS with 10 queues and 10 commodity invokers (roughly 10K RPS per queue), and performance would increase linearly with more invokers and queues.
+
+## Integration and Migration plan
+
+Since these are all new components and can coexist with the existing components via SPI (Service Provider Interface), there would be no breaking changes.
+We will incrementally merge PRs into the master branch.
