bdoyle0182 commented on code in PR #5288:
URL: https://github.com/apache/openwhisk/pull/5288#discussion_r927296786


##########
proposals/POEM-4-action-concurrency-limit-within-namespace.md:
##########
@@ -0,0 +1,82 @@
+<!--
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+-->
+
+# Title
+User Defined Action Level Concurrency Limits Within Confines of Global 
Namespace Limit
+
+## Status
+* Current state: In-progress
+* Author(s): @bdoyle0182 (Github ID)
+
+## Summary and Motivation
+
+Currently, openwhisk has a single concurrency limit for managing auto scaling 
within a namespace. This limit for each namespace is managed
+rightly by system administrators to maintain a good balance between the 
namespaces of the system and the total system's resources.
+
+However, this does not allow for the user to control how their applications 
scale within the namespace that they are operating. There is no
+fairness across functions within a namespace. The semantics of a namespace can 
vary heavily depending on how openwhisk is being used. A namespace
+could represent an organization for public cloud, a group within an 
organization, an application of functions, a logical grouping of applications
+(for example putting all of your interactions with slack in one namespace).
+
+The problem is that a single function can runaway and end up using all of the 
namespace's resources. It shouldn't be on the system administrators
+to provide this fairness as it's dependent on the application and what the 
user wants. They may want the existing behavior to allow any action
+to scale up to the total namespace's resources, they may want to restrict one 
less prioritized function scale up to a smaller threshold so it can't eat
+the entire namespace's resources but still allow other high priority functions 
access to the entire namespace's resources, or they may want to provide
+limits to all of their actions that add up to their namespace limit which will 
guarantee each action in their namespace can have up to their defined
+action concurrency limits similar to other FaaS providers concept of reserved 
concurrency for actions.
+
+With the major revision to how Openwhisk processes activations with the new 
scheduler, such a feature becomes extremely easy to implement by just adding
+a single new limit that users can configure on their action document.
+
+## Proposed changes: Architecture Diagram (optional), and Design
+
+Add a optional `maxContainerConcurrency` limit field to action documents in 
the limits section. This limit will be used in the scheduler when deciding
+if there is capacity for the action to scale up more containers. Previously, 
the scheduler was completely naive of functions across a namespace when 
provisioning
+more containers, but if this limit is defined the scheduler will only allow to 
provision containers up to the defined action limit (which must be less than or 
equal to the namespace limit).
+
+### Implementation details
+
+A working PR of this POEM is already done in which implementation details can 
be reviewed but I will describe implementation considerations here. Once the 
POEM is approved,
+I will add any feedback from the POEM, tests, and documentation.
+
+- The scheduler decision maker uses the min of action container concurrency 
limit and the namespace concurrency limit. If the action limit is less than the 
namespace
+limit, it will check both that the action hasn't used up all its capacity and 
that the namespace still has capacity if the action does still have capacity.
+- The new limit `maxContainerConcurrency` on the action document is an 
optional field. If the field does not exist, the action limit used by the 
system is
+the namespace limit making this an optional feature.
+- The one thing not yet included in the implementation param is a parameter on 
the create action api which will allow the user to delete the limit field so 
that
+the action will rely on the namespace limit again.
+- When creating an action, the api will validate that your action container 
concurrency limit is less than or equal to the namespace concurrency limit. If 
it is greater,
+the upload will fail with a BadRequest and error message that the limit must 
be less than the namespace limit with the namespace limit value included in the 
message.
+- If the system admin lowers a namespace's concurrency limit below an amount 
that an existing action document has already configured, it will not break the 
action.

Review Comment:
   Yes, it's the user's responsibility. The idea is that users can give a 
high-priority function a high limit so that it can potentially take more 
capacity than a low-priority function, in which case the lower-priority action 
may not be able to reach its own (lower) limit. As an example, take a namespace 
with a limit of 20 and three actions A, B, and C:
   
   - A configures limit of 5
   - B configures limit of 5
   - C configures limit of 15
   
   C gets a burst of traffic that uses all 15 of its limit, so there are now 
only 5 left for A and B combined. That's fine: a user might want this level of 
scaling for C at the expense of A and B, whose maximum is now 2 or 3 each 
instead of 5.
   
   Or the user can configure limits that sum exactly to the namespace limit to 
guarantee fairness, such that A gets 5, B gets 5, and C gets 10. I think giving 
the user this level of flexibility is a good thing.
   
   I didn't want to reference AWS Lambda explicitly, but compared to its 
concept of reserved concurrency for individual functions I think this provides 
more flexibility to the user. Reserved concurrency on Lambda is taken out of 
the total pool when you configure it on a function, so if I reserve 5 for an 
action and the account limit is 20, there is now a capacity of 15 for other 
functions at all times. But what if that function is idle most of the time? 
You've permanently taken capacity away from your pool for a function that 
barely runs, in exchange for a guarantee that it will always be able to scale 
up to 5. With this proposal, you can still give yourself that guarantee by 
provisioning limits that add up to your namespace limit, but you also have the 
flexibility to be smarter about your traffic patterns by giving yourself 
additional overprovisioned / high-priority capacity across your functions 
(which is sort of what serverless is all about).
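
The arithmetic behind that comparison can be sketched as follows. Both functions are hypothetical helpers written for this comment; they model only the behavior described above (a reservation is subtracted from the pool up front, while an action limit only matters for what is running right now) and are not taken from either system's code.

```python
from typing import Dict


def unreserved_pool(account_limit: int, reserved: Dict[str, int]) -> int:
    """Reserved-concurrency model: reservations leave the shared pool for good."""
    return account_limit - sum(reserved.values())


def available_to(action: str, namespace_limit: int,
                 action_limits: Dict[str, int], running: Dict[str, int]) -> int:
    """Action-limit model: headroom depends on current usage only."""
    own_limit = min(action_limits.get(action, namespace_limit), namespace_limit)
    own_headroom = own_limit - running.get(action, 0)
    namespace_headroom = namespace_limit - sum(running.values())
    return max(0, min(own_headroom, namespace_headroom))


# Reserving 5 for a rarely used function leaves 15 for everything else, always.
print(unreserved_pool(20, {"rare": 5}))

# With an action limit of 5 on "rare", "busy" can use all 20 while "rare" is idle...
print(available_to("busy", 20, {"rare": 5}, running={}))
# ...and 15 while "rare" is running at its limit.
print(available_to("busy", 20, {"rare": 5}, running={"rare": 5}))
# The trade-off: if "busy" already holds all 20, "rare" has no guaranteed capacity.
print(available_to("rare", 20, {"rare": 5}, running={"busy": 20}))
```

The last call shows the other side of the flexibility: the guarantee only exists if the user also caps the other actions so that the limits fit within the namespace limit.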



