bdoyle0182 opened a new pull request, #5284:
URL: https://github.com/apache/openwhisk/pull/5284

   
   ## Description
   This is a WIP; I'm looking for feedback on whether this is an option we 
could support. It would add two new configs for the scheduler:
   
   ```
   allow-over-provision-before-throttle = true
   namespace-over-provision-before-throttle-ratio = 1.5
   ```
   (Would people prefer a fixed value here rather than a ratio?)
   
   If the first config is false, the ratio is not used anywhere. The purpose 
of this change is to prevent actions within a namespace from deadlocking on 
the new scheduler because of the new concept of namespace throttling. Since 
the scheduler can be aggressive with initial over-provisioning, you can get 
into a scenario where a single action uses up all of the namespace's container 
concurrency before another action gets a chance to run. The code change below 
raises the container concurrency limit to the over-provision threshold so that 
new actions have a chance to run.
   
   This is really needed when one action depends on another, e.g. action a 
attempts to execute action b, but action a holds all of the container 
concurrency for the namespace. The workload will then never make progress, and 
action a will fail all of its executions. Ultimately the right thing to do is 
scale out the namespace, but that is a manual step by a human and can't be done 
until impact has already occurred. This change mitigates the problem by slowing 
throughput rather than causing total failures.
   
   Example of how it works:
   
   - Namespace A has a container concurrency limit of 30
   - Action A uses all 30 containers
   - With an over provision before throttle ratio of 1.5, the namespace can 
actually have a max total of 45 containers.
   - Action B comes in. Since the namespace has hit its initial limit, the 
scheduler gives it a max of 1 container until the number of containers for the 
namespace goes below its normal limit of 30. Since one container is likely not 
enough for its load, it will get action throttled, which is still better than 
not letting any traffic through at all.
   - The namespace can take a total of 15 additional actions, each with a max 
of 1 container, before it is completely over-provisioned and gets namespace 
throttled, so no additional containers can be created.
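   
   The arithmetic in the example above can be sketched roughly as follows. 
This is only an illustration under assumptions, not the code in this PR: 
`OverProvisionConfig`, `OverProvisionSketch`, `hardLimit` and 
`allowedNewContainers` are hypothetical names, and the behavior below the 
normal limit (grant up to the remaining headroom) is assumed for the sake of 
the example.

```scala
// Hypothetical sketch: none of these names come from the OpenWhisk code base.
final case class OverProvisionConfig(
  allowOverProvisionBeforeThrottle: Boolean,
  namespaceOverProvisionBeforeThrottleRatio: Double)

object OverProvisionSketch {

  /** Absolute ceiling on containers for the namespace (e.g. 30 * 1.5 = 45). */
  def hardLimit(namespaceLimit: Int, config: OverProvisionConfig): Int =
    if (config.allowOverProvisionBeforeThrottle)
      math.floor(namespaceLimit * config.namespaceOverProvisionBeforeThrottleRatio).toInt
    else namespaceLimit

  /**
   * Number of containers the scheduler may add for one action right now.
   * `requested` is assumed to be non-negative.
   */
  def allowedNewContainers(namespaceLimit: Int,
                           namespaceContainers: Int,
                           actionContainers: Int,
                           requested: Int,
                           config: OverProvisionConfig): Int = {
    val ceiling = hardLimit(namespaceLimit, config)
    if (namespaceContainers < namespaceLimit)
      // Assumption: below the normal limit, grant up to the remaining headroom.
      math.min(requested, namespaceLimit - namespaceContainers)
    else if (namespaceContainers < ceiling && actionContainers == 0)
      // Over the normal limit but under the ceiling: an action that has no
      // container yet gets at most one.
      math.min(requested, 1)
    else
      // Namespace throttled: no additional containers.
      0
  }
}
```

   With a limit of 30 and a ratio of 1.5, `hardLimit` gives 45. Once Action A 
holds all 30 containers, Action B (with none) is granted 1 container, Action A 
is granted nothing further, and at 45 containers every request is refused.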
   
   I'll admit this is a little hacky, but I do think we need to account for 
this throttling case until we are able to support action-level container 
concurrency limits. Any better ideas are welcome.
   
   ## Related issue and scope
   <!--- Please include a link to a related issue if there is one. -->
   - [ ] I opened an issue to propose and discuss this change (#????)
   
   ## My changes affect the following components
   <!--- Select below all system components are affected by your change. -->
   <!--- Enter an `x` in all applicable boxes. -->
   - [ ] API
   - [ ] Controller
   - [ ] Message Bus (e.g., Kafka)
   - [ ] Loadbalancer
   - [X] Scheduler
   - [ ] Invoker
   - [ ] Intrinsic actions (e.g., sequences, conductors)
   - [ ] Data stores (e.g., CouchDB)
   - [ ] Tests
   - [ ] Deployment
   - [ ] CLI
   - [ ] General tooling
   - [ ] Documentation
   
   ## Types of changes
   <!--- What types of changes does your code introduce? Use `x` in all the 
boxes that apply: -->
   - [ ] Bug fix (generally a non-breaking change which closes an issue).
   - [X] Enhancement or new feature (adds new functionality).
   - [ ] Breaking change (a bug fix or enhancement which changes existing 
behavior).
   
   ## Checklist:
   <!--- Please review the points below which help you make sure you've covered 
all aspects of the change you're making. -->
   
   - [X] I signed an [Apache 
CLA](https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md).
   - [X] I reviewed the [style 
guides](https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md#coding-standards)
 and followed the recommendations (Travis CI will check :).
   - [ ] I added tests to cover my changes.
   - [ ] My changes require further changes to the documentation.
   - [ ] I updated the documentation where necessary.
   
   


