travigd opened a new pull request #4978: URL: https://github.com/apache/openwhisk/pull/4978
<!--- Provide a concise summary of your changes in the Title -->

## Description
<!--- Provide a detailed description of your changes. -->
<!--- Include details of what problem you are solving and how your changes are tested. -->

The motivation for this change is essentially that some activations trigger the creation of new containers, which has the potential to be a very long operation (e.g., if the creation of the container triggers a scale-up in cluster size), and OpenWhisk will block (with respect to that activation) until the container is created. This is not ideal, since other containers often become free and ready to service the request in the meantime.

This change:
* Immediately moves cold start containers into the busy queue and waits for them to send `NeedWork` to the `ContainerPool` (this means that the activation will run on the next available container that can service the request)
* Adds a `PreRun` event that is sent to the container to tell it how to initialize

### Current Issues

This doesn't currently pass all the tests, but I suspect I'm at the limit of what I can do here (as I don't have much other context around the codebase and the project overall).

* There feels like a weird parallel-but-not-quite pathway between prewarm and cold-start containers now (since both now have an initialization step, but the prewarm has the stem-cell-differentiation step as well). It would be nice to simplify that if possible.

> Quick note on my terminology, since I think I understand how these words are used, but want to make sure:
> * action = "specification for how to handle a request" (e.g., using blackbox image or nodejs runtime)
> * activation = "specific request that is handled by an action"

The run buffer only considers the current head of the run queue (for activations that couldn't be immediately sent to containers), which is not ideal.
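To make the head-only behaviour concrete, here is a minimal stand-alone simulation in Python. It is not OpenWhisk code: `dispatch_head_only` and `dispatch_per_action` are hypothetical names, and matching a run to a container purely by image name is a simplifying assumption. The second function models per-action buffers as one possible alternative.

```python
from collections import defaultdict, deque


def dispatch_head_only(buffer, free_action):
    """Model of head-only processing: only the first buffered run is retried.

    buffer: deque of (run_id, action) tuples in arrival order.
    free_action: action whose container just sent NeedWork (assumed model).
    Returns the dispatched run_id, or None if the head could not be served.
    """
    if not buffer:
        return None
    run_id, action = buffer[0]
    if action == free_action:
        buffer.popleft()
        return run_id
    return None  # the head stays in place; later runs are never examined


def dispatch_per_action(buffers, free_action):
    """Model of per-action buffers: dequeue from the free action's own buffer."""
    queue = buffers[free_action]
    return queue.popleft() if queue else None


runs = [("A", "image1"), ("B", "image2")]

# Head-only buffer: the container for image2 becomes ready first.
head_only = deque(runs)
stalled = dispatch_head_only(head_only, "image2")

# Per-action buffers: the same event finds run B directly.
per_action = defaultdict(deque)
for run_id, action in runs:
    per_action[action].append(run_id)
served = dispatch_per_action(per_action, "image2")

print(stalled)  # None: run A is at the head, so run B keeps waiting
print(served)   # B
```

In the head-only model, the free container for image 2 sits idle until run A's container is ready; with one buffer per action, the same `NeedWork` signal finds run B immediately.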
Imagine this scenario:
* Run A (using image 1) triggers a cold start, A is enqueued on the run buffer
* Run B (using image 2) triggers a cold start, B is enqueued on the run buffer (position 1)
* Container 2 finishes initialization (maybe it didn't require pulling an image while image 1 did have to be pulled)
* The `ContainerPool` gets `NeedWork` from container 2, moves container 2 into the free pool, triggers `processBufferOrFeed()`
* `processBufferOrFeed` re-sends the first run on the buffer, which is run A
* There are no containers available to service run A (since container 1 is still initializing)
* Run A is re-enqueued and nothing else happens (even though run B could have been serviced)
* Container 1 finishes initializing, sends `NeedWork`
* Run A is de-queued again, and now handled
* Run B is never actually handled???

One solution to this might be to add another layer between `ContainerPool` and `ContainerProxy`, which would be something like `ActionPool`. The `ActionPool` would handle the creation of new containers and serve requests in the order in which they're received, but that gets messy when you need to restrict the total size of the `ContainerPool`. It's also messy when there are resource contention issues: what happens when two different actions want to scale up the number of containers but the container pool is at the max size? Right now, it would attempt to reap some old containers to service the next activation, so it's fair with respect to activation order.

A (potentially simpler) solution might just be to have per-action buffers, so when you get a `NeedWork` corresponding to action 1, you dequeue from that specific buffer.

## Related issue and scope
<!--- Please include a link to a related issue if there is one. -->
#4974

## My changes affect the following components
<!--- Select below all system components are affected by your change. -->
<!--- Enter an `x` in all applicable boxes.
-->
- [ ] API
- [ ] Controller
- [ ] Message Bus (e.g., Kafka)
- [ ] Loadbalancer
- [ ] Invoker
- [ ] Intrinsic actions (e.g., sequences, conductors)
- [ ] Data stores (e.g., CouchDB)
- [ ] Tests
- [ ] Deployment
- [ ] CLI
- [ ] General tooling
- [ ] Documentation

## Types of changes
<!--- What types of changes does your code introduce? Use `x` in all the boxes that apply: -->
- [ ] Bug fix (generally a non-breaking change which closes an issue).
- [ ] Enhancement or new feature (adds new functionality).
- [ ] Breaking change (a bug fix or enhancement which changes existing behavior).

## Checklist:
<!--- Please review the points below which help you make sure you've covered all aspects of the change you're making. -->
- [ ] I signed an [Apache CLA](https://github.com/apache/openwhisk/blob/master/CONTRIBUTING.md).
- [ ] I reviewed the [style guides](https://github.com/apache/openwhisk/wiki/Contributing:-Git-guidelines#code-readiness) and followed the recommendations (Travis CI will check :).
- [ ] I added tests to cover my changes.
- [ ] My changes require further changes to the documentation.
- [ ] I updated the documentation where necessary.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
