bdoyle0182 opened a new issue, #5256: URL: https://github.com/apache/openwhisk/issues/5256
This discussion originated on Slack. I'm moving it here for more formal discussion from the community on the topic. Original post:

@bdoyle0182: Just wanted to open a discussion about the new scheduler. It's very optimized towards short-running requests, i.e. a few milliseconds. However, if there are very long-running functions, i.e. 10+ seconds, it scales out very quickly, since the container throughput calculation simply concludes that it needs a new container for each added level of concurrency. This is obviously a very small slice of FaaS use cases, but it's supported nonetheless. These use cases are much more async than the normal case of a sync response expected within a reasonable HTTP request time of a few milliseconds, so some latency while waiting for available space should be much more acceptable. For example, if a function takes 10 seconds to run, the user of that function won't really care if it has to wait 2-3 seconds for available space, and both the namespace and the operator would likely prefer latency over uncontrolled fan-out of concurrency. The problem, imo, is that the activation staleness value is constant for all function types (currently 100ms). 100ms definitely makes sense for anything that runs within a second, but do we think we could make this value dynamic based on the average duration of that function? Or am I on the right track here on how we could potentially control fan-out for long-running functions and prefer latency over fan-out?

@style95: Yes, it's worth discussing. What you have said is correct. When designing the new scheduler, we prioritized latency over resources. It was based on the thought that public clouds like AWS would try to minimize latency no matter which type of functions are running, and we also wanted to reduce latency as much as we can. But it can lead to too many containers being provisioned at once, and it caused some trouble in our environment too when there are not many invoker nodes. This issue especially sticks out for long-running actions, as container provisioning generally takes more than 100ms. So even if more containers are being provisioned, messages easily become stale: all running containers are already handling activations that will take more than 100ms, and container provisioning also takes more than 100ms, so activations in the queue generally wait for more than 100ms. One guard here is that the scheduler does not provision more containers than the number of waiting messages. So when there are 4 waiting messages, it only creates up to 4 containers. But if the concurrency limit is big (which is common for public clouds) and a huge number of messages are incoming, it will try to create a huge number of containers at once. We need more ways to do fine-grained control of provisioning.
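To make the staleness idea above a bit more concrete, here is a minimal sketch of what a duration-aware staleness threshold could look like. This is not the scheduler's actual code or API; `StalenessPolicy`, `averageDuration`, the scaling factor, and the upper bound are illustrative assumptions only.

```scala
import scala.concurrent.duration._

// Sketch only: scale the staleness threshold with the observed average
// activation duration instead of using a fixed 100ms for every action.
case class StalenessPolicy(
    baseThreshold: FiniteDuration = 100.milliseconds, // today's constant value
    maxThreshold: FiniteDuration = 5.seconds,         // hypothetical upper bound
    factor: Double = 0.2) {                           // hypothetical fraction of avg duration

  // e.g. a 10-second action tolerates ~2 seconds of queueing before a message
  // is considered stale, while sub-second actions keep the 100ms floor.
  def thresholdFor(averageDuration: FiniteDuration): FiniteDuration = {
    val scaled = (averageDuration.toMillis * factor).toLong.milliseconds
    (scaled max baseThreshold) min maxThreshold
  }

  def isStale(waitTime: FiniteDuration, averageDuration: FiniteDuration): Boolean =
    waitTime > thresholdFor(averageDuration)
}
```

With a policy like this, a function averaging 10 seconds would only mark messages stale after roughly 2 seconds of waiting, trading a little latency for far less container fan-out, while short-running functions keep today's 100ms behaviour.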
@bdoyle0182: `This issue especially sticks out for long-running actions, as container provisioning generally takes more than 100ms` — yes, this is exactly what I'm finding. Container provisioning takes anywhere from 500ms to 2 seconds, so when the wait time is 100ms the fan-out of containers can be particularly bad: the scheduler checks every 100ms and provisions more each time, no activations will complete for a couple of seconds, and creating a huge number of containers at once can slow down the docker daemon, making provisioning even slower (though with the new scheduler container provisioning is balanced across hosts, unlike the old scheduler, which is just one of many huge wins for keeping the docker daemon under control :slightly_smiling_face:).

@style95: Yes. So I naively thought we need to control the number of concurrent provisions. If it does not impact the whole system, we can still provision many containers for actions. But if it tries to create too many containers and that is expected to cause issues for the whole system, we can throttle them. I haven't thought it through deeply yet.

@rabbah: How do things look for functions that run for minutes? Will check out the discussion in GitHub. I'm curious whether there should be multiple schedulers which can be tailored to the function modality.
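A similarly rough sketch of the throttling idea, i.e. bounding how many containers are provisioned concurrently on top of the existing "no more containers than waiting messages" guard. Again, `maxConcurrentProvisions` and the function below are assumptions for illustration, not existing scheduler configuration:

```scala
// Sketch only: decide how many new containers to request in one scheduling
// tick, given the number of stale messages and the provisioning already in flight.
case class ProvisioningLimits(maxConcurrentProvisions: Int = 32) // hypothetical cap

def containersToProvision(
    staleMessages: Int,       // activations waiting longer than the staleness threshold
    inFlightProvisions: Int,  // containers currently being created
    limits: ProvisioningLimits): Int = {
  // Existing guard: never ask for more containers than there are waiting messages.
  val demand = math.max(0, staleMessages - inFlightProvisions)
  // Additional throttle: leave headroom so a burst of stale messages cannot
  // trigger an unbounded number of docker creates at once.
  val headroom = math.max(0, limits.maxConcurrentProvisions - inFlightProvisions)
  math.min(demand, headroom)
}
```

Each scheduling tick would then request at most that many new containers, so even with a large concurrency limit the docker daemons only ever see a bounded number of in-flight creates.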
