A few quick thoughts:

You might find these notes interesting: https://github.com/python-trio/trio/issues/32
It sounds like a more precise description of your scheduler would be
"hierarchical FIFO"? i.e., there's a top-level scheduler that selects
between "child" schedulers in round-robin/FIFO fashion, and each child
scheduler is round-robin/FIFO within its schedulable entities? [1]

Generally I would think of a "fair" scheduler as one that notices when
e.g. one task is blocking the event loop for a long time when it runs,
and penalizes it by not letting it run as often.

For your motivating example of a health-check: it sounds like another
way to express your goal would be to attach a static "priority level"
to the health-check task(s), such that they get to run first whenever
they're ready. Have you considered that as an alternative approach?

But also... isn't part of the point of a healthcheck that it *should*
get slow if the system is overloaded?

-n

[1] http://intronetworks.cs.luc.edu/current/html/queuing.html#hierarchical-queuing

On Thu, Jun 15, 2017 at 2:40 PM, Pau Freixes <pfrei...@gmail.com> wrote:
> Hi guys, recently I've been trying to implement a POC of the default
> event loop implemented by asyncio, but using a fair scheduling reactor.
> At the moment it is just a POC [1], something to test the rationale and
> still pending to evolve into something more mature, but before that I
> would prefer to share my thoughts and get your comments.
>
> The current implementation is based on a FIFO queue that is filled with
> all of the callbacks that have to be executed. These callbacks can
> stand for:
>
> 1) Run tasks, either to be started or resumed.
> 2) Run future callbacks.
> 3) Run scheduled callbacks.
> 4) Run file descriptor callbacks.
>
> Worth mentioning that most of them are internally chained in some way;
> for example, a future callback can wake up a suspended task. Also, keep
> in mind that the API published by asyncio to schedule callbacks can be
> used by asyncio itself or by the user, call_soon for example.
>
> The usual flow of the reactor is the following:
>
> 1) Check the file descriptors with events, and stack their handles into
> the reactor.
> 2) Pop all due scheduled callbacks and push them into the reactor.
> 3) Iterate over the first N elements of the queue, where N is the number
> of handles stacked at that moment. Handles stacked during that iteration
> won't be handled; they must wait until the next whole iteration.
> 4) Go back to point 1.
>
> As you can see, the IO is only done once per loop iteration and has to
> wait until all handles present at that moment have been executed.
>
> This implements a kind of natural backpressure: reads, and also the
> accepting of new connections, rely on the buffers managed by the
> operating system.
>
> That implementation can be seen as simple, but it stands on a solid
> strategy and follows a KISS design that helps to keep the bugs away.
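For concreteness, here is a rough sketch of the per-iteration flow
described above. It is only an illustration of the idea, not code from
asyncio or from the POC, and every name in it (ToyReactor, run_once,
poll_io, ...) is made up for this sketch:

import collections
import heapq
import itertools
import time


class ToyReactor:
    def __init__(self):
        self.ready = collections.deque()  # FIFO queue of callbacks (handles)
        self.timers = []                  # heap of (when, tiebreak, callback)
        self._counter = itertools.count()

    def call_soon(self, cb):
        self.ready.append(cb)

    def call_later(self, delay, cb):
        heapq.heappush(self.timers,
                       (time.monotonic() + delay, next(self._counter), cb))

    def poll_io(self):
        # Placeholder: a real reactor would ask the selector for ready
        # file descriptors here and return their callbacks.
        return []

    def run_once(self):
        # 1) IO is polled only once per iteration.
        for cb in self.poll_io():
            self.ready.append(cb)
        # 2) Pop every timer that is already due and queue its callback.
        now = time.monotonic()
        while self.timers and self.timers[0][0] <= now:
            _, _, cb = heapq.heappop(self.timers)
            self.ready.append(cb)
        # 3) Run only the N handles queued at this point; callbacks queued
        #    while they run must wait for the next whole iteration.
        for _ in range(len(self.ready)):
            self.ready.popleft()()
        # 4) The caller loops back to step 1.

A callback scheduled from inside a running handle (via call_soon) only
runs on the next run_once pass, which is the backpressure property the
proposal wants to keep, but per partition.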
> Why fair scheduling?
>
> Not all code that shares the same loop has the same requirements. Some
> parts might need N and other parts M. When the implementation can't be
> decoupled, meaning that the cost of splitting it into separate pieces
> of your architecture is too high, the developer has no way to express
> this difference and make the underlying implementation aware of it.
>
> For example, take an API with a regular endpoint accessed by the user
> and another one with the health-check of the system; they have
> completely different requirements in terms of IO, not only due to the
> nature of the resources accessed but also because of the frequency of
> use. While the healthcheck is accessed at a known frequency, every X
> seconds, the other endpoint has a variable frequency of use.
>
> Do you believe that asyncio will be able to preserve the health-check
> frequency at any moment? Absolutely not.
>
> Therefore, the idea of implementing a fair scheduling reactor is based
> on the need to address these kinds of situations, giving the developer
> an interface to isolate different resources.
>
> Basic principles
>
> The basic principles of the implementation are:
>
> - The cost of the scheduling has to be the same as in the current
> implementation: no overhead.
> - The design has to follow the current one, keeping the implicit
> backpressure commented on above.
>
> I will focus on the second principle, taking into account that the
> first one is a matter of implementation.
>
> To achieve the same behavior, the new implementation only splits the
> resources - handles, schedules, file descriptors - into isolated
> partitions and then applies to each partition the same algorithm as
> the current one. The developer can create a new partition using a new
> function called `spawn`. This function takes a coroutine as an
> argument; the task that wraps that coroutine, and all of the resources
> created inside that coroutine, will belong to that partition. For
> example:
>
>>>> async def background_task():
>>>>     tasks = [fetch() for i in range(1000)]
>>>>     return (await asyncio.gather(*tasks))
>>>>
>>>> async def foo():
>>>>     return (await asyncio.spawn(background_task()))
>
> All resources created inside the scope of `background_task` are
> isolated in one partition. The 1000 sockets will schedule callbacks
> that will be stacked in the same queue.
>
> The partition is by default identified by the hash of the task that
> wraps `background_task`, but the user can pass an alternative value:
>
>>>> async def foo():
>>>>     return (await asyncio.spawn(healthcheck(), partition='healthcheck'))
>
> Internally the implementation has a default ROOT partition that is used
> for all of those resources that are not created inside the scope of a
> spawn call. As you can guess, if you don't use the spawn method the
> reactor will run exactly as the current implementation, having all the
> resources in the same queue.
>
> Round robin between partitions
>
> The different partitions that exist at a given moment share the CPU
> using a round robin strategy. This gives every partition the same
> chance to run the same number of handles, but with one particularity:
> each time a partition runs out of handles, the loop is restarted to
> handle the file descriptors and the delayed calls, but only for that
> specific partition that ran out of handles.
>
> The side effect is clear: we keep the same backpressure mechanism, but
> per partition.
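To make the partition bookkeeping concrete, here is a toy sketch of
round-robin scheduling between per-partition FIFO queues. The names
(PartitionedReady, push, run_round) are invented for this sketch and are
not taken from the qloop POC:

import collections


class PartitionedReady:
    def __init__(self):
        # partition name -> FIFO queue of callbacks; 'ROOT' holds every
        # handle that was not created under a spawn().
        self.partitions = {}

    def push(self, cb, partition='ROOT'):
        self.partitions.setdefault(partition, collections.deque()).append(cb)

    def run_round(self):
        # One pass: each partition runs the handles it had queued when the
        # pass started, so a busy partition cannot starve the others.
        for queue in list(self.partitions.values()):
            for _ in range(len(queue)):
                queue.popleft()()


ready = PartitionedReady()
for i in range(3):
    ready.push(lambda i=i: print("user request", i))            # ROOT
ready.push(lambda: print("healthcheck"), partition='healthcheck')
ready.run_round()  # the healthcheck handle runs in this same pass

Even if the ROOT partition had thousands of queued handles, the
'healthcheck' partition would still get its turn on every pass, which is
the fairness property the proposal is after.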
> The black hole of the current implementation
>
> There is always a but. I've found at least one situation where this
> strategy can perform just like the current one, without applying any
> fair scheduling, even though the code uses the spawn method.
>
> Have a look at the following snippet:
>
>>>> async def healthcheck(request):
>>>>     await request.resp()
>>>>
>>>> async def view(request):
>>>>     return (await asyncio.spawn(healthcheck(request)))
>
> The task that wraps the healthcheck coroutine is isolated in its own
> partition, but it won't be scheduled until the data from the file
> descriptor has been read, and that read is done by a callback that is
> in fact executed inside the ROOT partition. Therefore, in the worst
> case scenario, the fair scheduling degrades into plain FIFO scheduling.
>
> IMHO there is no easy way to solve that issue, at least not without
> changing the current picture, and trying to solve it might end up in a
> messy implementation and buggy code.
>
> Still, I believe this is worth it, having in mind the benefits it will
> bring for all of those cases where the user needs to isolate resources.
>
> Thoughts, comments, and anything else will be welcome.
>
> [1] https://github.com/pfreixes/qloop
>
> --
> --pau

--
Nathaniel J. Smith -- https://vorpus.org