dazza-codes edited a comment on issue #5788: [POC] multi-threading using asyncio URL: https://github.com/apache/airflow/pull/5788#issuecomment-570269996

I'm not opposed to this PR/POC if it works and the scope/intention is clear; are there additional unit tests and functional tests to demonstrate that it works as it stands? I don't know enough about how it performs, how it manages the event loop, or how the loop manages subprocesses. It seems that a subprocess cannot `yield from` (or `await`) to return control to the main loop from anywhere within the `command`, i.e. the command itself cannot use asyncio.

In the bigger picture, which may or may not pertain to this PR in particular, I want to understand more about the Airflow task ecosystem and how asyncio/coroutines fit into an existing workflow, or whether they require an alternative workflow ecosystem (additional notes are in AIP-28). A key architectural question is whether an event loop belongs on an executor and/or on workers. AFAIK, an executor could manage an event loop, and all the "tasks" it runs would need to be compatible with asyncio/coroutine behavior (and non-blocking libraries would need to be used for db access, etc.). Note that the branch name for this PR may have started out with the intention of applying asyncio to an executor, but it winds up applying at the worker level, so the executor is not managing an event loop shared by all the workers. There could be something important in that.

In this PR, the primary point of interest and/or confusion is in

```
foo = await asyncio.create_subprocess_exec(
    *command,
    stdin=asyncio.subprocess.PIPE,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    loop=self.loop
)
```

vs.
using an awaitable coroutine or task, as in the sequence diagram noted in

- https://docs.python.org/3.6/library/asyncio-task.html#example-chain-coroutines
- see also https://docs.python.org/3.6/library/asyncio-dev.html#asyncio-multithreading

By using a subprocess, does that mean the `command` itself has no ability to use asyncio coroutines? That's not entirely a bad thing if this PR intends it (or if that's just how it has to be, because the `command` ecosystem is not compatible with asyncio). But if that's correct, it could mean that this worker spawns a lot of processes that are all blocking within each process, and I wonder whether that really differs from using multiprocessing in this regard. Should this worker use some kind of asyncio process pool (or is that implicit already)?
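To make the first point concrete, here is a minimal sketch of the subprocess pattern the PR snippet uses. It uses the modern API (`asyncio.run`, no `loop=` argument, which was later deprecated); `run_command` is a hypothetical name, not from the PR. The key behavior: awaiting `communicate()` yields to the event loop, so several child processes can run concurrently from one loop and thread, even though each child is an ordinary process that cannot `await` back into the parent loop.

```python
import asyncio
import sys

async def run_command(command):
    # Launch the child without blocking the event loop. The child is an
    # ordinary OS process; it cannot use the parent's coroutines.
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # communicate() suspends this coroutine until the child exits,
    # letting other coroutines run in the meantime.
    stdout, stderr = await proc.communicate()
    return proc.returncode, stdout

async def main():
    # Two children run concurrently from a single event loop.
    return await asyncio.gather(
        run_command([sys.executable, "-c", "print('a')"]),
        run_command([sys.executable, "-c", "print('b')"]),
    )

results = asyncio.run(main())
```

So the loop itself stays non-blocking; it is only the work *inside* each `command` that cannot cooperate with the loop.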
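On the last question, one way an "asyncio process pool" can look (a sketch, not what this PR does): bridge blocking callables into the loop with `loop.run_in_executor` and a `concurrent.futures.ProcessPoolExecutor`. The `blocking_task` function here is a hypothetical stand-in for a non-asyncio-aware command; arguments must be picklable, and on spawn-based platforms the pool should be created under a `__main__` guard.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def blocking_task(n):
    # Stand-in for blocking, CPU-bound work that cannot await.
    return n * n

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Each call runs in a pool process; the returned futures are
        # awaitable, so the event loop stays responsive throughout.
        return await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_task, n) for n in range(4))
        )

results = asyncio.run(main())
print(results)  # [0, 1, 4, 9]
```

This caps concurrent processes at the pool size, unlike spawning one subprocess per task, which may be the relevant difference from plain multiprocessing here.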
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
