dazza-codes edited a comment on issue #5788: [POC] multi-threading using asyncio URL: https://github.com/apache/airflow/pull/5788#issuecomment-570269996

I'm not opposed to this PR/POC if it works and the scope/intention is clear; are there additional unit tests and functional tests to demonstrate that it works as it stands? I don't know enough about how it performs, how it manages the event loop, or how the loop manages subprocesses. It seems that a subprocess cannot `yield from` (or `await`) to return control to the main loop from anywhere within the `command`, i.e. the command itself cannot use asyncio.

In the bigger picture, which may or may not pertain to this PR in particular, I want to understand more about the Airflow task ecosystem and how asyncio/coroutines fit into an existing workflow, or whether they require an alternative workflow ecosystem (additional notes are in AIP-28). A key architectural question is whether an event loop belongs on an executor and/or on workers. AFAIK, an executor could manage an event loop, and all the "tasks" it runs would need to be compatible with asyncio/coroutine behavior (and non-blocking libraries would need to be used for db access, etc.). Note that the branch name for this PR may have started out with the intention of applying asyncio to an executor, but it winds up applying at the worker level, so the executor is not managing an event loop shared by all the workers. There could be something important in that.

In this PR, the primary point of interest and/or confusion is in

```
foo = await asyncio.create_subprocess_exec(
    *command,
    stdin=asyncio.subprocess.PIPE,
    stdout=asyncio.subprocess.PIPE,
    stderr=asyncio.subprocess.PIPE,
    loop=self.loop
)
```

vs.
using an awaitable coroutine or task, as in the sequence diagram noted in

- https://docs.python.org/3.6/library/asyncio-task.html#example-chain-coroutines
- see also https://docs.python.org/3.6/library/asyncio-dev.html#asyncio-multithreading

By using a subprocess, does that mean the `command` itself has no ability to use asyncio coroutines? That's not entirely a bad thing if this PR intends it (or if that's just how it has to be, because the `command` ecosystem is not compatible with asyncio). But if that's correct, it could mean that this worker spawns a lot of processes that are all blocking within each process, and I wonder whether that really differs from using multiprocessing in this regard. Should this worker use some kind of asyncio process pool (or is that implicit already)?
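To make the first point concrete, here is a minimal sketch of the subprocess pattern the PR snippet uses. It uses the modern API (`asyncio.run`, no `loop=` argument, which was later deprecated); `run_command` is a hypothetical name, not from the PR. The key behavior: awaiting `communicate()` yields to the event loop, so several child processes can run concurrently from one loop and thread, even though each child is an ordinary process that cannot `await` back into the parent loop.

```python
import asyncio
import sys

async def run_command(command):
    # Launch the child without blocking the event loop. The child is an
    # ordinary OS process; it cannot use the parent's coroutines.
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    # communicate() suspends this coroutine until the child exits,
    # letting other coroutines run in the meantime.
    stdout, stderr = await proc.communicate()
    return proc.returncode, stdout

async def main():
    # Two children run concurrently from a single event loop.
    return await asyncio.gather(
        run_command([sys.executable, "-c", "print('a')"]),
        run_command([sys.executable, "-c", "print('b')"]),
    )

results = asyncio.run(main())
```

So the loop itself stays non-blocking; it is only the work *inside* each `command` that cannot cooperate with the loop.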
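On the last question, one way an "asyncio process pool" can look (a sketch, not what this PR does): bridge blocking callables into the loop with `loop.run_in_executor` and a `concurrent.futures.ProcessPoolExecutor`. The `blocking_task` function here is a hypothetical stand-in for a non-asyncio-aware command; arguments must be picklable, and on spawn-based platforms the pool should be created under a `__main__` guard.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def blocking_task(n):
    # Stand-in for blocking, CPU-bound work that cannot await.
    return n * n

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=2) as pool:
        # Each call runs in a pool process; the returned futures are
        # awaitable, so the event loop stays responsive throughout.
        return await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_task, n) for n in range(4))
        )

results = asyncio.run(main())
print(results)  # [0, 1, 4, 9]
```

This caps concurrent processes at the pool size, unlike spawning one subprocess per task, which may be the relevant difference from plain multiprocessing here.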
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
