uranusjr commented on issue #15286:
URL: https://github.com/apache/airflow/issues/15286#issuecomment-818866373


   I thought about this a bit and feel there are two things to consider here.
The first is the overhead of `PythonVirtualenvOperator` populating the
virtual environment, which (as mentioned above) should be solved by
introducing some caching mechanism, similar to how CI systems cache dependencies
between runs. This is very much worth doing.
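   A minimal sketch of such a cache, using only the standard library: key each
environment by a hash of the interpreter version and the sorted requirement list,
build it once, and reuse it on later runs. The names `cache_key`, `venv_python`
and `get_or_create_venv` and the on-disk layout are hypothetical, not an existing
Airflow API, and the sketch does not handle two workers building the same
environment at the same time.

```python
import hashlib
import os
import subprocess
import sys
import tempfile
import venv
from pathlib import Path


def cache_key(requirements):
    """Stable key: interpreter major.minor plus the requirements, order-insensitive."""
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    payload = "\n".join([version, *sorted(requirements)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def venv_python(env_dir):
    """Path to the interpreter inside a virtual environment."""
    if os.name == "nt":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"


def get_or_create_venv(requirements, cache_root):
    """Return a cached environment for `requirements`, building it on a cache miss.

    Hypothetical helper; NOT safe if two workers build the same key concurrently.
    """
    env_dir = Path(cache_root) / cache_key(requirements)
    marker = env_dir / ".complete"  # written last, so half-built envs get rebuilt
    if marker.exists():
        return env_dir  # cache hit: skip creation and installation entirely
    # Cache miss (or an earlier build died midway): start from a clean directory.
    venv.create(env_dir, clear=True, with_pip=bool(requirements), symlinks=(os.name != "nt"))
    if requirements:
        subprocess.run(
            [str(venv_python(env_dir)), "-m", "pip", "install", *requirements],
            check=True,
        )
    marker.touch()
    return env_dir


if __name__ == "__main__":
    # Demo: the second call reuses the environment built by the first.
    with tempfile.TemporaryDirectory() as root:
        print(get_or_create_venv([], root) == get_or_create_venv([], root))
```

A real implementation would also need eviction of stale environments and some
form of locking around the build step.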
   
   There is another use case surrounding `PythonVirtualenvOperator`,
however: people wanting more control over the environment used to run Python
code. Maybe some dependencies can’t be covered by Python
packaging, or they require special configuration of the environment. Or maybe the
user is simply migrating from an existing cron setup and wants to reuse the
existing environments first, to avoid rewriting everything all at once. Currently people
would need to “drop down” to `BashOperator` to achieve this, and while that
definitely works, it kind of “wastes” the knowledge that the operator is running
Python, and prevents the nice things we could do with that knowledge.
   
   I think the two problems need two solutions. The first is probably
more intuitive to design: we can add caching options to
`PythonVirtualenvOperator` so that Airflow caches and reuses the environment (or
a subset of it), and we can borrow some ideas from CI designs for this. The other is
less straightforward; my current idea is to introduce an
`ExternalPythonOperator` (please suggest better names) that, instead of
taking a list of requirements to create a virtual environment from, simply takes a path
to a Python executable to run the Python callable with. The behaviour would
otherwise be very similar to `PythonVirtualenvOperator`, including all the code
generation and pickling caveats. This would be much easier to implement than
the caching one (which, as also mentioned above, requires tricky handling of
parallelism). So I’ll probably start with it and see what I can do.
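   The core of such an operator can be sketched without Airflow at all: write the
callable's source into a small runner script, pickle the arguments, execute the
script with the given interpreter, and unpickle the result. This mirrors the
code-generation-plus-pickling approach described above, but the helper names are
hypothetical, plain `pickle` stands in for whatever serializer a real operator
would use, and it assumes both interpreters can read the same pickle protocol.

```python
import inspect
import pickle
import subprocess
import sys
import tempfile
import textwrap
from pathlib import Path

# Script executed by the *external* interpreter: define the callable from its
# source, unpickle the arguments, call it, and pickle the return value.
_RUNNER_TEMPLATE = """\
import pickle
import sys

{source}

with open(sys.argv[1], "rb") as f:
    args, kwargs = pickle.load(f)
result = {name}(*args, **kwargs)
with open(sys.argv[2], "wb") as f:
    pickle.dump(result, f)
"""


def run_source_in_python(python, source, name, *args, **kwargs):
    """Run function `name`, defined by `source`, using the interpreter at `python`."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        script = tmp_path / "runner.py"
        input_file = tmp_path / "input.pkl"
        output_file = tmp_path / "output.pkl"
        script.write_text(
            _RUNNER_TEMPLATE.format(source=textwrap.dedent(source), name=name),
            encoding="utf-8",
        )
        # Arguments must be picklable and loadable by the external interpreter.
        with open(input_file, "wb") as f:
            pickle.dump((args, kwargs), f)
        subprocess.run([python, str(script), str(input_file), str(output_file)], check=True)
        with open(output_file, "rb") as f:
            return pickle.load(f)


def run_callable_in_python(python, fn, *args, **kwargs):
    """Ship `fn`'s own source; only works for a plain, undecorated `def`."""
    return run_source_in_python(python, inspect.getsource(fn), fn.__name__, *args, **kwargs)


if __name__ == "__main__":
    # Demo with the current interpreter; a real call would pass an existing env's python.
    print(run_source_in_python(sys.executable, "def add(a, b):\n    return a + b\n", "add", 2, 3))
```

In practice `python` would point at the interpreter of a pre-existing environment
rather than `sys.executable`; everything else stays the same.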
   
   Any advice is very welcome!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
