jens-scheffler-bosch opened a new pull request, #33017:
URL: https://github.com/apache/airflow/pull/33017

   This PR adds the option to specify extra index URLs to
PythonVirtualenvOperator (and the corresponding decorator) so that
virtualenvs can be installed from additional (private) Python package
repositories.
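   A minimal sketch of how a list of extra index URLs could be turned into pip
options, assuming every URL is passed with pip's `--extra-index-url` flag so
the default index stays in use. The helper name `build_pip_index_args` is
hypothetical and not the operator's actual code; the URLs carry no
credentials, since anything on the pip command line may end up in logs.

```python
from typing import Sequence


def build_pip_index_args(extra_index_urls: Sequence[str]) -> list[str]:
    """Turn extra package index URLs into pip command-line options.

    Assumption: every URL is added with --extra-index-url, so the default
    index (PyPI, or whatever pip is configured with) is still consulted.
    """
    args: list[str] = []
    for url in extra_index_urls:
        args.extend(["--extra-index-url", url])
    return args


if __name__ == "__main__":
    cmd = ["pip", "install", "my-private-package"]
    cmd += build_pip_index_args(["https://pkgs.example.com/simple"])
    print(" ".join(cmd))
    # -> pip install my-private-package --extra-index-url https://pkgs.example.com/simple
```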
   
   As an alternative to modifying the worker config or setting environment
variables for additional pip install options, this allows using different
package index back-ends per DAG/task.
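   One way to scope such a setting to a single task rather than the whole
worker is a pip config file inside the virtualenv: pip run from a virtualenv
reads `pip.conf` (`pip.ini` on Windows) in the virtualenv's root directory.
The sketch assumes that approach; `write_venv_pip_conf` is a hypothetical
helper, not part of Airflow.

```python
import configparser
import sys
import tempfile
from pathlib import Path
from typing import Sequence


def write_venv_pip_conf(venv_dir: Path, extra_index_urls: Sequence[str]) -> Path:
    """Write a pip config file that only affects pip run from venv_dir.

    pip started from a virtualenv reads <venv>/pip.conf (pip.ini on Windows),
    so other tasks on the same worker keep their own index settings.
    """
    # interpolation=None: percent-encoded characters in URLs must not be
    # treated as configparser interpolation syntax.
    config = configparser.ConfigParser(interpolation=None)
    # Several URLs become a multi-line value; pip splits it on whitespace.
    config["global"] = {"extra-index-url": "\n".join(extra_index_urls)}
    file_name = "pip.ini" if sys.platform == "win32" else "pip.conf"
    conf_path = venv_dir / file_name
    with conf_path.open("w", encoding="utf-8") as fh:
        config.write(fh)
    return conf_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = write_venv_pip_conf(Path(tmp), ["https://pkgs.example.com/simple"])
        print(path.read_text(encoding="utf-8"))
```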
   
   As setting up a virtualenv is usually time-consuming and involves a lot of
file-copy I/O, it degrades execution performance when tasks run at high
volume. Therefore, a cache option was added to reuse created virtualenvs
across executions.
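   A minimal sketch of such a cache, assuming the cache key is a hash over
everything that determines the environment's contents (requirements, Python
version, index URLs). `venv_cache_dir` and `get_or_create_venv` are
hypothetical names, and the `create` callback stands in for the actual venv
creation and `pip install`. A completion marker written last keeps a
half-built environment from being reused; concurrent tasks building the same
entry would additionally need a lock, which the sketch omits.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def venv_cache_dir(
    cache_root: Path,
    requirements: Sequence[str],
    python_version: str,
    extra_index_urls: Sequence[str] = (),
) -> Path:
    """Map a virtualenv definition to a deterministic directory under cache_root."""
    spec = {
        # Sorted so the same requirements in another order hit the same entry.
        "requirements": sorted(requirements),
        "python": python_version,
        "extra_index_urls": list(extra_index_urls),
    }
    digest = hashlib.sha256(json.dumps(spec, sort_keys=True).encode("utf-8")).hexdigest()
    return cache_root / f"venv-{digest[:16]}"


def get_or_create_venv(
    cache_root: Path,
    requirements: Sequence[str],
    python_version: str,
    create: Callable[[Path], None],
    extra_index_urls: Sequence[str] = (),
) -> Path:
    """Reuse a cached virtualenv if a previous run finished building it, else build it."""
    target = venv_cache_dir(cache_root, requirements, python_version, extra_index_urls)
    marker = target / ".install_complete"
    if marker.exists():
        return target  # cache hit: skip the expensive create + pip install
    create(target)
    # Written last, so a half-built environment from a crashed run is never reused.
    marker.touch()
    return target


if __name__ == "__main__":
    built: list[Path] = []

    def fake_build(path: Path) -> None:
        # Stand-in for e.g. venv.create(path, with_pip=True) followed by pip install.
        path.mkdir(parents=True, exist_ok=True)
        built.append(path)

    with tempfile.TemporaryDirectory() as tmp:
        first = get_or_create_venv(Path(tmp), ["requests", "pandas"], "3.11", fake_build)
        second = get_or_create_venv(Path(tmp), ["pandas", "requests"], "3.11", fake_build)
        print(first == second, len(built))  # -> True 1
```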
   
   As (private) extra index URLs might require credentials, passing them as
plain requirements or pip install options would expose the passwords in logs.
Therefore, the connection types provided by Airflow core were extended with a
"Package Index (Python)" connection type that can be used to store the
credentials in Airflow in a (more) secret way without exposing them. As I
realized that hooks provided by Airflow core are treated as "second-class
citizens", I reworked `providers_manager.py` and properly modeled the existing
FS hook so that it provides correct form fields like other providers' hooks.
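   The credential handling can be sketched as follows, assuming the
connection's host holds the index URL and its login/password hold basic-auth
credentials that are embedded into the URL only when pip is invoked, while
log output uses a redacted form. Both helper names are hypothetical; the
actual hook may build the URL differently.

```python
from typing import Optional
from urllib.parse import quote, urlsplit, urlunsplit


def index_url_with_credentials(
    index_url: str, login: Optional[str], password: Optional[str]
) -> str:
    """Embed basic-auth credentials (e.g. from a connection) into an index URL."""
    if not login:
        return index_url
    parts = urlsplit(index_url)
    # Drop any userinfo already present and percent-encode the new credentials,
    # so characters such as "@" or ":" in a password cannot break the URL.
    host_port = parts.netloc.rpartition("@")[2]
    userinfo = quote(login, safe="")
    if password:
        userinfo += ":" + quote(password, safe="")
    return urlunsplit((parts.scheme, f"{userinfo}@{host_port}", parts.path, parts.query, parts.fragment))


def redact_url(url: str) -> str:
    """Return a copy of url that is safe to log: the password is replaced by ***."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    userinfo, _, host_port = parts.netloc.rpartition("@")
    user = userinfo.partition(":")[0]
    return urlunsplit((parts.scheme, f"{user}:***@{host_port}", parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    url = index_url_with_credentials("https://pkgs.example.com/simple", "ci-bot", "s3cr@t")
    print(redact_url(url))  # -> https://ci-bot:***@pkgs.example.com/simple
```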
   
   Summary of changes, by commit:
   - Reworked the provider manager to treat Airflow core hooks like other
provider hooks, removed the previously hard-coded lists of core connection
types from the CLI and web UI, and reworked the FS hook to have a proper form:
fc71e6322eddc1c66082ceded2fb28134c0abeb1
   - Added a Package Index (Python) hook:
55e5082c370bcca9028c2c63bedbfa91fd1bc3cd
   - Extended PythonVirtualenvOperator with extra index URLs and caching:
3eade80b040cd10427e461d0501f28fc3d1d6fa1
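   The idea of the provider manager rework, registering core hooks through the
same code path as provider hooks, can be sketched with stand-in classes.
`get_ui_field_behaviour` mirrors the classmethod convention Airflow hooks use
to hide or relabel connection form fields; the class names, `HookInfo`,
`register_hooks` and the chosen fields are assumptions for illustration, not
the actual `ProvidersManager` code.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class HookInfo:
    conn_type: str
    hook_name: str
    source: str  # "airflow-core" or the providing package
    ui_field_behaviour: dict[str, Any] = field(default_factory=dict)


class FSHookSketch:
    """Stand-in for a core hook; a real Airflow hook subclasses BaseHook."""

    conn_type = "fs"
    hook_name = "File (Path)"

    @classmethod
    def get_ui_field_behaviour(cls) -> dict[str, Any]:
        # Assumed: only a path is needed, so the generic network fields are hidden.
        return {"hidden_fields": ["host", "schema", "port", "login", "password"], "relabeling": {}}


class PackageIndexHookSketch:
    conn_type = "package_index"
    hook_name = "Package Index (Python)"

    @classmethod
    def get_ui_field_behaviour(cls) -> dict[str, Any]:
        # Assumed: host holds the index URL, login/password the credentials.
        return {
            "hidden_fields": ["schema", "port", "extra"],
            "relabeling": {"host": "Package Index URL"},
        }


def register_hooks(hooks_by_source: dict[str, list[type]]) -> dict[str, HookInfo]:
    """Collect hooks from every source (core or provider) through one code path."""
    registry: dict[str, HookInfo] = {}
    for source, hook_classes in hooks_by_source.items():
        for hook_cls in hook_classes:
            registry[hook_cls.conn_type] = HookInfo(
                conn_type=hook_cls.conn_type,
                hook_name=hook_cls.hook_name,
                source=source,
                ui_field_behaviour=hook_cls.get_ui_field_behaviour(),
            )
    return registry


registry = register_hooks({"airflow-core": [FSHookSketch, PackageIndexHookSketch]})
# CLI and web UI would both read their connection-type choices from the registry,
# so no separate hard-coded list of core types has to be kept in sync.
choices = sorted((info.conn_type, info.hook_name) for info in registry.values())
print(choices)
```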
   
   Note: I noticed that the existing hooks still carry a `mesos_framework-id`
hook type, which appears to be leftover dead code from the former Mesos
executor support that was dropped in Airflow 2.0. I found no references to
it, so this PR could potentially be extended to (finally) clean it up.
   
   How and what to test:
   - The CI pipeline is green
   - Open the connections form in the web UI and check that the "File (Path)"
connection type now has a proper form
   - Open the connections form in the web UI and check that the new connection
type "Package Index (Python)" is available (with a proper form)
   - Read the docs and try the new options in a DAG, e.g. take
example_python_operator and modify some parameters
   - Review the added RST docs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
