jens-scheffler-bosch opened a new pull request, #33017: URL: https://github.com/apache/airflow/pull/33017
This PR adds the option to specify extra index URLs to PythonVirtualEnvOperator (and the corresponding decorator) so that virtualenvs can be installed from additional (private) Python package repositories. In addition to the existing options of modifying the worker config or setting environment variables that carry extra pip install configuration, this allows a different back-end to be used per DAG/task.

Setting up a virtualenv is usually time-consuming and needs a lot of copy I/O, which degrades execution performance when high-volume tasks are called. Therefore a cache option was added to re-use created virtualenvs across executions.

As the definition of (private) extra index URLs might require credentials to be passed, using plain requirements or pip install options would expose passwords in logs. Therefore the connection types provided by Airflow core were extended with a "Package Index (Python)" connection type, which stores the credentials in Airflow in a (more) secret way and prevents them from being exposed.

As I realized that the hooks provided by Airflow core are treated as "second class citizens", I re-worked the `plugin_manager.py` and properly modeled the existing FS hook to provide correct form fields like other providers.

Summary of changes, separated by commit:

- Re-worked provider manager to treat Airflow core hooks like other provider hooks, removed previous hard coded core type lists from CLI and web UI, re-worked FS Hook to have a proper form: fc71e6322eddc1c66082ceded2fb28134c0abeb1
- Added a Package Index (Python) hook in 55e5082c370bcca9028c2c63bedbfa91fd1bc3cd
- Extended PythonVirtualEnvOperator for extra index URL and caching in 3eade80b040cd10427e461d0501f28fc3d1d6fa1

Note: I realized that the existing hooks still carry a `mesos_framework-id` hook type, which seems to be legacy dead code left over from the former Mesos Executor support that was dropped with Airflow 2.0. I found no reference to it, so a potential addition to this PR could be to clean this up (finally).

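
To make the two ideas above more concrete (keeping index credentials out of logs, and re-using a virtualenv when its inputs have not changed), here is a minimal standard-library sketch. It is not the code from this PR and does not use any Airflow API: `IndexConnection`, `index_url`, `venv_cache_dir`, and the `pypi.example.com` URL are all hypothetical names chosen for illustration, and hashing the requirements to derive a cache directory is just one plausible way a cache could be keyed.

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path
from urllib.parse import quote, urlsplit, urlunsplit


@dataclass(frozen=True)
class IndexConnection:
    """Hypothetical stand-in for a stored "Package Index (Python)" connection."""

    url: str
    login: str = ""
    password: str = ""


def index_url(conn: IndexConnection, mask: bool = False) -> str:
    """Return the index URL with credentials embedded in the netloc.

    With mask=True the password is replaced by ``***`` so the result
    can be written to a log without exposing the secret.
    """
    if not conn.login:
        return conn.url
    userinfo = quote(conn.login, safe="")
    if conn.password:
        userinfo += ":" + ("***" if mask else quote(conn.password, safe=""))
    parts = urlsplit(conn.url)
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{parts.netloc}"))


def venv_cache_dir(
    cache_root: Path,
    requirements: list[str],
    python_version: str,
    index_urls: list[str],
) -> Path:
    """Derive a stable directory name from everything that defines the venv.

    Requirements are sorted so that their order does not change the key;
    index URLs keep their order (an assumption of this sketch).
    """
    payload = "\n".join([python_version, *sorted(requirements), *index_urls])
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
    return cache_root / f"venv-{digest}"


if __name__ == "__main__":
    conn = IndexConnection("https://pypi.example.com/simple", "bot", "s3cr@t")
    # Only the masked form is printed; the real URL would go to pip.
    print("extra index:", index_url(conn, mask=True))
    print("cache dir:", venv_cache_dir(Path("/tmp/venvs"), ["requests"], "3.11", [conn.url]))
```

In this sketch a task would check whether the derived directory already exists and only create the virtualenv when it does not. The cache key is built from the bare index URL rather than the authenticated one, so the password does not influence the directory name.
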
How and what to test:

- The pipeline is green, obviously
- Open the Web UI connections form and check that the "File (Path)" connection type now has a proper form
- Open the Web UI connections form and check that a new connection type "Package Index (Python)" is available (with a proper form)
- Take a look at the docs and try the new options with a DAG, e.g. use example_python_operator and modify some parameters
- Review the added RST docs
