jaketf commented on a change in pull request #6590: [AIRFLOW-5520] Add options
to run Dataflow in a virtual environment
URL: https://github.com/apache/airflow/pull/6590#discussion_r347600964
##########
File path: airflow/gcp/hooks/dataflow.py
##########
@@ -515,8 +530,20 @@ def label_formatter(labels_dict):
         return ['--labels={}={}'.format(key, value)
                 for key, value in labels_dict.items()]
-        self._start_dataflow(variables, name, [py_interpreter] + py_options + [dataflow],
-                             label_formatter, project_id)
+        if py_requirements is not None:
+            with TemporaryDirectory(prefix='dataflow-venv') as tmp_dir:
+                py_interpreter = prepare_virtualenv(
Review comment:
Interesting. For Composer, it'd be useful to put these downloaded files in
the `/data/` directory on GCS, which always gets synced to workers, and have the
install happen from there rather than redownloading every time.
https://cloud.google.com/composer/docs/concepts/cloud-storage
Agreed, though, that this may be a Composer-specific optimization and not
appropriate for OSS trunk.
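
A rough sketch of what I mean, to make the suggestion concrete. Everything here
is hypothetical, not an Airflow or Composer API: `COMPOSER_DATA_DIR` assumes the
standard Composer GCS mount path, and `pip_install_args` /
`prepare_cached_virtualenv` are made-up helper names. It just prefers a synced
package directory as a `pip --find-links` source when one exists, falling back
to the index otherwise:

```python
import os
import subprocess
import sys
import tempfile

# Assumption: the GCS-backed /data/ directory is mounted at this path on
# Composer workers (it is synced to all workers automatically).
COMPOSER_DATA_DIR = "/home/airflow/gcs/data"


def pip_install_args(requirements, cache_dir=COMPOSER_DATA_DIR):
    """Build pip arguments, preferring the synced cache when it exists."""
    args = ["install"]
    if os.path.isdir(cache_dir):
        # --find-links lets pip resolve packages from the synced wheel/sdist
        # cache first; the index remains available as a fallback.
        args += ["--find-links", cache_dir]
    return args + list(requirements)


def prepare_cached_virtualenv(requirements, cache_dir=COMPOSER_DATA_DIR):
    """Create a throwaway venv and install requirements, using the cache.

    Returns the path to the venv's Python interpreter, analogous to what
    prepare_virtualenv yields in the diff above.
    """
    tmp_dir = tempfile.mkdtemp(prefix="dataflow-venv")
    subprocess.check_call([sys.executable, "-m", "venv", tmp_dir])
    py_interpreter = os.path.join(tmp_dir, "bin", "python")
    subprocess.check_call(
        [py_interpreter, "-m", "pip"] + pip_install_args(requirements, cache_dir)
    )
    return py_interpreter
```

The nice property is that when `/data/` has no cached packages the behavior
degrades to exactly what the PR does today.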
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services