Jacob Ferriero created AIRFLOW-5520:
---------------------------------------

             Summary: DataflowPythonOperator dependency management requires side effects
                 Key: AIRFLOW-5520
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5520
             Project: Apache Airflow
          Issue Type: Improvement
          Components: gcp
    Affects Versions: 1.10.2
            Reporter: Jacob Ferriero


When using DataflowPythonOperator it is difficult to manage the Apache Beam 
version (and other Python dependencies) without affecting your entire Airflow 
environment. It seems the Dataflow hook just submits a subprocess and python 

The operator / hook should be improved to isolate Python dependencies for 
running py_file.

Perhaps this could be achieved in a virtual environment (similar to 
PythonVirtualenvOperator).

For Beam it's often customary to specify a --requirements_file or --setup_file 
to manage Python dependencies; we could install from one of these in the venv 
to get it set up.
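
To make the proposal concrete, here is a rough sketch of what the isolation 
could look like using only the standard library. It is not the current hook 
implementation. The helper names (venv_python, build_commands, run_in_venv) 
are hypothetical, and rendering pipeline options as --key=value is an 
assumption about how they would be passed to py_file.

```python
import os
import subprocess
import venv
from typing import Dict, List, Optional


def venv_python(venv_dir: str) -> str:
    """Return the path of the interpreter inside the virtual environment."""
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    return os.path.join(venv_dir, bin_dir, "python")


def build_commands(
    venv_dir: str,
    py_file: str,
    requirements_file: Optional[str] = None,
    options: Optional[Dict[str, str]] = None,
) -> List[List[str]]:
    """Build the commands to run, in order, without executing anything."""
    python = venv_python(venv_dir)
    commands = []
    if requirements_file:
        # Install dependencies into the venv, not into the Airflow environment.
        commands.append([python, "-m", "pip", "install", "-r", requirements_file])
    # Assumption: pipeline options are rendered as --key=value arguments.
    pipeline_cmd = [python, py_file]
    for key, value in sorted((options or {}).items()):
        pipeline_cmd.append("--{}={}".format(key, value))
    commands.append(pipeline_cmd)
    return commands


def run_in_venv(
    venv_dir: str,
    py_file: str,
    requirements_file: Optional[str] = None,
    options: Optional[Dict[str, str]] = None,
) -> None:
    """Create a fresh venv, install requirements, then launch py_file in it."""
    venv.EnvBuilder(with_pip=True, clear=True).create(venv_dir)
    for cmd in build_commands(venv_dir, py_file, requirements_file, options):
        subprocess.check_call(cmd)
```

Because py_file is launched with the venv's own interpreter, whatever Beam 
version the requirements file pins is the one used to submit the job, and the 
packages installed alongside Airflow are left untouched.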



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
