zstrathe commented on code in PR #38716: URL: https://github.com/apache/airflow/pull/38716#discussion_r1555982325
########## airflow/providers/apache/beam/operators/beam.py: ########## @@ -364,11 +364,13 @@ def execute(self, context: Context): def execute_sync(self, context: Context): with ExitStack() as exit_stack: - gcs_hook = GCSHook(gcp_conn_id=self.gcp_conn_id) if self.py_file.lower().startswith("gs://"): + gcs_hook = GCSHook(gcp_conn_id=self.gcp_conn_id) tmp_gcs_file = exit_stack.enter_context(gcs_hook.provide_file(object_url=self.py_file)) self.py_file = tmp_gcs_file.name if self.snake_case_pipeline_options.get("requirements_file", "").startswith("gs://"): + if 'gcs_hook' not in locals(): Review Comment: @e-galan thanks for the feedback! I will go ahead and remove that check then. In addition, after looking through the code some more, it looks like any other GCS resources may not be instantiated if supplied as pipeline_options? For example, the below from tests/system/providers/apache/beam/example_python.py: ``` start_python_pipeline_direct_runner = BeamRunPythonPipelineOperator( task_id="start_python_pipeline_direct_runner", py_file=GCS_PYTHON, py_options=[], pipeline_options={"output": GCS_OUTPUT}, py_requirements=["apache-beam[gcp]==2.46.0"], py_interpreter="python3", py_system_site_packages=False, ) ``` Would ```pipeline_options={"output": GCS_OUTPUT}``` correctly result in the output file being updated in GCS? If not, I think that somehow every value in pipeline_options should be recursively parsed for conversion to a GCS resource, with the complication that "output" files would need to utilize ```GCSHook.provide_file_and_upload()``` instead of ```GCSHook.provide_file()```. And I think that could be solved by adding another file prefix to distinguish GCS resources that need to be uploaded (i.e., ```"gcs-upload://"```), and updating the docs to note that. If this all sounds correct, I'd be happy to add to this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org