zstrathe commented on code in PR #38716:
URL: https://github.com/apache/airflow/pull/38716#discussion_r1555982325


##########
airflow/providers/apache/beam/operators/beam.py:
##########
@@ -364,11 +364,13 @@ def execute(self, context: Context):
 
     def execute_sync(self, context: Context):
         with ExitStack() as exit_stack:
-            gcs_hook = GCSHook(gcp_conn_id=self.gcp_conn_id)
             if self.py_file.lower().startswith("gs://"):
+                gcs_hook = GCSHook(gcp_conn_id=self.gcp_conn_id)
                 tmp_gcs_file = 
exit_stack.enter_context(gcs_hook.provide_file(object_url=self.py_file))
                 self.py_file = tmp_gcs_file.name
             if self.snake_case_pipeline_options.get("requirements_file", 
"").startswith("gs://"):
+                if 'gcs_hook' not in locals():

Review Comment:
   @e-galan thanks for the feedback! I will go ahead and remove that check then.
   
   In addition, after looking through the code some more, it looks like any 
other GCS resources may not be instantiated if supplied as pipeline_options? 
For example, the below from 
tests/system/providers/apache/beam/example_python.py:
   ```
   start_python_pipeline_direct_runner = BeamRunPythonPipelineOperator(
       task_id="start_python_pipeline_direct_runner",
       py_file=GCS_PYTHON,
       py_options=[],
       pipeline_options={"output": GCS_OUTPUT},
       py_requirements=["apache-beam[gcp]==2.46.0"],
       py_interpreter="python3",
       py_system_site_packages=False,
   )
   ```
   Would  ```pipeline_options={"output": GCS_OUTPUT}``` correctly result in the 
output file being updated in GCS? 
   
   If not, I think that somehow every value in pipeline_options should be 
recursively parsed for conversion to a GCS resource, with the complication that 
"output" files would need to utilize ```GCSHook.provide_file_and_upload()``` 
instead of ```GCSHook.provide_file()```. And I think that could be solved by 
adding another file prefix to distinguish GCS resources that need to be 
uploaded (i.e., ```"gcs-upload://"```), and updating the docs to note that.
   
   If this all sounds correct, I'd be happy to add to this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to