vksunilk commented on PR #29346:
URL: https://github.com/apache/airflow/pull/29346#issuecomment-1416017293

   > I think the purpose of `GCSObjectExistenceSensor` is to wait for a file to be present. Why do we need these changes? It does not fail for me if the file is not present; it waits for the file to get created and then times out based on my `poke_interval` and `timeout` params.
   > 
   > ```
   > [2023-02-03, 15:06:41 UTC] {taskinstance.py:1524} INFO - Exporting env vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='bq_check_op' AIRFLOW_CTX_TASK_ID='gcs_object_exists_task' AIRFLOW_CTX_EXECUTION_DATE='2023-02-03T15:06:08.323807+00:00' AIRFLOW_CTX_TRY_NUMBER='1' AIRFLOW_CTX_DAG_RUN_ID='manual__2023-02-03T15:06:08.323807+00:00'
   > [2023-02-03, 15:06:41 UTC] {gcs.py:94} INFO - Sensor checks existence of : test-gcs-bucket-providers, example_gcs.py
   > [2023-02-03, 15:06:41 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
   > [2023-02-03, 15:06:48 UTC] {gcs.py:94} INFO - Sensor checks existence of : test-gcs-bucket-providers, example_gcs.py
   > [2023-02-03, 15:06:48 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
   > [2023-02-03, 15:06:56 UTC] {gcs.py:94} INFO - Sensor checks existence of : test-gcs-bucket-providers, example_gcs.py
   > [2023-02-03, 15:06:56 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
   > [2023-02-03, 15:07:03 UTC] {gcs.py:94} INFO - Sensor checks existence of : test-gcs-bucket-providers, example_gcs.py
   > [2023-02-03, 15:07:03 UTC] {base.py:73} INFO - Using connection ID 'google_cloud_default' for task execution.
   > [2023-02-03, 15:07:05 UTC] {taskinstance.py:1798} ERROR - Task failed with exception
   > Traceback (most recent call last):
   >   File "/opt/airflow/airflow/sensors/base.py", line 216, in execute
   >     raise AirflowSensorTimeout(message)
   > airflow.exceptions.AirflowSensorTimeout: Sensor has timed out; run duration of 24.36089755300054 seconds exceeds the specified timeout of 20.
   > [2023-02-03, 15:07:05 UTC] {taskinstance.py:1338} INFO - Immediate failure requested. Marking task as FAILED. dag_id=bq_check_op, task_id=gcs_object_exists_task, execution_date=20230203T150608, start_date=20230203T150640, end_date=20230203T150705
   > [2023-02-03, 15:07:05 UTC] {standard_task_runner.py:105} ERROR - Failed to execute job 147 for task gcs_object_exists_task (Sensor has timed out; run duration of 24.36089755300054 seconds exceeds the specified timeout of 20.; 5779)
   > [2023-02-03, 15:07:05 UTC] {local_task_job.py:215} INFO - Task exited with return code 1
   > [2023-02-03, 15:07:05 UTC] {taskinstance.py:2616} INFO - 0 downstream tasks scheduled from follow-on schedule check
   > ```
   
   Yes. This can be useful in case the user needs to wait for a file to be deleted by an external task. That is one such use case.
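Below is a minimal, framework-free sketch of the poke/timeout loop seen in the log above, with the existence check inverted so the condition succeeds once the object is gone. This is not Airflow's `BaseSensorOperator` or the API added by this PR. `wait_for`, `absent`, `FakeClock`, and `SensorTimeout` are hypothetical names used only for illustration. The in-memory bucket listing also stands in for a real GCS call. The loop pokes, checks elapsed time against `timeout`, and then sleeps `poke_interval`. That roughly matches the logged run, which ended with `AirflowSensorTimeout` once the run duration exceeded `timeout=20`.

```python
import time
from typing import Callable, Set


class SensorTimeout(Exception):
    """Hypothetical stand-in for airflow.exceptions.AirflowSensorTimeout."""


def wait_for(
    poke: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `poke` until it returns True and return how many pokes it took.

    Raises SensorTimeout once the elapsed time exceeds `timeout`.
    """
    start = clock()
    pokes = 0
    while True:
        pokes += 1
        if poke():
            return pokes
        elapsed = clock() - start
        if elapsed > timeout:
            raise SensorTimeout(
                f"run duration of {elapsed} seconds exceeds the specified timeout of {timeout}."
            )
        sleep(poke_interval)


def absent(list_objects: Callable[[], Set[str]], name: str) -> Callable[[], bool]:
    """Invert an existence check: the condition holds once `name` is no longer listed."""
    return lambda: name not in list_objects()


class FakeClock:
    """Deterministic clock so the example runs instantly; sleep only advances time."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    clock = FakeClock()

    def list_bucket() -> Set[str]:
        # Simulated external task deletes the object 15 seconds in.
        return set() if clock.time() >= 15 else {"example_gcs.py"}

    pokes = wait_for(absent(list_bucket, "example_gcs.py"), poke_interval=5,
                     timeout=20, clock=clock.time, sleep=clock.sleep)
    print(pokes)  # → 4
```

The clock and sleep are injected so the behaviour can be checked without real waiting. If the object is never deleted, the same call raises `SensorTimeout` after the poke at t=25. This is analogous to the timeout in the log above.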

