vksunilk commented on PR #29346:
URL: https://github.com/apache/airflow/pull/29346#issuecomment-1416017293
> I think the purpose of `GCSObjectExistenceSensor` is to wait for a file to
be present. Why do we need these changes? It does not fail for me if the file
is not present; it waits for the file to be created and then times out based
on my `poke_interval` and `timeout` params.
>
> ```
> [2023-02-03, 15:06:41 UTC] {taskinstance.py:1524} INFO - Exporting env
vars: AIRFLOW_CTX_DAG_OWNER='airflow' AIRFLOW_CTX_DAG_ID='bq_check_op'
AIRFLOW_CTX_TASK_ID='gcs_object_exists_task'
AIRFLOW_CTX_EXECUTION_DATE='2023-02-03T15:06:08.323807+00:00'
AIRFLOW_CTX_TRY_NUMBER='1'
AIRFLOW_CTX_DAG_RUN_ID='manual__2023-02-03T15:06:08.323807+00:00'
> [2023-02-03, 15:06:41 UTC] {gcs.py:94} INFO - Sensor checks existence of :
test-gcs-bucket-providers, example_gcs.py
> [2023-02-03, 15:06:41 UTC] {base.py:73} INFO - Using connection ID
'google_cloud_default' for task execution.
> [2023-02-03, 15:06:48 UTC] {gcs.py:94} INFO - Sensor checks existence of :
test-gcs-bucket-providers, example_gcs.py
> [2023-02-03, 15:06:48 UTC] {base.py:73} INFO - Using connection ID
'google_cloud_default' for task execution.
> [2023-02-03, 15:06:56 UTC] {gcs.py:94} INFO - Sensor checks existence of :
test-gcs-bucket-providers, example_gcs.py
> [2023-02-03, 15:06:56 UTC] {base.py:73} INFO - Using connection ID
'google_cloud_default' for task execution.
> [2023-02-03, 15:07:03 UTC] {gcs.py:94} INFO - Sensor checks existence of :
test-gcs-bucket-providers, example_gcs.py
> [2023-02-03, 15:07:03 UTC] {base.py:73} INFO - Using connection ID
'google_cloud_default' for task execution.
> [2023-02-03, 15:07:05 UTC] {taskinstance.py:1798} ERROR - Task failed with
exception
> Traceback (most recent call last):
> File "/opt/airflow/airflow/sensors/base.py", line 216, in execute
> raise AirflowSensorTimeout(message)
> airflow.exceptions.AirflowSensorTimeout: Sensor has timed out; run
duration of 24.36089755300054 seconds exceeds the specified timeout of 20.
> [2023-02-03, 15:07:05 UTC] {taskinstance.py:1338} INFO - Immediate failure
requested. Marking task as FAILED. dag_id=bq_check_op,
task_id=gcs_object_exists_task, execution_date=20230203T150608,
start_date=20230203T150640, end_date=20230203T150705
> [2023-02-03, 15:07:05 UTC] {standard_task_runner.py:105} ERROR - Failed to
execute job 147 for task gcs_object_exists_task (Sensor has timed out; run
duration of 24.36089755300054 seconds exceeds the specified timeout of 20.;
5779)
> [2023-02-03, 15:07:05 UTC] {local_task_job.py:215} INFO - Task exited with
return code 1
> [2023-02-03, 15:07:05 UTC] {taskinstance.py:2616} INFO - 0 downstream
tasks scheduled from follow-on schedule check
> ```
Yes. But a user may need to wait for the file to be deleted by an external
task, and in that case this can be useful. That is one such use case.
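
To make the use case concrete, here is a rough sketch of the "wait until the file is gone" loop in plain Python. It is not the code in this PR and does not use the Airflow API: `wait_until_absent`, `SensorTimeout`, and the `object_exists` callable are hypothetical names for illustration. Only `poke_interval` and `timeout` come from the discussion above, and the loop only approximates how a sensor pokes and then times out.

```python
import time
from typing import Callable


class SensorTimeout(Exception):
    """Hypothetical local stand-in for a sensor timeout error."""


def wait_until_absent(
    object_exists: Callable[[], bool],
    poke_interval: float,
    timeout: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until object_exists() returns False, or raise after `timeout` seconds.

    `object_exists` is an assumed callable supplied by the caller, for
    example a thin wrapper around a storage client's existence check.
    """
    started = clock()
    while object_exists():
        # The object is still there: give up if we have waited too long.
        elapsed = clock() - started
        if elapsed > timeout:
            raise SensorTimeout(
                f"Object still present after {elapsed} seconds; "
                f"exceeds the specified timeout of {timeout}."
            )
        # Otherwise wait before the next poke.
        sleep(poke_interval)
    # The object is gone, so the wait succeeded.
    return True
```

With something like this, a task that depends on an external cleanup job would succeed as soon as the check reports the file missing, and fail with a timeout if the file is never removed.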