pmcquighan-camus commented on issue #57359:
URL: https://github.com/apache/airflow/issues/57359#issuecomment-3618389257

This can happen with any DAG that uses `DataflowStartFlexTemplateOperator` (possibly only in deferrable mode?). The log from `TemplateJobStartTrigger` comes from the triggerer while it waits for the job launched by `DataflowStartFlexTemplateOperator` to complete. The trigger reports an error and the task is marked as failed. On each retry, `DataflowStartFlexTemplateOperator` tries to resume, sees that the job was marked as failed, and fails again.
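The failure chain above can be modeled in plain Python. This is a simplified, hypothetical sketch, not the provider's code. `run_trigger` stands in for the trigger's polling loop, which turns any exception into an error event. `execute_complete` stands in for the operator's completion callback, which raises when it receives an error event and so fails the task. `TaskFailed`, `first_event`, the status strings and the event dict shape are illustrative assumptions.

```python
import asyncio
from typing import AsyncIterator, Callable


class TaskFailed(Exception):
    """Hypothetical stand-in for the error that marks the task as failed."""


async def run_trigger(
    get_status: Callable[[], str], poll_sleep: float = 0.0
) -> AsyncIterator[dict]:
    """Simplified model of a deferrable trigger waiting for a job to finish."""
    try:
        while True:
            status = get_status()  # may raise, e.g. on a transient HTTP 503
            if status == "JOB_STATE_DONE":
                yield {"status": "success", "message": "Job completed"}
                return
            if status == "JOB_STATE_FAILED":
                yield {"status": "error", "message": "Job failed"}
                return
            await asyncio.sleep(poll_sleep)  # still running: poll again later
    except Exception as exc:
        # Any exception, transient or not, ends the wait with an error event.
        yield {"status": "error", "message": str(exc)}


def execute_complete(event: dict) -> str:
    """Simplified model of the operator callback run after the trigger fires."""
    if event["status"] == "error":
        raise TaskFailed(event["message"])
    return event["message"]


async def first_event(get_status: Callable[[], str]) -> dict:
    """Return the first (and only) event the modeled trigger emits."""
    async for event in run_trigger(get_status):
        return event
    raise RuntimeError("trigger produced no event")
```

In this model, a single failed status call is enough to fail the task even though the job itself may still be running fine.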
   
A sample task is defined like this, but it's not very useful unless a Dataflow flex template is defined in your GCP project:
```python
from airflow.providers.google.cloud.operators.dataflow import DataflowStartFlexTemplateOperator

DataflowStartFlexTemplateOperator(
    task_id="mytask",
    body={
        "launchParameter": {
            # Needs a Dataflow flex template defined in your GCP project
            "containerSpecGcsPath": "gs://<some gcs bucket>/templates/<a flex template>",
            # Any job-specific environment settings go here, e.g. workerRegion
            "environment": {},
            "jobName": "sample-job",
            "parameters": {},  # Any template parameters go here
        },
    },
    location="<location>",
    project_id="<project id>",
    deferrable=True,
    append_job_name=True,  # Add a unique suffix to job names so retries create unique job names
)
```
   
Since this failure depends on GCP returning 503s, it's not easy to reproduce. The trigger catches the 503 exception and yields an error `TriggerEvent` [here](https://github.com/apache/airflow/blob/3.1.0/providers/google/src/airflow/providers/google/cloud/triggers/dataflow.py#L113-L150). That is where a fix could go: check whether the exception is a 503 from the service provider and, if so, keep looping as the trigger already does while the job [is still running](https://github.com/apache/airflow/blob/3.1.0/providers/google/src/airflow/providers/google/cloud/triggers/dataflow.py#L144).
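Here is a minimal sketch of the suggested change, modeled outside Airflow. It assumes the 503 surfaces as an exception carrying the HTTP status in a `code` attribute. The Google API client libraries raise `google.api_core.exceptions.ServiceUnavailable` for 503s, but whether that exact type reaches this trigger is an assumption. `ServiceUnavailableError`, `is_transient`, `poll_job`, `first_event` and `max_transient_errors` are hypothetical names. The cap on consecutive 503s is an addition beyond the original suggestion, so a prolonged outage still ends in an error instead of looping forever.

```python
import asyncio
from typing import AsyncIterator, Callable

HTTP_SERVICE_UNAVAILABLE = 503


class ServiceUnavailableError(Exception):
    """Hypothetical stand-in for a client exception carrying an HTTP status code."""

    code = HTTP_SERVICE_UNAVAILABLE


def is_transient(exc: Exception) -> bool:
    """Treat an exception as retryable only if it reports HTTP 503."""
    return getattr(exc, "code", None) == HTTP_SERVICE_UNAVAILABLE


async def poll_job(
    get_status: Callable[[], str],
    poll_sleep: float = 0.0,
    max_transient_errors: int = 5,
) -> AsyncIterator[dict]:
    """Poll until the job finishes, riding out transient 503 responses."""
    transient_errors = 0
    while True:
        try:
            status = get_status()
        except Exception as exc:
            if is_transient(exc) and transient_errors < max_transient_errors:
                transient_errors += 1
                # Same path as "job is still running": sleep, then poll again.
                await asyncio.sleep(poll_sleep)
                continue
            # Non-transient error, or too many 503s in a row: give up.
            yield {"status": "error", "message": str(exc)}
            return
        transient_errors = 0  # a successful poll resets the counter
        if status == "JOB_STATE_DONE":
            yield {"status": "success", "message": "Job completed"}
            return
        if status == "JOB_STATE_FAILED":
            yield {"status": "error", "message": "Job failed"}
            return
        await asyncio.sleep(poll_sleep)  # job still running


async def first_event(get_status: Callable[[], str], **kwargs) -> dict:
    """Return the first event emitted by poll_job."""
    async for event in poll_job(get_status, **kwargs):
        return event
    raise RuntimeError("trigger produced no event")
```

Checking the status code instead of swallowing every exception means fatal errors, such as an authentication failure or a missing job, still fail fast. In the real trigger, the equivalent change would presumably wrap the status call inside its existing polling loop.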

