Paul Woods created AIRFLOW-2769:
-----------------------------------

             Summary: Increase num_retries polling value on Dataflow hook
                 Key: AIRFLOW-2769
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2769
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib, Dataflow
    Affects Versions: 1.10
            Reporter: Paul Woods


*Problem Description*

When Airflow launches a job in Dataflow, it polls the GCP API for the job status 
until the job completes or fails. The GCP API occasionally returns 500 and 
429 errors on these requests, which causes the Airflow task to fail 
intermittently, particularly for long-running tasks, even though the Dataflow job 
itself does not terminate.

The recommended action is to retry the request with exponential backoff 
([https://developers.google.com/drive/api/v3/handle-errors]). The GCP API client 
provides this via the `num_retries` parameter on execute(), but that 
parameter is not used in
{code:java}
airflow.contrib.hooks.gcp_dataflow_hook{code}
*Proposed Solution*

Add num_retries to the execute() calls in 
{code:java}
_DataflowJob._get_job_id_from_name{code}
and
{code:java}
_DataflowJob._get_job{code}
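
As a rough illustration of the effect of passing num_retries, here is a minimal, self-contained sketch. It does not use the real hook or the Google API client: FakeRequest, TransientApiError, poll_job, the sleep argument, and the default of 5 retries are all hypothetical names and values chosen for this example, and the backoff loop is a simplified model of retry-with-exponential-backoff rather than the client library's actual implementation.

```python
import random
import time


class TransientApiError(Exception):
    """Hypothetical stand-in for an HTTP error returned by the API."""

    def __init__(self, status):
        super().__init__("HTTP %d" % status)
        self.status = status


class FakeRequest:
    """Hypothetical request that fails with the given statuses, then succeeds."""

    def __init__(self, failures, result):
        self._failures = list(failures)
        self._result = result
        self.attempts = 0

    def _send(self):
        self.attempts += 1
        if self._failures:
            raise TransientApiError(self._failures.pop(0))
        return self._result

    def execute(self, num_retries=0, sleep=time.sleep):
        # Simplified model: retry 429 and 5xx responses up to num_retries
        # times, sleeping a randomized, exponentially growing interval.
        for retry in range(num_retries + 1):
            try:
                return self._send()
            except TransientApiError as err:
                retryable = err.status == 429 or err.status >= 500
                if not retryable or retry == num_retries:
                    raise
                sleep(random.random() * 2 ** (retry + 1))


def poll_job(request, num_retries=5, sleep=time.sleep):
    # The proposed change amounts to passing num_retries to execute()
    # instead of calling it with no arguments. The value 5 is illustrative,
    # and the sleep argument exists only so this sketch can run without waiting.
    return request.execute(num_retries=num_retries, sleep=sleep)
```

With num_retries left at 0, the first 500 or 429 response propagates and fails the task; with a positive value, the same poll succeeds once the transient errors pass.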
 

*NOTE:* The same problem was addressed for Dataproc in 
([https://issues.apache.org/jira/browse/AIRFLOW-1718])



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
