Paul Woods created AIRFLOW-2769:
-----------------------------------
Summary: Increase num_retries polling value on Dataflow hook
Key: AIRFLOW-2769
URL: https://issues.apache.org/jira/browse/AIRFLOW-2769
Project: Apache Airflow
Issue Type: Bug
Components: contrib, Dataflow
Affects Versions: 1.10
Reporter: Paul Woods
*Problem Description*
When Airflow launches a job in Dataflow, it polls the GCP API for the job status
until the job completes or fails. The GCP API occasionally returns 500 and 429
errors on these requests. These errors cause the Airflow task to fail
intermittently, particularly for long-running jobs, even though the Dataflow job
itself does not terminate.
The recommended action is to retry the request with exponential backoff
([https://developers.google.com/drive/api/v3/handle-errors]). The Google API
client provides this through the {{num_retries}} parameter of {{execute()}}, but
that parameter is not used in {{airflow.contrib.hooks.gcp_dataflow_hook}}.
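The following is a minimal, self-contained sketch of the retry behaviour described above, for illustration only. {{HttpError}}, {{is_retryable}} and {{execute_with_backoff}} are hypothetical names and are not part of Airflow or the Google API client. The sketch treats 429 and any 5xx status as transient. Before retry n it sleeps a random fraction of {{2 ** n}} seconds. This is assumed to be close in spirit to what the client's {{num_retries}} option does, but the library's exact retry rules may differ.

{code:python}
import random
import time


class HttpError(Exception):
    """Hypothetical stand-in for an API error carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def is_retryable(status):
    # Treat rate limiting (429) and server-side errors (5xx) as transient.
    return status == 429 or 500 <= status < 600


def execute_with_backoff(request, num_retries=5, sleep=time.sleep, rand=random.random):
    """Call request() and retry transient errors with exponential backoff.

    Before retry n (1-based) this sleeps rand() * 2**n seconds, so the
    upper bound on the wait doubles after every failed attempt.
    """
    for attempt in range(num_retries + 1):
        try:
            return request()
        except HttpError as err:
            # Re-raise non-transient errors, or transient ones once retries run out.
            if not is_retryable(err.status) or attempt == num_retries:
                raise
            sleep(rand() * 2 ** (attempt + 1))


# Usage: two transient failures, then a successful status poll.
responses = iter([HttpError(500), HttpError(429), {"currentState": "JOB_STATE_RUNNING"}])


def flaky_get_job():
    result = next(responses)
    if isinstance(result, Exception):
        raise result
    return result


waits = []
job = execute_with_backoff(flaky_get_job, num_retries=3, sleep=waits.append, rand=lambda: 1.0)
print(job["currentState"], waits)  # JOB_STATE_RUNNING [2.0, 4.0]
{code}

Injecting {{sleep}} and {{rand}} keeps the sketch testable. With {{rand}} fixed at 1.0, two transient failures produce waits of 2.0 and 4.0 seconds before the third call succeeds.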
*Proposed Solution*
Add {{num_retries}} to the {{execute()}} calls in
{{_DataflowJob._get_job_id_from_name}} and {{_DataflowJob._get_job}}.
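The following is a hedged sketch of the proposed change. It assumes a discovery-style client in which calls are chained as {{projects().locations().jobs().get(...)}} and sent with {{execute()}}. {{FakeDataflow}} and {{DataflowJobSketch}} are hypothetical stand-ins and are not Airflow's {{_DataflowJob}}. The default of 5 retries is an assumed value, and the job-name matching is simplified. The sketch only shows that both lookups forward {{num_retries}} to {{execute()}}.

{code:python}
class _FakeRequest:
    """Hypothetical stand-in for a client request object."""

    def __init__(self, payload, calls):
        self._payload = payload
        self._calls = calls

    def execute(self, num_retries=0):
        # Record the retry budget each call site passes, then return the payload.
        self._calls.append(num_retries)
        return self._payload


class _FakeJobs:
    """Serves list/get requests from an in-memory list of job dicts."""

    def __init__(self, jobs, calls):
        self._jobs = jobs
        self._calls = calls

    def list(self, projectId, location):
        return _FakeRequest({"jobs": self._jobs}, self._calls)

    def get(self, projectId, location, jobId):
        job = next(j for j in self._jobs if j["id"] == jobId)
        return _FakeRequest(job, self._calls)


class FakeDataflow:
    """Mimics the service.projects().locations().jobs() call chain."""

    def __init__(self, jobs):
        self.calls = []  # num_retries values seen by execute()
        self._jobs = _FakeJobs(jobs, self.calls)

    def projects(self):
        return self

    def locations(self):
        return self

    def jobs(self):
        return self._jobs


class DataflowJobSketch:
    """Hypothetical, simplified job lookup that always passes num_retries."""

    def __init__(self, dataflow, project, location,
                 job_name=None, job_id=None, num_retries=5):
        self._dataflow = dataflow
        self._project = project
        self._location = location
        self._job_name = job_name
        self._job_id = job_id
        self._num_retries = num_retries

    def _get_job_id_from_name(self):
        jobs = self._dataflow.projects().locations().jobs().list(
            projectId=self._project, location=self._location
        ).execute(num_retries=self._num_retries)
        # Simplified: case-insensitive exact match on the job name.
        for job in jobs.get("jobs", []):
            if job["name"].lower() == self._job_name.lower():
                self._job_id = job["id"]
                return job
        return None

    def _get_job(self):
        if self._job_name:
            return self._get_job_id_from_name()
        return self._dataflow.projects().locations().jobs().get(
            projectId=self._project, location=self._location, jobId=self._job_id
        ).execute(num_retries=self._num_retries)


# Usage: look a job up by name and check which retry budget was sent.
service = FakeDataflow([{"id": "job-123", "name": "wordcount", "currentState": "JOB_STATE_RUNNING"}])
job = DataflowJobSketch(service, "my-project", "us-central1", job_name="wordcount")
print(job._get_job()["currentState"], service.calls)  # JOB_STATE_RUNNING [5]
{code}

The fake client records the {{num_retries}} value that each call site passes. This makes it easy to check that neither lookup silently falls back to the client's default of {{num_retries=0}}, which means no retries.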
*NOTE:* the same problem was addressed for Dataproc in AIRFLOW-1718
([https://issues.apache.org/jira/browse/AIRFLOW-1718]).
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)