hussein-awala commented on issue #31147:
URL: https://github.com/apache/airflow/issues/31147#issuecomment-1554239929
Reading the task log, it looks like there could be a default timeout of 2 hours in the BigQuery client, which is not mentioned in the client doc.
Since your job duration is less than 3 hours, can you try setting the `result_timeout` parameter to 3 hours (10800 seconds)?
```python
insert_data = BigQueryInsertParametrizedJobOperator(
    task_id=CommonTaskIds.INSERT_DATA_AND_UPDATE_VALID_DATES,
    task_group=self.task_group,
    gcp_conn_id=self.gcp_conn_id,
    force_rerun=False,
    reattach_states={"PENDING", "RUNNING", "DONE"},
    job_id=kwargs.pop("job_id"),
    configuration=config,
    pool=self.pool,
    trigger_rule=TriggerRule.NONE_FAILED_MIN_ONE_SUCCESS,
    result_timeout=3 * 60 * 60,
    **kwargs,
)
```
An alternative solution is to provide a retry strategy through the `result_retry` parameter, since the default strategy doesn't retry the query on a timeout exception.
Source code of the default retry, from the GCP client library:
```python
from google.api_core import retry


def _should_retry(exc):
    """Predicate for determining when to retry.

    We retry if and only if the 'reason' is 'backendError'
    or 'rateLimitExceeded'.
    """
    if not hasattr(exc, "errors"):
        return False
    if len(exc.errors) == 0:
        return False
    reason = exc.errors[0]["reason"]
    return reason == "backendError" or reason == "rateLimitExceeded"


# Default retry used by the BigQuery client.
DEFAULT_RETRY = retry.Retry(predicate=_should_retry)
```
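If you want the result call to also retry on the timeout, you can pass a custom `Retry` object via `result_retry`. Below is a minimal sketch, assuming the failure surfaces as one of the listed transient exception classes; the names `_should_retry_with_timeouts` and `CUSTOM_RETRY` are illustrative, and you should adjust the exception list to whatever actually appears in your task log:
```python
from google.api_core import retry
from google.api_core.exceptions import DeadlineExceeded, ServiceUnavailable, TooManyRequests


def _should_retry_with_timeouts(exc):
    """Retry on transient/timeout errors in addition to the default reasons.

    The exception classes below are assumptions; match them to the error
    you actually see in the task log.
    """
    if isinstance(exc, (DeadlineExceeded, ServiceUnavailable, TooManyRequests, ConnectionError)):
        return True
    # Fall back to the default BigQuery predicate: retry only on
    # 'backendError' or 'rateLimitExceeded' reasons.
    errors = getattr(exc, "errors", None) or []
    if not errors:
        return False
    return errors[0].get("reason") in ("backendError", "rateLimitExceeded")


# Allow up to 3 hours of retrying before giving up.
CUSTOM_RETRY = retry.Retry(predicate=_should_retry_with_timeouts, deadline=3 * 60 * 60)
```
You would then pass `result_retry=CUSTOM_RETRY` to the operator, alongside (or instead of) `result_timeout`.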