hussein-awala commented on issue #31147:
URL: https://github.com/apache/airflow/issues/31147#issuecomment-1554239929

   From the task log, it looks like there could be a default timeout of 2 hours in the BigQuery client, which is not mentioned in the client docs.
   Since your job takes less than 3 hours, can you try setting the `result_timeout` parameter to 3 hours?
   ```python
           insert_data = BigQueryInsertParametrizedJobOperator(
               task_id=CommonTaskIds.INSERT_DATA_AND_UPDATE_VALID_DATES,
               task_group=self.task_group,
               gcp_conn_id=self.gcp_conn_id,
               force_rerun=False,
               reattach_states={"PENDING", "RUNNING", "DONE"},
               job_id=kwargs.pop("job_id"),
               configuration=config,
               pool=self.pool,
               trigger_rule=TriggerRule.NONE_FAILED_MIN_ONE_SUCCESS,
               result_timeout=3 * 60 * 60,  # 3 hours in seconds
               **kwargs,
           )
   ``` 
   An alternative solution is to provide a custom retry strategy through the `result_retry` parameter; the default strategy does not retry the query on a timeout exception.
   Default retry source code from the GCP client:
   ```python
   from google.api_core import retry


   def _should_retry(exc):
       """Predicate for determining when to retry.
   
       We retry if and only if the 'reason' is 'backendError'
       or 'rateLimitExceeded'.
       """
       if not hasattr(exc, 'errors'):
           return False
       if len(exc.errors) == 0:
           return False
       reason = exc.errors[0]['reason']
       return reason == 'backendError' or reason == 'rateLimitExceeded'
   
   
   DEFAULT_RETRY = retry.Retry(predicate=_should_retry)
   ```  
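
   If you go with the `result_retry` approach, here is a minimal sketch (hypothetical names and values, assuming the operator forwards `result_retry` to the client's `result()` call) of a custom retry that also retries on timeout errors:
   ```python
   from google.api_core import exceptions, retry


   def _should_retry_with_timeout(exc):
       """Retry on deadline/timeout errors in addition to the default reasons."""
       if isinstance(exc, (exceptions.DeadlineExceeded, TimeoutError)):
           return True
       if not hasattr(exc, "errors") or not exc.errors:
           return False
       reason = exc.errors[0].get("reason")
       return reason in ("backendError", "rateLimitExceeded")


   # Allow retries for up to 3 hours overall (value chosen only for illustration).
   CUSTOM_RETRY = retry.Retry(predicate=_should_retry_with_timeout, deadline=3 * 60 * 60)

   # Then pass it to the operator:
   #     result_retry=CUSTOM_RETRY,
   ```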

