shahar1 commented on code in PR #44279:
URL: https://github.com/apache/airflow/pull/44279#discussion_r1863373235
##########
providers/src/airflow/providers/google/cloud/operators/bigquery.py:
##########
@@ -2592,8 +2592,15 @@ def _submit_job(
nowait=True,
)
- @staticmethod
- def _handle_job_error(job: BigQueryJob | UnknownJob) -> None:
+ def _handle_job_error(self, job: BigQueryJob | UnknownJob) -> None:
+ self.log.info("Job %s is completed. Checking the job status",
self.job_id)
+ # I've noticed that sometimes BigQuery jobs transiently report the
wrong status, causing the Airflow job to be incorrectly marked as successful.
+ # To avoid this, we refresh the job properties before checking the
final state and handling any errors.
+ while job.state != "DONE":
Review Comment:
Hey @pankajastro and thanks for putting more details!
Indeed, according to BigQuery docs - the lack of `error_result` could be
considered as an expected behavior:
```
The table does not include all possible HTTP errors or other networking
errors. Therefore, don't assume that an error object is present in every error
response from BigQuery.
```
Therefore, handling it with an additional `if job.state != "DONE"` could be
sufficient for now to ensure that the operator does not report errorneous
results.
Applying the `while` loop on the other hand, as currently implmented in
@kandharvishnu 's
[PR](https://github.com/apache/airflow/blame/8fdf29d39dd315f3d7144b4bba66f348221292e6/providers/src/airflow/providers/google/cloud/operators/bigquery.py#L2688),
exceeds the atomicity the operation (we already called `job.result()` that
should provides a final answer - why do we need to recall it?) - which makes us
in this case, Airflow maintainers, responsible for handling BQ transient
errors. I assume that we don't want to get there.
To sum it up, I'm ok with merging this solution as-is, with Elad's consent.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]