shahar1 commented on code in PR #44279:
URL: https://github.com/apache/airflow/pull/44279#discussion_r1863373235


##########
providers/src/airflow/providers/google/cloud/operators/bigquery.py:
##########
@@ -2592,8 +2592,15 @@ def _submit_job(
             nowait=True,
         )
 
-    @staticmethod
-    def _handle_job_error(job: BigQueryJob | UnknownJob) -> None:
+    def _handle_job_error(self, job: BigQueryJob | UnknownJob) -> None:
+        self.log.info("Job %s is completed. Checking the job status", 
self.job_id)
+        # I've noticed that sometimes BigQuery jobs transiently report the 
wrong status, causing the Airflow job to be incorrectly marked as successful.
+        # To avoid this, we refresh the job properties before checking the 
final state and handling any errors.
+        while job.state != "DONE":

Review Comment:
   Hey @pankajastro and thanks for putting more details!
   
   Indeed, according to BigQuery docs - the lack of `error_result` could be 
considered as an expected behavior:
   ```
   The table does not include all possible HTTP errors or other networking 
errors. Therefore, don't assume that an error object is present in every error 
response from BigQuery.
   ```
   Therefore, handling it with an additional `if job.state != "DONE"` could be 
sufficient for now to ensure that the operator does not report errorneous 
results.
   
   Applying the `while` loop on the other hand, as currently implmented in 
@kandharvishnu 's 
[PR](https://github.com/apache/airflow/blame/8fdf29d39dd315f3d7144b4bba66f348221292e6/providers/src/airflow/providers/google/cloud/operators/bigquery.py#L2688),
 exceeds the atomicity the operation (we already called `job.result()` that 
should provides a final answer - why do we need to recall it?). That makes us 
in this case, Airflow maintainers, responsible for handling BQ transient errors 
- I assume that we don't want to get there.
   
   To sum it up, I'm ok with merging this solution as-is, with Elad's consent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to