ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1097324853
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -405,6 +405,8 @@ def chain_after(result):
from apache_beam.utils.annotations import deprecated
from apache_beam.utils.annotations import experimental
+from google.api_core.exceptions import ClientError, GoogleAPICallError
Review Comment:
Current tests are failing with `ModuleNotFoundError: No module named
'google.api_core'`. This import should be in the try block on line 410.
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,23 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404:
Review Comment:
Also, maybe add a comment here describing that sometimes a table can get
deleted in the middle of a streaming job.
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,23 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404:
Review Comment:
Can you also add a helpful log message here explaining that the previously
seen destination X no longer exists, so it will be removed from the local
cache and the bundle will retry?
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,23 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404:
Review Comment:
```suggestion
if e.code == 404 and destination in _KNOWN_TABLES:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]