chamikaramj commented on a change in pull request #14690:
URL: https://github.com/apache/beam/pull/14690#discussion_r640771695



##########
File path: sdks/python/apache_beam/io/gcp/bigquery.py
##########
@@ -1312,10 +1312,11 @@ def _flush_batch(self, destination):
           skip_invalid_rows=True)
       self.batch_latency_metric.update((time.time() - start) * 1000)
 
-      failed_rows = [rows[entry.index] for entry in errors]
+      failed_rows = [rows[entry['index']] for entry in errors]

Review comment:
       Is this due to the API change?
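For context: in the `google-cloud-bigquery` client, `insert_rows_json` returns a sequence of mappings, one per row with insert errors. The `'index'` key identifies the row and the `'errors'` key lists its problems, so attribute access such as `entry.index` no longer applies. Below is a minimal sketch that assumes that return shape; `collect_failed_rows` and the sample data are hypothetical.

```python
from typing import Any, Dict, List, Sequence


def collect_failed_rows(
    rows: Sequence[Dict[str, Any]],
    errors: Sequence[Dict[str, Any]],
) -> List[Dict[str, Any]]:
  """Return the input rows that the insert reported as failed.

  Assumes each entry in `errors` is a mapping whose 'index' key points
  back into `rows`, as in an insert_rows_json result.
  """
  return [rows[entry['index']] for entry in errors]


# Hypothetical sample data shaped like an insert_rows_json result.
rows = [{'month': 1}, {'month': 'bad'}, {'month': 3}]
errors = [
    {'index': 1, 'errors': [{'reason': 'invalid', 'message': 'bad value'}]},
]
print(collect_failed_rows(rows, errors))  # [{'month': 'bad'}]
```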

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
##########
@@ -950,12 +943,10 @@ def store_callback(arg):
         with open(file_name_2, 'w') as f:
           json.dump(json_output, f)
 
-      res = mock.Mock()
-      res.insertErrors = []

Review comment:
       Why the change to the return value?
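Some background on the question. With `tabledata.InsertAll`, the test built a response object whose `insertErrors` attribute was empty. `insert_rows_json` instead returns an empty list when every row is inserted, so a mock can return `[]` directly. The sketch below uses `unittest.mock`. `write_batch` is a hypothetical stand-in for the code under test, and the table string is a placeholder.

```python
from unittest import mock


def write_batch(client, table_ref, rows):
  """Hypothetical caller: True when the insert reported no row errors."""
  errors = client.insert_rows_json(table_ref, json_rows=rows)
  return not errors


client = mock.Mock()
# An empty error list models a fully successful insert_rows_json call.
client.insert_rows_json.return_value = []
assert write_batch(client, 'project.dataset.table', [{'month': 1}])

# A non-empty list models a per-row failure.
client.insert_rows_json.return_value = [
    {'index': 0, 'errors': [{'reason': 'invalid'}]},
]
assert not write_batch(client, 'project.dataset.table', [{'month': 1}])
```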

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_test.py
##########
@@ -809,15 +808,14 @@ def test_dofn_client_process_performs_batching(self):
     fn.process(('project_id:dataset_id.table_id', {'month': 1}))
 
     # InsertRows not called as batch size is not hit yet
-    self.assertFalse(client.tabledata.InsertAll.called)
+    self.assertFalse(client.insert_rows_json.called)

Review comment:
       Do you know if the API supports other formats (for example, Avro) that are more efficient?
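The assertion in this hunk relies on `Mock.called`, which stays `False` until the mocked method is invoked. The sketch below shows the same pattern. `Batcher` is a hypothetical buffer class, not Beam's actual write DoFn, and it flushes through `insert_rows_json` only after `batch_size` rows have accumulated.

```python
from unittest import mock


class Batcher:
  """Hypothetical buffer that flushes rows once batch_size is reached."""

  def __init__(self, client, table_ref, batch_size):
    self.client = client
    self.table_ref = table_ref
    self.batch_size = batch_size
    self.buffer = []

  def process(self, row):
    self.buffer.append(row)
    if len(self.buffer) >= self.batch_size:
      self.flush()

  def flush(self):
    if self.buffer:
      self.client.insert_rows_json(self.table_ref, json_rows=self.buffer)
      # Rebind rather than clear, so the list recorded by the mock is kept.
      self.buffer = []


client = mock.Mock()
client.insert_rows_json.return_value = []
batcher = Batcher(client, 'project.dataset.table', batch_size=2)

batcher.process({'month': 1})
assert not client.insert_rows_json.called  # batch size not hit yet

batcher.process({'month': 2})
assert client.insert_rows_json.called  # second row triggers the flush
```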

##########
File path: sdks/python/apache_beam/io/gcp/bigquery_tools.py
##########
@@ -632,22 +639,21 @@ def _insert_all_rows(
         base_labels=labels)
 
     started_millis = int(time.time() * 1000)
-    response = None
     try:
-      response = self.client.tabledata.InsertAll(request)
-      if not response.insertErrors:
+      table_ref = gcp_bigquery.DatasetReference(project_id,
+                                                dataset_id).table(table_id)
+      errors = self.gcp_bq_client.insert_rows_json(
+          table_ref, json_rows=rows, row_ids=insert_ids, skip_invalid_rows=True)
+      if not errors:
         service_call_metric.call('ok')
-      for insert_error in response.insertErrors:
-        for error in insert_error.errors:
-          service_call_metric.call(error.reason)
+      for insert_error in errors:
+        service_call_metric.call(insert_error['errors'][0])
     except HttpError as e:

Review comment:
       You probably have to rebase here.
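For comparison, the removed loop reported `error.reason` for every error of every failed row. The new line passes the first error entry itself to `service_call_metric.call`. If the intent is to keep reporting reasons as before, a dict-based equivalent might look like the sketch below. It assumes each entry in `'errors'` is a mapping with a `'reason'` key. `record_insert_errors`, the `'unknown'` fallback, and the list-backed `call` are hypothetical stand-ins for the metric.

```python
from typing import Any, Callable, Dict, Sequence


def record_insert_errors(
    errors: Sequence[Dict[str, Any]],
    call: Callable[[str], None],
) -> None:
  """Report 'ok' when there are no errors, else one reason per error.

  Assumes `errors` is shaped like an insert_rows_json result: one mapping
  per failed row, each with an 'errors' list of mappings carrying 'reason'.
  """
  if not errors:
    call('ok')
    return
  for insert_error in errors:
    for error in insert_error['errors']:
      # 'unknown' is a hypothetical fallback for entries without a reason.
      call(error.get('reason', 'unknown'))


recorded = []
record_insert_errors([], recorded.append)
record_insert_errors(
    [{'index': 0, 'errors': [{'reason': 'invalid'}, {'reason': 'stopped'}]}],
    recorded.append)
print(recorded)  # ['ok', 'invalid', 'stopped']
```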




-- 
This is an automated message from the Apache Git Service.