Ahmet Altay created BEAM-4662:
---------------------------------

             Summary: WriteToBigQuery fails when it tries to create an already 
existing table
                 Key: BEAM-4662
                 URL: https://issues.apache.org/jira/browse/BEAM-4662
             Project: Beam
          Issue Type: Improvement
          Components: sdk-py-core
            Reporter: Ahmet Altay
            Assignee: Chamikara Jayalath


This call:

[https://github.com/apache/beam/blob/375bd3a6a53ba3ba7c965278dcb322875e1b4dca/sdks/python/apache_beam/io/gcp/bigquery.py#L1259]

to get_or_create_table checks whether the table exists and tries to create it 
if it does not. However, after the check, the table might be created by 
another worker doing the exact same thing. The follow-up create should not fail 
in these cases.
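
A minimal sketch of the desired behaviour, assuming a simplified client. 
FakeTableClient and ConflictError are hypothetical stand-ins for the real 
BigQuery client and the HTTP 409 error shown in the log, and this 
get_or_create_table is an illustration, not the actual code in bigquery.py. 
The idea is to treat "already exists" on create as success and re-read the 
table:

```python
class ConflictError(Exception):
    """Hypothetical stand-in for an HTTP 409 'Already Exists' error."""
    status_code = 409


class FakeTableClient(object):
    """Hypothetical in-memory client, used only to illustrate the race."""

    def __init__(self):
        self.tables = {}

    def get(self, table_id):
        # Returns the table dict, or None if the table does not exist.
        return self.tables.get(table_id)

    def insert(self, table_id, schema):
        # Mirrors the service behaviour: creating an existing table conflicts.
        if table_id in self.tables:
            raise ConflictError('Already Exists: Table %s' % table_id)
        self.tables[table_id] = {'id': table_id, 'schema': schema}
        return self.tables[table_id]


def get_or_create_table(client, table_id, schema):
    """Return the table, creating it if needed and tolerating a lost race."""
    found = client.get(table_id)
    if found is not None:
        return found
    try:
        return client.insert(table_id, schema)
    except ConflictError:
        # Another worker created the table between our check and our insert.
        # That is the outcome we wanted, so fetch the table instead of failing.
        return client.get(table_id)
```

With this shape, a worker that loses the race gets the table the other worker 
created instead of failing the bundle.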

This results in the following error log, although the bundle is retried and 
the error does not cause a permanent failure.

 

Caused by: java.lang.RuntimeException: Error received from SDK harness for instruction -374: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 127, in _execute
    response = task()
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 162, in <lambda>
    self._execute(lambda: worker.do_instruction(work), work)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 208, in do_instruction
    request.instruction_id)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 230, in process_bundle
    processor.process_bundle(instruction_id)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 289, in process_bundle
    op.start()
  File "apache_beam/runners/worker/operations.py", line 351, in apache_beam.runners.worker.operations.DoOperation.start
    def start(self):
  File "apache_beam/runners/worker/operations.py", line 352, in apache_beam.runners.worker.operations.DoOperation.start
    with self.scoped_start_state:
  File "apache_beam/runners/worker/operations.py", line 396, in apache_beam.runners.worker.operations.DoOperation.start
    self.dofn_runner.start()
  File "apache_beam/runners/common.py", line 595, in apache_beam.runners.common.DoFnRunner.start
    self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle)
  File "apache_beam/runners/common.py", line 589, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
    self._reraise_augmented(exn)
  File "apache_beam/runners/common.py", line 602, in apache_beam.runners.common.DoFnRunner._reraise_augmented
    raise
  File "apache_beam/runners/common.py", line 587, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
    bundle_method()
  File "apache_beam/runners/common.py", line 293, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
    def invoke_start_bundle(self):
  File "apache_beam/runners/common.py", line 297, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
    self.signature.start_bundle_method.method_value())
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 1261, in start_bundle
    self.create_disposition, self.write_disposition)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 184, in wrapper
    return fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 1040, in get_or_create_table
    schema=schema or found_table.schema)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 851, in _create_table
    response = self.client.tables.Insert(request)
  File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 621, in Insert
    config, request, global_params=global_params)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 722, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 728, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 599, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
HttpConflictError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/google.com%3Adeft-testing-integration/datasets/mobilegame_dataset_062708255728400/tables?alt=json>:
 response: <{'status': '409', 'content-length': '366', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'expires': 'Wed, 27 Jun 2018 15:32:35 GMT', 'vary': 'Origin, X-Origin', 'server': 'GSE', '-content-encoding': 'gzip', 'cache-control': 'private, max-age=0', 'date': 'Wed, 27 Jun 2018 15:32:35 GMT', 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; charset=UTF-8'}>,
 content <{ "error": { "errors": [ { "domain": "global", "reason": "duplicate", "message": "Already Exists: Table google.com:deft-testing-integration:mobilegame_dataset_062708255728400.game_stats_teams" } ], "code": 409, "message": "Already Exists: Table google.com:deft-testing-integration:mobilegame_dataset_062708255728400.game_stats_teams"

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
