Ahmet Altay created BEAM-4662:
---------------------------------
Summary: WriteToBigQuery fails when it tries to create an already
existing table
Key: BEAM-4662
URL: https://issues.apache.org/jira/browse/BEAM-4662
Project: Beam
Issue Type: Improvement
Components: sdk-py-core
Reporter: Ahmet Altay
Assignee: Chamikara Jayalath
This call:
[https://github.com/apache/beam/blob/375bd3a6a53ba3ba7c965278dcb322875e1b4dca/sdks/python/apache_beam/io/gcp/bigquery.py#L1259]
to get_or_create_table will check if the the table exists and will try to
create if it does not. However after the check the table might be created by
another worker doing the exact same thing. The follow up create should not fail
in these cases.
Results in the following error log although it retries the bundle and does not
cause a permanent fail.
Caused by: java.lang.RuntimeException: Error received from SDK harness for
instruction -374: Traceback (most recent call last): File
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
line 127, in _execute response = task() File
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
line 162, in <lambda> self._execute(lambda: worker.do_instruction(work), work)
File
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
line 208, in do_instruction request.instruction_id) File
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
line 230, in process_bundle processor.process_bundle(instruction_id) File
"/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
line 289, in process_bundle op.start() File
"apache_beam/runners/worker/operations.py", line 351, in
apache_beam.runners.worker.operations.DoOperation.start def start(self): File
"apache_beam/runners/worker/operations.py", line 352, in
apache_beam.runners.worker.operations.DoOperation.start with
self.scoped_start_state: File "apache_beam/runners/worker/operations.py", line
396, in apache_beam.runners.worker.operations.DoOperation.start
self.dofn_runner.start() File "apache_beam/runners/common.py", line 595, in
apache_beam.runners.common.DoFnRunner.start
self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle) File
"apache_beam/runners/common.py", line 589, in
apache_beam.runners.common.DoFnRunner._invoke_bundle_method
self._reraise_augmented(exn) File "apache_beam/runners/common.py", line 602, in
apache_beam.runners.common.DoFnRunner._reraise_augmented raise File
"apache_beam/runners/common.py", line 587, in
apache_beam.runners.common.DoFnRunner._invoke_bundle_method bundle_method()
File "apache_beam/runners/common.py", line 293, in
apache_beam.runners.common.DoFnInvoker.invoke_start_bundle def
invoke_start_bundle(self): File "apache_beam/runners/common.py", line 297, in
apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
self.signature.start_bundle_method.method_value()) File
"/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line
1261, in start_bundle self.create_disposition, self.write_disposition) File
"/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 184,
in wrapper return fun(*args, **kwargs) File
"/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line
1040, in get_or_create_table schema=schema or found_table.schema) File
"/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line
851, in _create_table response = self.client.tables.Insert(request) File
"/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py",
line 621, in Insert config, request, global_params=global_params) File
"/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line
722, in _RunMethod return self.ProcessHttpResponse(method_config,
http_response, request) File
"/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line
728, in ProcessHttpResponse self.__ProcessHttpResponse(method_config,
http_response, request)) File
"/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line
599, in __ProcessHttpResponse http_response, method_config=method_config,
request=request) HttpConflictError: HttpError accessing
<https://www.googleapis.com/bigquery/v2/projects/google.com%3Adeft-testing-integration/datasets/mobilegame_dataset_062708255728400/tables?alt=json>:
response: <\{'status': '409', 'content-length': '366', 'x-xss-protection': '1;
mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding':
'chunked', 'expires': 'Wed, 27 Jun 2018 15:32:35 GMT', 'vary': 'Origin,
X-Origin', 'server': 'GSE', '-content-encoding': 'gzip', 'cache-control':
'private, max-age=0', 'date': 'Wed, 27 Jun 2018 15:32:35 GMT',
'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json;
charset=UTF-8'}>, content <\{ "error": { "errors": [ { "domain": "global",
"reason": "duplicate", "message": "Already Exists: Table
google.com:deft-testing-integration:mobilegame_dataset_062708255728400.game_stats_teams"
} ], "code": 409, "message": "Already Exists: Table
google.com:deft-testing-integration:mobilegame_dataset_062708255728400.game_stats_teams"
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)