[ https://issues.apache.org/jira/browse/BEAM-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525438#comment-16525438 ]
Ahmet Altay commented on BEAM-4662:
-----------------------------------
cc: [~mariagh]
> WriteToBigQuery fails when it tries to create an already existing table
> -----------------------------------------------------------------------
>
> Key: BEAM-4662
> URL: https://issues.apache.org/jira/browse/BEAM-4662
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Ahmet Altay
> Assignee: Chamikara Jayalath
> Priority: Major
>
> This call:
> [https://github.com/apache/beam/blob/375bd3a6a53ba3ba7c965278dcb322875e1b4dca/sdks/python/apache_beam/io/gcp/bigquery.py#L1259]
> to get_or_create_table checks whether the table exists and tries to create
> it if it does not. However, after the check, another worker doing the exact
> same thing might create the table. The follow-up create should not fail in
> these cases.
> This results in the following error log, although the bundle is retried and
> the failure is not permanent.
>
> Caused by: java.lang.RuntimeException: Error received from SDK harness for instruction -374: Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 127, in _execute
>     response = task()
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 162, in <lambda>
>     self._execute(lambda: worker.do_instruction(work), work)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 208, in do_instruction
>     request.instruction_id)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py", line 230, in process_bundle
>     processor.process_bundle(instruction_id)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py", line 289, in process_bundle
>     op.start()
>   File "apache_beam/runners/worker/operations.py", line 351, in apache_beam.runners.worker.operations.DoOperation.start
>     def start(self):
>   File "apache_beam/runners/worker/operations.py", line 352, in apache_beam.runners.worker.operations.DoOperation.start
>     with self.scoped_start_state:
>   File "apache_beam/runners/worker/operations.py", line 396, in apache_beam.runners.worker.operations.DoOperation.start
>     self.dofn_runner.start()
>   File "apache_beam/runners/common.py", line 595, in apache_beam.runners.common.DoFnRunner.start
>     self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle)
>   File "apache_beam/runners/common.py", line 589, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
>     self._reraise_augmented(exn)
>   File "apache_beam/runners/common.py", line 602, in apache_beam.runners.common.DoFnRunner._reraise_augmented
>     raise
>   File "apache_beam/runners/common.py", line 587, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
>     bundle_method()
>   File "apache_beam/runners/common.py", line 293, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
>     def invoke_start_bundle(self):
>   File "apache_beam/runners/common.py", line 297, in apache_beam.runners.common.DoFnInvoker.invoke_start_bundle
>     self.signature.start_bundle_method.method_value())
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 1261, in start_bundle
>     self.create_disposition, self.write_disposition)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 184, in wrapper
>     return fun(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 1040, in get_or_create_table
>     schema=schema or found_table.schema)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 851, in _create_table
>     response = self.client.tables.Insert(request)
>   File "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 621, in Insert
>     config, request, global_params=global_params)
>   File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 722, in _RunMethod
>     return self.ProcessHttpResponse(method_config, http_response, request)
>   File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 728, in ProcessHttpResponse
>     self.__ProcessHttpResponse(method_config, http_response, request))
>   File "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 599, in __ProcessHttpResponse
>     http_response, method_config=method_config, request=request)
> HttpConflictError: HttpError accessing
> <https://www.googleapis.com/bigquery/v2/projects/google.com%3Adeft-testing-integration/datasets/mobilegame_dataset_062708255728400/tables?alt=json>:
> response: <{'status': '409', 'content-length': '366', 'x-xss-protection':
> '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding':
> 'chunked', 'expires': 'Wed, 27 Jun 2018 15:32:35 GMT', 'vary': 'Origin,
> X-Origin', 'server': 'GSE', '-content-encoding': 'gzip', 'cache-control':
> 'private, max-age=0', 'date': 'Wed, 27 Jun 2018 15:32:35 GMT',
> 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json;
> charset=UTF-8'}>, content <{ "error": { "errors": [ { "domain": "global",
> "reason": "duplicate", "message": "Already Exists: Table
> google.com:deft-testing-integration:mobilegame_dataset_062708255728400.game_stats_teams"
> } ], "code": 409, "message": "Already Exists: Table
> google.com:deft-testing-integration:mobilegame_dataset_062708255728400.game_stats_teams"
>
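
The race described in the issue (check, then create, with another worker
creating the table in between) could be tolerated by treating the "Already
Exists" conflict as success and re-reading the table. What follows is only a
minimal sketch of that idea, not the actual Beam fix: NotFound, Conflict and
FakeTableService are hypothetical in-memory stand-ins, not Beam, apitools or
BigQuery APIs, and the before_insert hook exists only to simulate the second
worker.

```python
class NotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 from the table service."""


class Conflict(Exception):
    """Hypothetical stand-in for an HTTP 409 ("Already Exists")."""


class FakeTableService(object):
    """In-memory stand-in for a table API (assumption, not the BigQuery client)."""

    def __init__(self):
        self.tables = {}
        # Optional one-shot callback run just before an insert, used here
        # to simulate another worker winning the race.
        self.before_insert = None

    def get(self, table_id):
        try:
            return self.tables[table_id]
        except KeyError:
            raise NotFound(table_id)

    def insert(self, table_id, schema):
        if self.before_insert is not None:
            hook, self.before_insert = self.before_insert, None
            hook()
        if table_id in self.tables:
            raise Conflict(table_id)
        self.tables[table_id] = {"id": table_id, "schema": schema}
        return self.tables[table_id]


def get_or_create_table(service, table_id, schema):
    """Return the table, creating it if needed; tolerate a concurrent create."""
    try:
        return service.get(table_id)
    except NotFound:
        pass
    try:
        return service.insert(table_id, schema)
    except Conflict:
        # Another worker created the table between our check and our insert.
        # The table now exists, which is what the caller wanted, so read it back.
        return service.get(table_id)


# Simulate the race: the table appears after the existence check
# but before this worker's insert.
service = FakeTableService()
service.before_insert = lambda: service.tables.update(
    {"game_stats_teams": {"id": "game_stats_teams", "schema": ["team", "score"]}})
table = get_or_create_table(service, "game_stats_teams", ["team", "score"])
print(table["id"])
```

A real implementation would presumably also need to decide what to do when
the concurrently created table has a different schema; the sketch simply
returns whatever exists.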
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)