[ 
https://issues.apache.org/jira/browse/BEAM-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525438#comment-16525438
 ] 

Ahmet Altay commented on BEAM-4662:
-----------------------------------

cc: [~mariagh]

> WriteToBigQuery fails when it tries to create an already existing table
> -----------------------------------------------------------------------
>
>                 Key: BEAM-4662
>                 URL: https://issues.apache.org/jira/browse/BEAM-4662
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Ahmet Altay
>            Assignee: Chamikara Jayalath
>            Priority: Major
>
> This call:
> [https://github.com/apache/beam/blob/375bd3a6a53ba3ba7c965278dcb322875e1b4dca/sdks/python/apache_beam/io/gcp/bigquery.py#L1259]
> to get_or_create_table will check if the the table exists and will try to 
> create if it does not. However after the check the table might be created by 
> another worker doing the exact same thing. The follow up create should not 
> fail in these cases.
> Results in the following error log although it retries the bundle and does 
> not cause a permanent fail.
>  
> Caused by: java.lang.RuntimeException: Error received from SDK harness for 
> instruction -374: Traceback (most recent call last): File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 127, in _execute response = task() File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 162, in <lambda> self._execute(lambda: worker.do_instruction(work), 
> work) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 208, in do_instruction request.instruction_id) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/sdk_worker.py",
>  line 230, in process_bundle processor.process_bundle(instruction_id) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/runners/worker/bundle_processor.py",
>  line 289, in process_bundle op.start() File 
> "apache_beam/runners/worker/operations.py", line 351, in 
> apache_beam.runners.worker.operations.DoOperation.start def start(self): File 
> "apache_beam/runners/worker/operations.py", line 352, in 
> apache_beam.runners.worker.operations.DoOperation.start with 
> self.scoped_start_state: File "apache_beam/runners/worker/operations.py", 
> line 396, in apache_beam.runners.worker.operations.DoOperation.start 
> self.dofn_runner.start() File "apache_beam/runners/common.py", line 595, in 
> apache_beam.runners.common.DoFnRunner.start 
> self._invoke_bundle_method(self.do_fn_invoker.invoke_start_bundle) File 
> "apache_beam/runners/common.py", line 589, in 
> apache_beam.runners.common.DoFnRunner._invoke_bundle_method 
> self._reraise_augmented(exn) File "apache_beam/runners/common.py", line 602, 
> in apache_beam.runners.common.DoFnRunner._reraise_augmented raise File 
> "apache_beam/runners/common.py", line 587, in 
> apache_beam.runners.common.DoFnRunner._invoke_bundle_method bundle_method() 
> File "apache_beam/runners/common.py", line 293, in 
> apache_beam.runners.common.DoFnInvoker.invoke_start_bundle def 
> invoke_start_bundle(self): File "apache_beam/runners/common.py", line 297, in 
> apache_beam.runners.common.DoFnInvoker.invoke_start_bundle 
> self.signature.start_bundle_method.method_value()) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 
> 1261, in start_bundle self.create_disposition, self.write_disposition) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/utils/retry.py", line 
> 184, in wrapper return fun(*args, **kwargs) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 
> 1040, in get_or_create_table schema=schema or found_table.schema) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/bigquery.py", line 
> 851, in _create_table response = self.client.tables.Insert(request) File 
> "/usr/local/lib/python2.7/dist-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py",
>  line 621, in Insert config, request, global_params=global_params) File 
> "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 
> 722, in _RunMethod return self.ProcessHttpResponse(method_config, 
> http_response, request) File 
> "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 
> 728, in ProcessHttpResponse self.__ProcessHttpResponse(method_config, 
> http_response, request)) File 
> "/usr/local/lib/python2.7/dist-packages/apitools/base/py/base_api.py", line 
> 599, in __ProcessHttpResponse http_response, method_config=method_config, 
> request=request) HttpConflictError: HttpError accessing 
> <https://www.googleapis.com/bigquery/v2/projects/google.com%3Adeft-testing-integration/datasets/mobilegame_dataset_062708255728400/tables?alt=json>:
>  response: <\{'status': '409', 'content-length': '366', 'x-xss-protection': 
> '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 
> 'chunked', 'expires': 'Wed, 27 Jun 2018 15:32:35 GMT', 'vary': 'Origin, 
> X-Origin', 'server': 'GSE', '-content-encoding': 'gzip', 'cache-control': 
> 'private, max-age=0', 'date': 'Wed, 27 Jun 2018 15:32:35 GMT', 
> 'x-frame-options': 'SAMEORIGIN', 'content-type': 'application/json; 
> charset=UTF-8'}>, content <\{ "error": { "errors": [ { "domain": "global", 
> "reason": "duplicate", "message": "Already Exists: Table 
> google.com:deft-testing-integration:mobilegame_dataset_062708255728400.game_stats_teams"
>  } ], "code": 409, "message": "Already Exists: Table 
> google.com:deft-testing-integration:mobilegame_dataset_062708255728400.game_stats_teams"
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to