[
https://issues.apache.org/jira/browse/BEAM-6611?focusedWorklogId=280112&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-280112
]
ASF GitHub Bot logged work on BEAM-6611:
----------------------------------------
Author: ASF GitHub Bot
Created on: 20/Jul/19 17:32
Start Date: 20/Jul/19 17:32
Worklog Time Spent: 10m
Work Description: ttanay commented on issue #8871: [BEAM-6611] BigQuery
file loads in Streaming for Python SDK
URL: https://github.com/apache/beam/pull/8871#issuecomment-513485596
I'm unable to run the IT tests for BQFL locally (even on a clean master) or on a VM, due to this error:
```
ERROR: test_multiple_destinations_transform (apache_beam.io.gcp.bigquery_file_loads_test.BigQueryFileLoadsIT)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads_test.py", line 467, in test_multiple_destinations_transform
    max_files_per_bundle=-1))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py", line 426, in __exit__
    self.run().wait_until_finish()
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py", line 406, in run
    self._options).run(False)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/pipeline.py", line 419, in run
    return self.runner.run_pipeline(self, self._options)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/test_direct_runner.py", line 43, in run_pipeline
    self.result = super(TestDirectRunner, self).run_pipeline(pipeline, options)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/direct/direct_runner.py", line 128, in run_pipeline
    return runner.run_pipeline(pipeline, options)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 319, in run_pipeline
    default_environment=self._default_environment))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 326, in run_via_runner_api
    return self.run_stages(stage_context, stages)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 408, in run_stages
    stage_context.safe_coders)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 681, in _run_stage
    result, splits = bundle_manager.process_bundle(data_input, data_output)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1562, in process_bundle
    part_inputs):
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 586, in result_iterator
    yield fs.pop().result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/local/lib/python3.7/concurrent/futures/_base.py", line 384, in __get_result
    raise self._exception
  File "/usr/local/lib/python3.7/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1561, in <lambda>
    self._registered).process_bundle(part, expected_outputs),
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1500, in process_bundle
    result_future = self._controller.control_handler.push(process_bundle_req)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1017, in push
    response = self.worker.do_instruction(request)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 342, in do_instruction
    request.instruction_id)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 368, in process_bundle
    bundle_processor.process_bundle(instruction_id))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 593, in process_bundle
    data.ptransform_id].process_encoded(data.data)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 143, in process_encoded
    self.output(decoded_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/operations.py", line 256, in output
    cython.cast(Receiver, self.receivers[output_index]).receive(windowed_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/operations.py", line 143, in receive
    self.consumer.process(windowed_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/worker/operations.py", line 594, in process
    delayed_application = self.dofn_receiver.receive(o)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 795, in receive
    self.process(windowed_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 801, in process
    self._reraise_augmented(exn)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 868, in _reraise_augmented
    raise_with_traceback(new_exn)
  File "/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/future/utils/__init__.py", line 421, in raise_with_traceback
    raise exc.with_traceback(traceback)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 799, in process
    return self.do_fn_invoker.invoke_process(windowed_value)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 611, in invoke_process
    windowed_value, additional_args, additional_kwargs, output_processor)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 683, in _invoke_process_per_window
    windowed_value, self.process_method(*args_for_process))
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/runners/common.py", line 914, in process_outputs
    for result in results:
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py", line 413, in process
    additional_load_parameters=additional_parameters)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_tools.py", line 594, in perform_load_job
    additional_load_parameters=additional_load_parameters)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/utils/retry.py", line 197, in wrapper
    return fun(*args, **kwargs)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/bigquery_tools.py", line 350, in _insert_load_job
    response = self.client.jobs.Insert(request)
  File "/home/tanay/Coding/beam-umbrella/beam-git/beam/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 342, in Insert
    upload=upload, upload_config=upload_config)
  File "/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "/home/tanay/Coding/beam-umbrella/beam-git/.env3/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
RuntimeError: apitools.base.py.exceptions.HttpBadRequestError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects//jobs?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Sat, 20 Jul 2019 17:08:07 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'alt-svc': 'quic=":443"; ma=2592000; v="46,43,39"', 'transfer-encoding': 'chunked', 'status': '400', 'content-length': '632', '-content-encoding': 'gzip'}>, content <{
  "error": {
    "code": 400,
    "message": "Invalid project ID ''. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.",
    "errors": [
      {
        "message": "Invalid project ID ''. Project IDs must contain 6-63 lowercase letters, digits, or dashes. Some project IDs also include domain name separated by a colon. IDs must start with a letter and may not end with a dash.",
        "domain": "global",
        "reason": "invalid"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}
> [while running 'WriteWithMultipleDests/BigQueryBatchFileLoads/ParDo(TriggerLoadJobs)/ParDo(TriggerLoadJobs)']
-------------------- >> begin captured logging << --------------------
```
This doesn't happen with the tests run on Jenkins.
I'm trying to resolve it so I can debug the failing test.
Hopefully, it's just a problem with my local installation.
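From the request URL in the error (`projects//jobs`), the project ID appears to be empty, which suggests the test pipeline options aren't picking up a project locally. A minimal check, assuming the test reads the project from the standard GCP pipeline options (`my-gcp-project` below is just a placeholder):
```python
# Sketch (assumption, not the test's actual setup): verify that the pipeline
# options carry a project, since an empty project would produce the
# 'projects//jobs' URL seen in the traceback above.
from apache_beam.options.pipeline_options import GoogleCloudOptions, PipelineOptions

options = PipelineOptions(['--project=my-gcp-project'])  # placeholder project ID
print(options.view_as(GoogleCloudOptions).project)  # None or '' would reproduce the empty URL segment
```
Passing the project explicitly in the test pipeline options should rule this out.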
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 280112)
Time Spent: 6h (was: 5h 50m)
> A Python Sink for BigQuery with File Loads in Streaming
> -------------------------------------------------------
>
> Key: BEAM-6611
> URL: https://issues.apache.org/jira/browse/BEAM-6611
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Pablo Estrada
> Assignee: Tanay Tummalapalli
> Priority: Major
> Labels: gsoc, gsoc2019, mentor
> Time Spent: 6h
> Remaining Estimate: 0h
>
> The Java SDK supports several methods for writing data into BigQuery,
> while the Python SDK supports the following:
> - Streaming inserts for streaming pipelines [As seen in [bigquery.py and
> BigQueryWriteFn|https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L649-L813]]
> - File loads for batch pipelines [As implemented in [PR
> 7655|https://github.com/apache/beam/pull/7655]]
> Quick and dirty early design doc: https://s.apache.org/beam-bqfl-py-streaming
> The Java SDK also supports File Loads for Streaming pipelines [see BatchLoads
> application|https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1709-L1776].
> File loads have the advantage of being much cheaper than streaming inserts
> (although records take longer to show up in the table).
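> A minimal usage sketch for the streaming case, assuming the new sink is exposed
> through WriteToBigQuery's existing `method` and `triggering_frequency` parameters
> (the topic, table, and schema below are placeholders):
> ```python
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
>
> options = PipelineOptions()
> options.view_as(StandardOptions).streaming = True
>
> with beam.Pipeline(options=options) as p:
>   (p
>    | 'ReadEvents' >> beam.io.ReadFromPubSub(topic='projects/my-project/topics/events')
>    | 'ToRow' >> beam.Map(lambda msg: {'payload': msg.decode('utf-8')})
>    | 'WriteViaFileLoads' >> beam.io.WriteToBigQuery(
>        table='my-project:my_dataset.events',
>        schema='payload:STRING',
>        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
>        # Periodically trigger a load job for the buffered files.
>        triggering_frequency=600))
> ```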
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)