Pablo Estrada created BEAM-7742:
-----------------------------------
Summary: BigQuery File Loads to work well with load job size limits
Key: BEAM-7742
URL: https://issues.apache.org/jira/browse/BEAM-7742
Project: Beam
Issue Type: Improvement
Components: io-python-gcp
Reporter: Pablo Estrada
Assignee: Tanay Tummalapalli
Load jobs into BigQuery have a number of limitations:
[https://cloud.google.com/bigquery/quotas#load_jobs]
Currently, the Python BQ sink implemented in `bigquery_file_loads.py` does not
handle these limitations well. The implementation needs to be improved to:
* Decide dynamically, at pipeline execution time, whether to use temp_tables.
* Add code to determine when a load job to a single destination needs to be
partitioned into multiple jobs.
* When this happens, we definitely need to use temp_tables, in case one of
the load jobs fails and the pipeline is rerun.
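The points above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Beam's API: `partition_files`, `needs_temp_tables`, and the `(path, size)` input shape are hypothetical names chosen here, and the two limit constants are illustrative values that should be checked against the quotas page linked above. Files for one destination are packed greedily, in order, into groups that each fit within a single load job. If more than one group results, the write should go through temp_tables.

```python
from typing import List, Tuple

# Illustrative per-load-job limits (assumptions); verify against
# https://cloud.google.com/bigquery/quotas#load_jobs before relying on them.
MAX_FILES_PER_LOAD_JOB = 10_000
MAX_BYTES_PER_LOAD_JOB = 15 * 10**12

# Hypothetical input shape: (file path, size in bytes).
FileInfo = Tuple[str, int]


def partition_files(files: List[FileInfo],
                    max_files: int = MAX_FILES_PER_LOAD_JOB,
                    max_bytes: int = MAX_BYTES_PER_LOAD_JOB) -> List[List[FileInfo]]:
    """Greedily split one destination's files into groups that each fit in one load job."""
    partitions: List[List[FileInfo]] = []
    current: List[FileInfo] = []
    current_bytes = 0
    for path, size in files:
        if size > max_bytes:
            # A single file that is too big cannot be fixed by partitioning.
            raise ValueError(f"{path} alone exceeds the per-job byte limit")
        # Start a new group if adding this file would break either limit.
        if current and (len(current) >= max_files or current_bytes + size > max_bytes):
            partitions.append(current)
            current, current_bytes = [], 0
        current.append((path, size))
        current_bytes += size
    if current:
        partitions.append(current)
    return partitions


def needs_temp_tables(partitions: List[List[FileInfo]]) -> bool:
    # More than one load job for the same destination: load into temp tables
    # first, so a failed job followed by a rerun does not leave partial data.
    return len(partitions) > 1
```

For example, with `max_files=2` and `max_bytes=10`, the files `("gs://bucket/a", 6)`, `("gs://bucket/b", 5)`, `("gs://bucket/c", 4)` (hypothetical paths) split into two groups, `[a]` and `[b, c]`, so `needs_temp_tables` returns `True`. Greedy in-order packing is not an optimal bin packing. It is used here only because it is simple and guarantees that every group respects both limits.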
Tanay, would you be able to look at this?
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)