Pablo Estrada created BEAM-7742:
-----------------------------------

             Summary: BigQuery File Loads to work well with load job size limits
                 Key: BEAM-7742
                 URL: https://issues.apache.org/jira/browse/BEAM-7742
             Project: Beam
          Issue Type: Improvement
          Components: io-python-gcp
            Reporter: Pablo Estrada
            Assignee: Tanay Tummalapalli


Load jobs into BigQuery have a number of limitations: 
[https://cloud.google.com/bigquery/quotas#load_jobs]

Currently, the Python BQ sink implemented in `bigquery_file_loads.py` does not 
handle these limitations well. Improvements need to be made to the 
implementation, to:
 * Decide dynamically, at pipeline execution time, whether to use temp_tables.
 * Add code to determine when a load job to a single destination needs to be 
partitioned into multiple jobs.
 * When a load is partitioned this way, always use temp_tables, in case one 
of the resulting load jobs fails and the pipeline is rerun.
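
A minimal sketch of the partitioning decision described above, not the actual 
`bigquery_file_loads.py` code. The names `partition_files` and 
`needs_temp_tables` are hypothetical, and the two limit constants are 
assumptions that should be checked against the quotas page linked above.

```python
from typing import Iterable, Iterator, List, Tuple

# Assumed per-job limits; verify against the BigQuery quotas page.
MAX_FILES_PER_JOB = 10_000
MAX_BYTES_PER_JOB = 15 * (1 << 40)  # 15 TiB


def partition_files(
    files: Iterable[Tuple[str, int]],
    max_files: int = MAX_FILES_PER_JOB,
    max_bytes: int = MAX_BYTES_PER_JOB,
) -> Iterator[List[str]]:
    """Group (path, size_in_bytes) pairs into batches, one per load job.

    A new batch is started whenever adding the next file would exceed
    either the file-count limit or the byte-size limit. A single file
    larger than max_bytes still gets its own batch.
    """
    batch: List[str] = []
    batch_bytes = 0
    for path, size in files:
        if batch and (len(batch) >= max_files or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(path)
        batch_bytes += size
    if batch:
        yield batch


def needs_temp_tables(num_load_jobs: int) -> bool:
    """Temp tables are required once a destination needs more than one job."""
    return num_load_jobs > 1
```

For example, with deliberately tiny limits, 
`list(partition_files([("a", 4), ("b", 4), ("c", 4)], max_files=2, max_bytes=10))` 
yields two batches, so `needs_temp_tables(2)` is true and the destination 
would be loaded through temp tables.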

Tanay, would you be able to look at this?



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
