[jira] [Commented] (BEAM-13088) Load BigQuery temp tables into different dataset

Pablo Estrada (Jira) Fri, 12 Nov 2021 11:31:08 -0800


    [ 
https://issues.apache.org/jira/browse/BEAM-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442914#comment-17442914
 ]


Pablo Estrada commented on BEAM-13088:
--------------------------------------

Large BQ loads (>10k files, >15TB) are done in two steps: first the data is 
loaded into temporary tables, and then the temp tables are copied into the 
final destination table.

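The temp tables are created in the same dataset as the final table, and this 
causes difficulties for some users (for example, wildcard queries on that 
dataset can match the temp tables). The goal is to add a feature that lets 
users specify a separate temporary dataset for these tables.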

Here's where the copy jobs are issued: 
https://github.com/apache/beam/blob/735db247f3e03d9fddb9f6d7281c986b60ac683d/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java#L175-L215

The idea would be to add a new configuration parameter and plumb it from the 
public interface in BigQueryIO.Write down to the workflow that runs this 
two-step load.
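
To make the plumbing idea concrete, here is a rough, self-contained sketch of 
the decision the load workflow would need to make once such a parameter 
exists. Everything in it is hypothetical: the TempTableResolver class, the 
TableRef stand-in, the tempDataset parameter and the temp table naming are 
made up for illustration and are not taken from the Beam codebase.

```java
import java.util.Optional;

// Hypothetical sketch: none of these names come from the Beam codebase.
class TempTableResolver {

  /** Minimal stand-in for a BigQuery table reference. */
  static final class TableRef {
    final String project;
    final String dataset;
    final String table;

    TableRef(String project, String dataset, String table) {
      this.project = project;
      this.dataset = dataset;
      this.table = table;
    }

    @Override
    public String toString() {
      return project + ":" + dataset + "." + table;
    }
  }

  /**
   * Picks where the temporary table for one load partition should live.
   * If tempDataset is empty, the destination's dataset is used (the
   * behavior described in this comment); otherwise the temp table goes
   * into the user-supplied dataset.
   */
  static TableRef tempTableFor(
      TableRef destination, Optional<String> tempDataset, String jobId, int partition) {
    String dataset = tempDataset.orElse(destination.dataset);
    // The table name format is invented for this example.
    String table = "temp_" + jobId + "_" + partition;
    return new TableRef(destination.project, dataset, table);
  }
}
```

With a destination of proj:prod.events and no temp dataset configured, this 
resolves to a temp table inside prod, as happens today. With a temp dataset 
of "scratch", the temp table lands in proj:scratch instead, and the copy step 
would still target proj:prod.events.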

> Load BigQuery temp tables into different dataset
> ------------------------------------------------
>
>                 Key: BEAM-13088
>                 URL: https://issues.apache.org/jira/browse/BEAM-13088
>             Project: Beam
>          Issue Type: Task
>          Components: io-java-gcp
>            Reporter: Kiley Sok
>            Priority: P2
>
> When Beam loads data into BigQuery, it sometimes creates temporary tables 
> and then runs a bq copy into the destination table.
> The temporary tables are deleted afterwards, but while they exist, wildcard 
> queries that run fail because they match these tables.
> Either create the temporary tables in a different dataset, or hide them 
> altogether.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
