[ https://issues.apache.org/jira/browse/BEAM-13088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442914#comment-17442914 ]
Pablo Estrada commented on BEAM-13088:
--------------------------------------
Large BigQuery loads (more than 10,000 files or more than 15 TB) are done in
two steps: first the data is loaded into temporary tables, and then the
temporary tables are copied into the final destination table.
The temp tables are created in the same dataset as the final table, and this
causes difficulties for some users. The goal is to add an option that lets
users specify a separate temporary dataset for these tables.
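A minimal sketch of the decision described above, in Java. This is a hypothetical model, not Beam's implementation: `TwoStepLoadPlanner` and its methods are made-up names, the thresholds are the 10k-file and 15 TB figures quoted in this comment (Beam's actual constants may differ), and treating either limit on its own as enough to trigger the two-step path is an assumption. The `tempDataset` argument models the proposed feature; when it is empty, temp tables stay in the destination's dataset, as they do today.

```java
import java.util.Optional;

/** Hypothetical model of the two-step load decision discussed in BEAM-13088 (not Beam code). */
final class TwoStepLoadPlanner {
    // Figures quoted in the Jira comment; Beam's real limits may differ.
    static final long MAX_FILES_SINGLE_LOAD = 10_000L;
    static final long MAX_BYTES_SINGLE_LOAD = 15_000_000_000_000L; // 15 TB, decimal units (assumption)

    /** True when the load would go through temp tables followed by copy jobs. */
    static boolean needsTwoStepLoad(long fileCount, long totalBytes) {
        return fileCount > MAX_FILES_SINGLE_LOAD || totalBytes > MAX_BYTES_SINGLE_LOAD;
    }

    /**
     * Dataset that would hold the temp tables: the configured temp dataset if present,
     * otherwise the destination table's own dataset (the current behavior).
     */
    static String tempTableDataset(String destinationDataset, Optional<String> tempDataset) {
        return tempDataset.orElse(destinationDataset);
    }

    public static void main(String[] args) {
        System.out.println(needsTwoStepLoad(12_000L, 1_000_000_000L));             // true: too many files
        System.out.println(tempTableDataset("analytics", Optional.empty()));       // analytics
        System.out.println(tempTableDataset("analytics", Optional.of("beam_tmp"))); // beam_tmp
    }
}
```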
Here's where the copy jobs are issued:
[https://github.com/apache/beam/blob/735db247f3e03d9fddb9f6d7281c986b60ac683d/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java#L175-L215]
The idea would be to add a new configuration parameter and plumb it from the
public interface in BigQueryIO.Write through to the transforms that perform
this two-step load.
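A rough sketch of that plumbing, under stated assumptions. Everything here is hypothetical: `WriteOptions` stands in for the public BigQueryIO.Write configuration, `withTempDataset` is an illustrative name rather than an existing Beam method, and `tempTableSpecs` stands in for the internal steps of the two-step load that need to know where the temp tables live. The temp dataset is assumed to be in the same project as the destination table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical sketch: plumbing a temp-dataset option from a public builder to the load steps. */
final class TempDatasetPlumbing {

    /** Stand-in for the public write configuration: immutable, builder-style. */
    static final class WriteOptions {
        final String destinationTable; // "project:dataset.table"
        final String tempDataset;      // null means "reuse the destination's dataset"

        private WriteOptions(String destinationTable, String tempDataset) {
            this.destinationTable = Objects.requireNonNull(destinationTable);
            this.tempDataset = tempDataset;
        }

        static WriteOptions to(String destinationTable) {
            return new WriteOptions(destinationTable, null);
        }

        /** Hypothetical new option; returns a modified copy instead of mutating this one. */
        WriteOptions withTempDataset(String dataset) {
            return new WriteOptions(destinationTable, Objects.requireNonNull(dataset));
        }
    }

    /** Stand-in for the internal steps: builds one temp table spec per partition. */
    static List<String> tempTableSpecs(WriteOptions options, String prefix, int partitions) {
        String table = options.destinationTable;
        int colon = table.indexOf(':');
        int dot = table.indexOf('.', colon + 1);
        if (colon < 0 || dot < 0) {
            throw new IllegalArgumentException("expected project:dataset.table, got " + table);
        }
        String project = table.substring(0, colon);
        // Fall back to the destination's dataset when no temp dataset was configured.
        String dataset = options.tempDataset != null
            ? options.tempDataset
            : table.substring(colon + 1, dot);

        List<String> specs = new ArrayList<>();
        for (int i = 0; i < partitions; i++) {
            specs.add(project + ":" + dataset + "." + prefix + "_" + i);
        }
        return specs;
    }

    public static void main(String[] args) {
        WriteOptions base = WriteOptions.to("my-project:analytics.events");
        // [my-project:analytics.beam_tmp_0, my-project:analytics.beam_tmp_1]
        System.out.println(tempTableSpecs(base, "beam_tmp", 2));
        // [my-project:beam_staging.beam_tmp_0, my-project:beam_staging.beam_tmp_1]
        System.out.println(tempTableSpecs(base.withTempDataset("beam_staging"), "beam_tmp", 2));
    }
}
```

Leaving the option unset by default keeps existing pipelines on the current behavior; only pipelines that opt in would see temp tables move out of the destination dataset.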
> Load BigQuery temp tables into different dataset
> ------------------------------------------------
>
> Key: BEAM-13088
> URL: https://issues.apache.org/jira/browse/BEAM-13088
> Project: Beam
> Issue Type: Task
> Components: io-java-gcp
> Reporter: Kiley Sok
> Priority: P2
>
> When Beam loads data into BigQuery, it sometimes creates temporary tables
> and then runs BigQuery copy jobs into the destination table.
> These temporary tables are deleted afterwards, but while they exist,
> wildcard queries fail because they match these tables.
> Either create the tables in a different dataset, or hide them altogether.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)