[
https://issues.apache.org/jira/browse/BEAM-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284256#comment-17284256
]
Beam JIRA Bot commented on BEAM-11277:
--------------------------------------
This issue is P2 but has been unassigned without any comment for 60 days so it
has been labeled "stale-P2". If this issue is still affecting you, we care!
Please comment and remove the label. Otherwise, in 14 days the issue will be
moved to P3.
Please see https://beam.apache.org/contribute/jira-priorities/ for a detailed
explanation of what these priorities mean.
> WriteToBigQuery with batch file loads does not respect schema update options
> when there are multiple load jobs
> --------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-11277
> URL: https://issues.apache.org/jira/browse/BEAM-11277
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp, runner-dataflow
> Affects Versions: 2.21.0, 2.25.0
> Reporter: Chun Yang
> Priority: P2
> Labels: stale-P2
> Attachments: repro.py
>
>
> When multiple load jobs are needed to write data to a destination table,
> e.g., when the data is spread over more than
> [10,000|https://cloud.google.com/bigquery/quotas#load_jobs] URIs,
> WriteToBigQuery in FILE_LOADS mode will write data into temporary tables and
> then copy the temporary tables into the destination table.
> When WriteToBigQuery is used with
> {{write_disposition=BigQueryDisposition.WRITE_APPEND}} and
> {{additional_bq_parameters=\{"schemaUpdateOptions":
> ["ALLOW_FIELD_ADDITION"]\}}}, the schema update options are not respected by
> the jobs that copy data from temporary tables into the destination table. As
> a result, schema field addition is allowed for small jobs (<10K source URIs),
> but once the job is scaled to >10K source URIs, schema field addition fails
> with an error such as:
> {code:none}Provided Schema does not match Table project:dataset.table. Cannot
> add fields (field: field_name){code}
> I've been able to reproduce this issue with Python 3.7 and DataflowRunner on
> Beam 2.21.0 and Beam 2.25.0. I could not reproduce the issue with
> DirectRunner. A minimal reproducible example is attached.
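
For readers without access to the attached repro.py, the configuration described in the report can be sketched roughly as follows. This is not the reporter's script. The helper names (`load_jobs_needed`, `uses_temp_tables`, `make_write_transform`) and the table spec are hypothetical, and the job-count helper is a simplified model that looks only at the 10,000-URI limit cited above, ignoring any other limits Beam may apply when partitioning files.

```python
import math

# Per-load-job source URI limit cited in the report (BigQuery load job quota).
MAX_SOURCE_URIS_PER_LOAD_JOB = 10_000

# Extra load-job parameters quoted in the report.
ADDITIONAL_BQ_PARAMETERS = {"schemaUpdateOptions": ["ALLOW_FIELD_ADDITION"]}


def load_jobs_needed(num_source_uris: int) -> int:
    """Simplified model (assumption): count load jobs by URI count only."""
    if num_source_uris <= 0:
        return 0
    return math.ceil(num_source_uris / MAX_SOURCE_URIS_PER_LOAD_JOB)


def uses_temp_tables(num_source_uris: int) -> bool:
    """Per the report, more than one load job means temp tables plus copy jobs."""
    return load_jobs_needed(num_source_uris) > 1


def make_write_transform(table_spec: str):
    """Build a WriteToBigQuery transform configured as described in the report.

    apache_beam is imported lazily so the helpers above work without it.
    """
    import apache_beam as beam

    return beam.io.WriteToBigQuery(
        table_spec,  # hypothetical, e.g. "project:dataset.table"
        method=beam.io.WriteToBigQuery.Method.FILE_LOADS,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        additional_bq_parameters=ADDITIONAL_BQ_PARAMETERS,
    )
```

Under this model, `uses_temp_tables(n)` first becomes true at 10,001 source URIs. According to the report, that is the case where the copy jobs drop the schema update options and field addition starts to fail.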
--
This message was sent by Atlassian Jira
(v8.3.4#803005)