I think the same problem was recently fixed in Python [1]. It would be great to fix this in Java too; we hit this a bunch, but I've never had enough time to fix it.
[1] https://github.com/apache/beam/pull/14113

On Tue, Jul 27, 2021 at 4:19 PM Chamikara Jayalath <chamik...@google.com> wrote:

> I don't have a lot of context regarding schema update options, but this
> does sound like a bug. Temp tables are only used for very large writes
> (11 TB or so, last time I checked), so I wouldn't be surprised if not too
> many users have run into this by using schema update options with very
> large tables.
>
> Can you create a Jira?
>
> Thanks,
> Cham
>
> On Tue, Jul 20, 2021 at 11:29 AM Siyuan Chen <syc...@google.com> wrote:
>
>> Hi Dev,
>>
>> I encountered a problem when trying to write data to BigQuery using FILE LOADS
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1742>.
>> With FILE LOADS, input data is first written to temp files
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L280>
>> and then batch loaded
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L340>
>> into BigQuery. When temp tables are needed (to avoid too many files in a
>> single load job), the default write disposition is set to WRITE_TRUNCATE
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L223>
>> to allow retries. However, when SchemaUpdateOptions
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2001>
>> were also set, the load job failed with the following error:
>>
>> Schema update options should only be specified with WRITE_APPEND
>> disposition, or with WRITE_TRUNCATE disposition on a table partition.
>>
>> I think this means that if WRITE_TRUNCATE is used, the partition of the
>> table to truncate should also be supplied (which kind of makes sense, as
>> rows in a partition share the same schema). I failed to find code that
>> would append a partition decorator
>> <https://cloud.google.com/bigquery/docs/managing-partitioned-table-data#using_a_load_job>
>> to the temp tables. Does this sound like a missing piece in the BigQueryIO
>> implementation? Please let me know if I missed anything important.
>>
>> Thanks in advance!
>>
>> --
>> Best regards,
>> Siyuan
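For context on the partition decorator the quoted message refers to: per the linked BigQuery docs, a day-partition decorator is the table name followed by "$" and the partition date as YYYYMMDD (e.g. "dataset.table$20210720"). A minimal sketch of what appending one could look like; the class and helper names here are hypothetical illustrations, not the actual Beam fix:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class PartitionDecorator {
    // Appends a BigQuery day-partition decorator to a table spec,
    // producing e.g. "project:dataset.table$20210720".
    // BASIC_ISO_DATE formats a LocalDate as yyyyMMdd.
    static String withPartitionDecorator(String tableSpec, LocalDate day) {
        return tableSpec + "$" + day.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        System.out.println(withPartitionDecorator(
            "project:dataset.events", LocalDate.of(2021, 7, 20)));
        // project:dataset.events$20210720
    }
}
```

A load job targeting such a decorated table reference with WRITE_TRUNCATE truncates only that partition, which is presumably why BigQuery accepts schema update options in that case but rejects them for a whole-table truncate.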