I think the same problem was recently fixed in Python [1].  It'd be great
to fix this in Java too; we hit this a bunch, but I've never had enough
time to fix it.

[1] https://github.com/apache/beam/pull/14113
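For anyone following along, here is a tiny self-contained sketch of what a
partition decorator looks like on the BigQuery side. The `withPartition`
helper below is purely illustrative (it is not Beam code); the point is that
loading into "table$YYYYMMDD" targets a single partition, which is what
WRITE_TRUNCATE plus schema update options would require per the error quoted
downthread.

```java
// Illustrative only: shows the "$YYYYMMDD" partition decorator syntax that a
// load job destination would need for WRITE_TRUNCATE on a single partition.
public class PartitionDecorator {

    /** Appends a $YYYYMMDD partition decorator to a BigQuery table ID. */
    static String withPartition(String tableId, String yyyymmdd) {
        // Basic sanity check on the date suffix.
        if (!yyyymmdd.matches("\\d{8}")) {
            throw new IllegalArgumentException("expected YYYYMMDD, got: " + yyyymmdd);
        }
        return tableId + "$" + yyyymmdd;
    }

    public static void main(String[] args) {
        // Truncating only the 2021-07-20 partition of my_dataset.my_table.
        System.out.println(withPartition("my_dataset.my_table", "20210720"));
    }
}
```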

On Tue, Jul 27, 2021 at 4:19 PM Chamikara Jayalath <chamik...@google.com>
wrote:

> I don't have a lot of context regarding schema update options but this
> does sound like a bug. Temp tables are only used for very large writes
> (11TB or so last time I checked) so I wouldn't be surprised if not too many
> users have run into this by using schema update options with very large
> tables.
> Can you create a Jira?
>
> Thanks,
> Cham
>
> On Tue, Jul 20, 2021 at 11:29 AM Siyuan Chen <syc...@google.com> wrote:
>
>> Hi Dev,
>>
>> I encountered a problem when trying to write data to BigQuery using FILE
>> LOADS
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1742>.
>> With FILE LOADS, input data is first written to temp files
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L280>
>> and then batch loaded
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BatchLoads.java#L340>
>> to BigQuery. When temp tables are needed (to avoid too many files in a
>> single load job), the default write disposition is set to WRITE_TRUNCATE
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteTables.java#L223>
>> to allow retries. However, when SchemaUpdateOptions
>> <https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L2001>
>> are also set, the load job fails with the following error:
>>
>> Schema update options should only be specified with WRITE_APPEND
>> disposition, or with WRITE_TRUNCATE disposition on a table partition.
>>
>> I think this means that if WRITE_TRUNCATE is used, the partition of the
>> table to truncate should also be supplied (which makes sense, as rows
>> in a partition share the same schema). I couldn't find any code that
>> appends a partition decorator
>> <https://cloud.google.com/bigquery/docs/managing-partitioned-table-data#using_a_load_job>
>> to the temp tables. Does this sound like a missing piece in the BigQueryIO
>> implementation? Please let me know if I missed anything important.
>>
>> Thanks in advance!
>>
>> --
>> Best regards,
>> Siyuan
>>
>