ahmedabu98 opened a new issue, #23236: URL: https://github.com/apache/beam/issues/23236
### What would you like to happen?

For large Java BQ batch loads that require copy jobs, the temp tables created for a given destination are [grouped up into a list of references](https://github.com/apache/beam/blob/f921a2f1996cf906d994a9d62aeb6978bab09dd5/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java#L126-L143) and are later used as a [source for a copy job](https://github.com/apache/beam/blob/f921a2f1996cf906d994a9d62aeb6978bab09dd5/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/WriteRename.java#L217). Ultimately, only one copy job is performed for a given table destination.

Contrast this with the Python implementation: [one copy job is performed for each temp table](https://github.com/apache/beam/blob/9b83b79088dae5915f36dc75622e5a9126325bb5/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py#L567), even if they all go to the same destination. The `_insert_copy_job()` function used from `bigquery_tools` [only allows a single source](https://github.com/apache/beam/blob/9b83b79088dae5915f36dc75622e5a9126325bb5/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L468). But this should be possible because the `JobConfigurationTableCopy` from the internal client has a [`sourceTables`](https://github.com/apache/beam/blob/9b83b79088dae5915f36dc75622e5a9126325bb5/sdks/python/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_messages.py#L2798) field. I have not tried this, but I assume it can accept a list of table references as sources.

Reducing multiple copy jobs down to one should improve the speed of large writes (less time wasted spinning up multiple jobs and waiting for them). This may also prevent partial writes in the event one copy job fails.

### Issue Priority

Priority: 3

### Issue Component

Component: io-py-gcp
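To make the proposal concrete, here is a minimal sketch of grouping temp tables by destination and building one copy configuration per destination, with every temp table listed under `sourceTables`. It does not call Beam or BigQuery. `TableRef`, `group_by_destination`, and `build_copy_config` are hypothetical names for illustration, and the dict layout is assumed to mirror the `copy` section of a BigQuery job configuration. As the issue notes, whether Beam's internal client accepts a list here is untested.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical stand-in for a BigQuery table reference."""
    project: str
    dataset: str
    table: str

    def to_json(self):
        return {
            "projectId": self.project,
            "datasetId": self.dataset,
            "tableId": self.table,
        }


def group_by_destination(pairs):
    """Turn (destination, temp_table) pairs into {destination: [temp_tables]}."""
    grouped = defaultdict(list)
    for destination, temp_table in pairs:
        grouped[destination].append(temp_table)
    return dict(grouped)


def build_copy_config(destination, temp_tables, write_disposition="WRITE_APPEND"):
    """Build one copy configuration listing every temp table as a source."""
    return {
        "copy": {
            # Assumption: the plural field takes a list of table references.
            "sourceTables": [t.to_json() for t in temp_tables],
            "destinationTable": destination.to_json(),
            "writeDisposition": write_disposition,
        }
    }


# Three temp tables that all target the same destination table.
dest = TableRef("my-project", "prod", "events")
pairs = [(dest, TableRef("my-project", "tmp", f"beam_load_{i}")) for i in range(3)]

configs = [
    build_copy_config(d, temps)
    for d, temps in group_by_destination(pairs).items()
]
```

With this shape, the three temp tables yield a single configuration rather than three separate copy jobs, which is the behavior the Java implementation is described as having.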
