[
https://issues.apache.org/jira/browse/BEAM-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619792#comment-16619792
]
Reuven Lax commented on BEAM-5426:
----------------------------------
If different destinations return the same TableDestination, worse things can
happen: since we distribute work based on the destination, different workers
might run parallel loads to the same table, which can corrupt data (e.g. if
the write disposition is set to WRITE_TRUNCATE).
> Use both destination and TableDestination for BQ load job IDs
> -------------------------------------------------------------
>
> Key: BEAM-5426
> URL: https://issues.apache.org/jira/browse/BEAM-5426
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Chamikara Jayalath
> Priority: Major
>
> Currently we use TableDestination when creating a unique load job ID for a
> destination:
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java#L359]
>
> This can cause data loss if a user returns the same TableDestination for
> different destination IDs. I think we can prevent this if we include both
> the destination and the TableDestination in the BQ load job ID.
>
> CC: [~reuvenlax]
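
A minimal sketch of the idea proposed in the description above, assuming a
job ID of the form prefix_hash_partition. The class and method names
(LoadJobIdSketch, jobIdFromTableOnly, jobIdFromBoth), the ID layout, and the
SHA-256 hashing are illustrative assumptions and are not taken from Beam's
BigQueryHelpers. It shows that two destinations mapping to the same table get
the same ID when only the table is hashed, and distinct IDs when the
destination is hashed as well.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch; none of these names exist in Beam.
class LoadJobIdSketch {

  // Short, stable hex digest of a string (first 8 bytes of SHA-256).
  static String hash(String s) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < 8; i++) {
        sb.append(String.format("%02x", digest[i]));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Assumed "before" behaviour: the ID depends only on the table.
  static String jobIdFromTableOnly(String prefix, String tableSpec, int partition) {
    return String.format("%s_%s_%05d", prefix, hash(tableSpec), partition);
  }

  // Proposed behaviour: the ID depends on the destination key and the table.
  static String jobIdFromBoth(
      String prefix, String destinationKey, String tableSpec, int partition) {
    // Length-prefix the key so ("ab", "c") and ("a", "bc") cannot collide.
    String combined = destinationKey.length() + ":" + destinationKey + tableSpec;
    return String.format("%s_%s_%05d", prefix, hash(combined), partition);
  }

  public static void main(String[] args) {
    String table = "project:dataset.table"; // same table for both destinations
    System.out.println(jobIdFromTableOnly("beam_load", table, 0));
    System.out.println(jobIdFromBoth("beam_load", "destA", table, 0));
    System.out.println(jobIdFromBoth("beam_load", "destB", table, 0));
  }
}
```

Distinct job IDs only address the ID collision; they do not by themselves
prevent the parallel loads to the same table described in the comment above.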