[ 
https://issues.apache.org/jira/browse/BEAM-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619792#comment-16619792
 ] 

Reuven Lax commented on BEAM-5426:
----------------------------------

If different destinations return the same TableDestination, the problem goes 
beyond load job IDs. Because we distribute work based on the destination, 
different workers may then run parallel loads into the same table, which can 
corrupt data (e.g. if the write disposition is set to WRITE_TRUNCATE).
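
To make the hazard concrete, here is a minimal sketch of how such a collision 
could be detected. It is plain Java, not Beam code: the class name, the method 
name, and the use of plain strings for destination keys and table specs are 
all assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class DestinationCollisions {
    /**
     * Hypothetical helper (not part of Beam): given a mapping from each
     * user-level destination key to the table spec it resolves to, return
     * only the table specs that more than one destination points at.
     */
    static Map<String, Set<String>> collisions(Map<String, String> destinationToTableSpec) {
        // Group destination keys by the table they resolve to.
        Map<String, Set<String>> byTable = new TreeMap<>();
        for (Map.Entry<String, String> e : destinationToTableSpec.entrySet()) {
            byTable.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        // Keep only tables shared by two or more destinations; these are the
        // ones that could receive parallel loads from different workers.
        byTable.values().removeIf(keys -> keys.size() < 2);
        return byTable;
    }
}
```

Any table spec that appears in the result is a table that could be loaded from 
more than one worker at a time.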

> Use both destination and TableDestination for BQ load job IDs
> -------------------------------------------------------------
>
>                 Key: BEAM-5426
>                 URL: https://issues.apache.org/jira/browse/BEAM-5426
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Chamikara Jayalath
>            Priority: Major
>
> Currently we use TableDestination when creating a unique load job ID for a 
> destination: 
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java#L359]
>  
> This can cause data loss if a user returns the same TableDestination for 
> different destination IDs. I think we can prevent this by including both 
> IDs in the BQ load job ID.
>  
> CC: [~reuvenlax]
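
As an illustration of the proposal quoted above, the following is a minimal 
sketch of a load job ID derived from both the destination and the table. It is 
not the code at the linked BigQueryHelpers line: the class, the method names, 
the SHA-256 hash, and the ID layout are all assumptions chosen to keep the 
example self-contained.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class LoadJobIds {
    /**
     * Hypothetical helper (not Beam's API): builds a job ID from a prefix,
     * a hash of both the destination key and the table spec, and a
     * partition index.
     */
    static String jobId(String prefix, String destinationKey, String tableSpec, int partition) {
        // Length-prefix each field so that ("ab", "c") and ("a", "bc")
        // do not produce the same hash input.
        String combined = destinationKey.length() + ":" + destinationKey
            + tableSpec.length() + ":" + tableSpec;
        String hash = sha256Hex(combined).substring(0, 32);
        return String.format("%s_%s_%05d", prefix, hash, partition);
    }

    /** Returns the lowercase hex SHA-256 digest of the UTF-8 bytes of s. */
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }
}
```

With this scheme, two destinations that resolve to the same table get 
different job IDs, while retries for the same destination, table, and 
partition get the same one. It only addresses job ID uniqueness; it does not 
prevent concurrent loads into a shared table.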


