[ 
https://issues.apache.org/jira/browse/BEAM-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619823#comment-16619823
 ] 

Chamikara Jayalath commented on BEAM-5426:
------------------------------------------

In that case, how about keeping track of load jobs for different destinations
and failing the job if we detect two load jobs for the same destination? We
should find a way to actively fail in this case, since currently it results in
silent data loss.
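
To make the suggestion concrete, here is a minimal sketch of the kind of check I have in mind. It is not existing Beam code: the class LoadJobTracker, the method register, and the use of plain strings for the destination key and the table spec are all hypothetical, and it assumes that a repeated registration from the same destination (for example a retry) should be allowed.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Beam): remembers which destination key
 * first started a load job for each table, and fails loudly if a different
 * destination key later targets the same table.
 */
final class LoadJobTracker {
  // Table spec (e.g. "project:dataset.table") -> destination key that claimed it.
  private final Map<String, String> ownerByTable = new HashMap<>();

  /**
   * Records that destinationKey is about to load into tableSpec.
   * Assumption: the same destination may register again (e.g. on retry).
   *
   * @throws IllegalStateException if another destination already loads into the table
   */
  void register(String destinationKey, String tableSpec) {
    String previous = ownerByTable.putIfAbsent(tableSpec, destinationKey);
    if (previous != null && !previous.equals(destinationKey)) {
      throw new IllegalStateException(
          "Destinations '" + previous + "' and '" + destinationKey
              + "' both map to table '" + tableSpec + "'");
    }
  }
}
```

With something like this, the second destination that resolves to an already claimed table would fail the job with an explicit error instead of being silently dropped.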

> Use both destination and TableDestination for BQ load job IDs
> -------------------------------------------------------------
>
>                 Key: BEAM-5426
>                 URL: https://issues.apache.org/jira/browse/BEAM-5426
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-gcp
>            Reporter: Chamikara Jayalath
>            Priority: Major
>
> Currently we use TableDestination when creating a unique load job ID for a 
> destination: 
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java#L359]
>  
> This can result in a data loss issue if a user returns the same 
> TableDestination for different destination IDs. I think we can prevent this 
> if we include both IDs in the BQ load job ID.
>  
> CC: [~reuvenlax]
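
For reference, the approach from the issue description (including both IDs in the load job ID) could be sketched roughly as follows. This is an illustration only, not the code in BigQueryHelpers: the class LoadJobIds, the method create, and the string form of the destination key are hypothetical, and hashing with SHA-256 is just one way to keep the ID limited to letters, digits, and underscores.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper (not part of Beam) that derives a load job ID from both IDs. */
final class LoadJobIds {
  private LoadJobIds() {}

  /**
   * Builds a job ID from a prefix, a hash of the destination key, and a hash
   * of the table spec, so two destinations that share a table get different IDs.
   */
  static String create(String prefix, String destinationKey, String tableSpec) {
    return prefix + "_" + shortHash(destinationKey) + "_" + shortHash(tableSpec);
  }

  // First 8 bytes of the SHA-256 digest, rendered as 16 lowercase hex characters.
  private static String shortHash(String value) {
    try {
      byte[] digest =
          MessageDigest.getInstance("SHA-256").digest(value.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder();
      for (int i = 0; i < 8; i++) {
        hex.append(String.format("%02x", digest[i]));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform is required to provide SHA-256, so this is unexpected.
      throw new IllegalStateException("SHA-256 unavailable", e);
    }
  }
}
```

The ID stays deterministic for a given pair of inputs, which a retried load job would presumably need in order to reuse the same ID.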



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
