[ https://issues.apache.org/jira/browse/BEAM-5426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16619830#comment-16619830 ]
Reuven Lax commented on BEAM-5426:
----------------------------------
Two issues:
# I'm not sure how to do this easily, as the destinations are sharded across
all the workers.
# We don't have a way of failing jobs from within the SDK. The best we can do is
throw an exception, but that doesn't necessarily fail the job (for Dataflow
streaming, that will simply result in an infinite exception loop and a stuck
job).
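To illustrate the first point, here is a hypothetical sketch of a worker-local collision check. It is not Beam code: `WorkerLocalCollisionCheck` and `conflicts` are invented names, and destinations and tables are modeled as plain strings. Because each instance only sees the destinations routed to its own worker, two destinations that resolve to the same table on different workers go unnoticed.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Beam code: a worker-local check for two
// destination IDs that resolve to the same table spec.
final class WorkerLocalCollisionCheck {
  // Table spec -> first destination ID this worker saw for it.
  private final Map<String, String> tableToDestination = new HashMap<>();

  // Returns true if this worker already mapped a *different* destination to tableSpec.
  boolean conflicts(String destinationId, String tableSpec) {
    String previous = tableToDestination.putIfAbsent(tableSpec, destinationId);
    return previous != null && !previous.equals(destinationId);
  }

  public static void main(String[] args) {
    String table = "my-project:my_dataset.events";
    WorkerLocalCollisionCheck workerA = new WorkerLocalCollisionCheck();
    WorkerLocalCollisionCheck workerB = new WorkerLocalCollisionCheck();

    // Destinations sharded to different workers: neither worker sees the clash.
    boolean a = workerA.conflicts("dest-1", table);
    boolean b = workerB.conflicts("dest-2", table);
    System.out.println("cross-worker clash detected: " + (a || b)); // false

    // Both destinations on the same worker: the clash is visible.
    System.out.println("same-worker clash detected: " + workerA.conflicts("dest-2", table)); // true
  }
}
```

Even when such a check does fire, the only available reaction is to throw, which, per the second point, may just leave a Dataflow streaming job stuck in a retry loop.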
> Use both destination and TableDestination for BQ load job IDs
> -------------------------------------------------------------
>
> Key: BEAM-5426
> URL: https://issues.apache.org/jira/browse/BEAM-5426
> Project: Beam
> Issue Type: Improvement
> Components: io-java-gcp
> Reporter: Chamikara Jayalath
> Priority: Major
>
> Currently we use TableDestination when creating a unique load job ID for a
> destination:
> [https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryHelpers.java#L359]
>
> This can result in a data loss issue if a user returns the same
> TableDestination for different destination IDs. I think we can prevent this
> if we include both IDs in the BQ load job ID.
>
> CC: [~reuvenlax]
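The reporter's proposal can be sketched in plain Java. This is a hypothetical, simplified illustration, not the code in `BigQueryHelpers`: `tableOnlyJobId` and `destinationAwareJobId` are invented names, plain strings stand in for the user's destination object and `TableDestination`, and SHA-256 plus the `%05d` partition suffix are assumed choices. With only the table feeding the hash, two destinations that resolve to the same table get the same job ID; adding the destination ID to the hashed key separates them.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch of deterministic BigQuery load job IDs (not Beam's implementation).
final class LoadJobIds {
  private LoadJobIds() {}

  // Hex-encoded SHA-256 of the input string.
  static String sha256Hex(String input) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(hash.length * 2);
      for (byte b : hash) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform implementation is required to support SHA-256.
      throw new IllegalStateException(e);
    }
  }

  // Simplified current scheme: only the table spec feeds the hash.
  static String tableOnlyJobId(String prefix, String tableSpec, int partition) {
    return String.format("%s_%s_%05d", prefix, sha256Hex(tableSpec), partition);
  }

  // Proposed scheme: both the destination ID and the table spec feed the hash.
  // Length-prefixing the destination keeps the combined key unambiguous.
  static String destinationAwareJobId(
      String prefix, String destinationId, String tableSpec, int partition) {
    String key = destinationId.length() + ":" + destinationId + "|" + tableSpec;
    return String.format("%s_%s_%05d", prefix, sha256Hex(key), partition);
  }

  public static void main(String[] args) {
    String table = "my-project:my_dataset.events";
    // Two distinct destinations that user code maps to the same table.
    // The table-only scheme never sees the destination, so the IDs collide.
    String a = tableOnlyJobId("beam_load", table, 0);
    String b = tableOnlyJobId("beam_load", table, 0);
    System.out.println("table-only IDs equal: " + a.equals(b)); // true

    String c = destinationAwareJobId("beam_load", "user-dest-1", table, 0);
    String d = destinationAwareJobId("beam_load", "user-dest-2", table, 0);
    System.out.println("destination-aware IDs equal: " + c.equals(d)); // false
  }
}
```

Prefixing the destination ID with its length means the first `:` always ends the length field, so distinct (destination, table) pairs can never produce the same string before hashing.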
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)