[
https://issues.apache.org/jira/browse/BEAM-8222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930697#comment-16930697
]
Chamikara Jayalath commented on BEAM-8222:
------------------------------------------
Based on some offline comments from [~reuvenlax], this might be undesirable and
may cause user confusion.
AFAIK, Dataflow and other Beam runners that support BigQueryIO.Sink are tolerant
to failures and may retry work items, so handling duplicates is required to keep
the inserted data safe. Without insertId, writes might speed up in the short
term for runs without failures, but this mode of execution is not safe in the
long run.
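
To illustrate the concern, here is a minimal sketch in plain Java of what a
retried work item does with and without insertId. FakeSink, rowsAfterRetry and
the class name are hypothetical names made up for this example; they are not
Beam or BigQuery APIs, and the sketch assumes exact de-duplication by ID, which
the real service may not guarantee.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public class InsertIdDedupSketch {

    /** Hypothetical in-memory stand-in for a streaming-insert sink. */
    static final class FakeSink {
        private final Set<String> seenInsertIds = new HashSet<>();
        private final List<String> rows = new ArrayList<>();

        /** Appends the row unless a non-null insertId was already seen. */
        void insert(String insertId, String row) {
            if (insertId != null && !seenInsertIds.add(insertId)) {
                return; // duplicate delivery of an already-inserted row
            }
            rows.add(row);
        }

        int rowCount() {
            return rows.size();
        }
    }

    /** Writes a two-row bundle, then replays it once as a runner retry might. */
    static int rowsAfterRetry(boolean useInsertId) {
        List<String> bundle = Arrays.asList("row-a", "row-b");

        // IDs are assigned once, before the write, so the retry reuses them.
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < bundle.size(); i++) {
            ids.add(useInsertId ? UUID.randomUUID().toString() : null);
        }

        FakeSink sink = new FakeSink();
        for (int attempt = 0; attempt < 2; attempt++) {
            for (int i = 0; i < bundle.size(); i++) {
                sink.insert(ids.get(i), bundle.get(i));
            }
        }
        return sink.rowCount();
    }

    public static void main(String[] args) {
        System.out.println("with insertId:    " + rowsAfterRetry(true));
        System.out.println("without insertId: " + rowsAfterRetry(false));
    }
}
```

With IDs, the replayed bundle leaves two rows in the sink; without them, the
same replay leaves four. A run with no retries looks identical in both modes,
which is why the problem would only show up after a failure.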
> Consider making insertId optional in BigQuery.insertAll
> -------------------------------------------------------
>
> Key: BEAM-8222
> URL: https://issues.apache.org/jira/browse/BEAM-8222
> Project: Beam
> Issue Type: New Feature
> Components: io-java-gcp
> Reporter: Boyuan Zhang
> Priority: Major
>
> The current implementation of
> StreamingWriteFn(https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StreamingWriteFn.java#L102)
> sets insertId from the input element, which is tagged with a unique ID by
> https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/TagWithUniqueIds.java#L53.
> Users report that leaving insertId empty makes writes much faster.
> Can we add a bqOption such as nonInsertId and emit an empty ID based
> on this option?