kennknowles commented on a change in pull request #15998:
URL: https://github.com/apache/beam/pull/15998#discussion_r751459853
##########
File path:
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java
##########
@@ -2427,6 +2435,22 @@ static String getExtractDestinationUri(String extractDestinationDir) {
return toBuilder().setAutoSharding(true).build();
}
+ /**
+ * Provides a function which can serve as a source of deterministic unique ids for each record
+ * to be written, replacing the unique ids generated with the default scheme. When used with
+ * {@link Method#STREAMING_INSERTS} This also elides the re-shuffle from the BigQueryIO Write by
+ * using the keys on which the data is grouped at the point at which BigQueryIO Write is
+ * applied, since the reshuffle is necessary only for the checkpointing of the default-generated
Review comment:
We can fix this later, but a reshuffle does not imply checkpointing except on
Dataflow (and Dataflow could also change this if it wants).