reuvenlax commented on code in PR #33231:
URL: https://github.com/apache/beam/pull/33231#discussion_r1889416192
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java:
##########
@@ -531,6 +532,30 @@ public void process(
element.getKey().getKey(), dynamicDestinations,
datasetService);
tableSchema = converter.getTableSchema();
descriptor = converter.getDescriptor(false);
+
+ if (autoUpdateSchema) {
Review Comment:
I'm not sure this is the ideal place for this. getAppendClientInfo is
called whenever the static cache is populated, which means that after any
worker restart, range move, etc. we'll be forced to call this API again.
However, this DoFn has persistent state, so we know whether a key is "new"
or not. Can we use that state to gate calling this method instead?
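
A minimal sketch of the gating idea, using plain Java maps in place of Beam
state. Every name below (SchemaFetchGate, persistentSeenKey, schemaFetches)
is hypothetical and not taken from the PR. It only illustrates the difference
between a worker-local cache, which is lost on restart, and per-key persistent
state, which is not.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the sharded-write DoFn; not the actual Beam code.
class SchemaFetchGate {
  // Stands in for the static, worker-local cache: empty again after a
  // worker restart or range move.
  private final Map<String, String> workerCache = new HashMap<>();

  // Stands in for the DoFn's per-key persistent state: survives restarts.
  private final Map<String, Boolean> persistentSeenKey;

  // Counts how often the (simulated) remote schema API is called.
  int schemaFetches = 0;

  SchemaFetchGate(Map<String, Boolean> persistentSeenKey) {
    this.persistentSeenKey = persistentSeenKey;
  }

  String getAppendClientInfo(String key) {
    String cached = workerCache.get(key);
    if (cached != null) {
      return cached;
    }
    // Cache miss: gate the expensive call on persistent state rather than
    // on the cache miss itself.
    if (!persistentSeenKey.containsKey(key)) {
      schemaFetches++; // simulated call to the schema API, new keys only
      persistentSeenKey.put(key, true);
    }
    String info = "client-for-" + key;
    workerCache.put(key, info);
    return info;
  }
}
```

With this shape, a second SchemaFetchGate built over the same
persistentSeenKey map (simulating a restarted worker) repopulates its cache
without repeating the schema call for keys that were already seen.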
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java:
##########
@@ -1419,7 +1421,11 @@ public WriteStream createWriteStream(String tableUrn,
WriteStream.Type type)
@Override
public @Nullable WriteStream getWriteStream(String writeStream) {
- return newWriteClient.getWriteStream(writeStream);
+ return newWriteClient.getWriteStream(
Review Comment:
Let's add a boolean parameter to the method, so that we only return the
schema when it is requested.
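
A rough sketch of what such a signature could look like. The types here
(WriteStreamSketch, View, WriteClient, the includeSchema parameter) are
hypothetical stand-ins so that the example is self-contained. They are not
the actual BigQueryServicesImpl or client-library types.

```java
import java.util.Optional;

// Hypothetical sketch; none of these types come from the PR or the client library.
class WriteStreamSketch {
  // Stand-in for "how much of the stream metadata to fetch".
  enum View { BASIC, FULL }

  // Stand-in for the returned stream; the schema is present only when requested.
  static final class WriteStream {
    final String name;
    final Optional<String> schema;

    WriteStream(String name, Optional<String> schema) {
      this.name = name;
      this.schema = schema;
    }
  }

  // Stand-in for the underlying write client.
  interface WriteClient {
    WriteStream getWriteStream(String name, View view);
  }

  // The boolean parameter lets callers opt in to the schema; callers that
  // do not need it request the cheaper view.
  static WriteStream getWriteStream(
      WriteClient client, String writeStream, boolean includeSchema) {
    View view = includeSchema ? View.FULL : View.BASIC;
    return client.getWriteStream(writeStream, view);
  }
}
```

Existing callers would pass false and keep their current behavior; only the
schema auto-update path would pass true.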