reuvenlax commented on code in PR #33231:
URL: https://github.com/apache/beam/pull/33231#discussion_r1889416192


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiWritesShardedRecords.java:
##########
@@ -531,6 +532,30 @@ public void process(
                       element.getKey().getKey(), dynamicDestinations, datasetService);
               tableSchema = converter.getTableSchema();
               descriptor = converter.getDescriptor(false);
+
+              if (autoUpdateSchema) {

Review Comment:
   I'm not sure this is the ideal place to put this. getAppendClientInfo is 
called whenever the static cache is populated, so on any worker restart, 
range move, etc. we'll be forced to call this API again. However, we have 
persistent state in this DoFn, so we know whether a key is "new" or not. Can 
we use that to gate calling this method instead?
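
A minimal sketch of the gating idea, in plain Java rather than Beam code. Everything here is a hypothetical stand-in: `SchemaRefreshGate`, the `persistentSeen` map (playing the role of the DoFn's persistent per-key state), the `workerCache` set (playing the role of the static cache), and the `schemaFetches` counter (playing the role of the API call). It only illustrates that keying the call on persistent state, rather than on cache population, avoids repeating it after a worker restart.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; none of these names exist in Beam. */
final class SchemaRefreshGate {
  // Stand-in for persistent per-key state, which survives worker restarts.
  private final Map<String, Boolean> persistentSeen = new HashMap<>();
  // Stand-in for the static per-worker cache, which is lost on restart.
  private final Set<String> workerCache = new HashSet<>();
  // Counts how often the (simulated) schema API would have been called.
  private int schemaFetches = 0;

  /** Simulates a worker restart or range move: only the cache is dropped. */
  void simulateWorkerRestart() {
    workerCache.clear();
  }

  /** Processes one element for the given key. */
  void process(String key) {
    // The cache is repopulated on every miss, including after a restart.
    workerCache.add(key);
    // The API call is gated on persistent state, not on the cache miss.
    boolean newKey = persistentSeen.putIfAbsent(key, Boolean.TRUE) == null;
    if (newKey) {
      schemaFetches++;
    }
  }

  int schemaFetches() {
    return schemaFetches;
  }
}
```

With this shape, a key that was already seen before a restart repopulates the cache without triggering another simulated API call.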



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryServicesImpl.java:
##########
@@ -1419,7 +1421,11 @@ public WriteStream createWriteStream(String tableUrn, WriteStream.Type type)
 
     @Override
     public @Nullable WriteStream getWriteStream(String writeStream) {
-      return newWriteClient.getWriteStream(writeStream);
+      return newWriteClient.getWriteStream(

Review Comment:
   Let's add a boolean parameter to the method, so that we return the schema 
only if it is requested.
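
One possible shape for this, sketched in plain Java. The `View` enum, the `Request` class, and the `includeSchema` parameter name are all hypothetical stand-ins, not the actual BigQuery client or Beam types; the sketch only shows a boolean flag selecting between a basic request and one that also asks for the schema.

```java
/** Hypothetical illustration only; these are not the real client types. */
final class WriteStreamRequestSketch {
  /** Stand-in for a "how much to return" selector on the request. */
  enum View {
    BASIC, // stream metadata only
    FULL   // stream metadata plus table schema
  }

  /** Stand-in for the request object sent to the service. */
  static final class Request {
    final String name;
    final View view;

    Request(String name, View view) {
      this.name = name;
      this.view = view;
    }
  }

  /** Builds a request that asks for the schema only when the caller wants it. */
  static Request buildRequest(String writeStream, boolean includeSchema) {
    return new Request(writeStream, includeSchema ? View.FULL : View.BASIC);
  }
}
```

Callers that do not need the schema would pass `false` and keep the cheaper request.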



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.