damccorm commented on code in PR #31721:
URL: https://github.com/apache/beam/pull/31721#discussion_r1663730726


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java:
##########
@@ -109,6 +109,26 @@ public interface BigQueryOptions
 
   void setNumStorageWriteApiStreamAppendClients(Integer value);
 
+  @Description(
+      "When using the STORAGE_API_AT_LEAST_ONCE write method with multiplexing 
(ie. useStorageApiConnectionPool=true), "
+          + "this option sets the minimum number of connections each pool 
creates. This is on a per worker, per region basis. "
+          + "Note that in practice, the minimum number of connections created 
is the minimum of this value and "
+          + "(numStorageWriteApiStreamAppendClients x num destinations).")
+  @Default.Integer(2)
+  Integer getMinConnectionPoolConnections();
+
+  void setMinConnectionPoolConnections(Integer value);
+
+  @Description(
+      "When using the STORAGE_API_AT_LEAST_ONCE write method with multiplexing 
(ie. useStorageApiConnectionPool=true), "
+          + "this option sets the maximum number of connections each pool 
creates. This is on a per worker, per region basis. "
+          + "This value should be greater than or equal to the total number of 
dynamic destinations, otherwise a "
+          + "race condition occurs where append operations compete over 
streams.")
+  @Default.Integer(20)

Review Comment:
   A few questions here:
   
   1) Where does 20 come from? I'm noting that from 
https://cloud.google.com/bigquery/docs/write-api-best-practices#connection_pool_management
 20 connections seems more like a floor than a ceiling? This has the potential 
to degrade existing jobs which write to >20 destinations, so we should be 
careful here.
   2) "otherwise a race condition occurs where append operations compete over 
streams." - could we be more explicit about the implications of the race 
condition? Its degraded performance, not correctness, right?
   3) Can we detect when this race condition might occur and warn? I know we 
can't ahead of time, but do we get any warnings or anything like that back? 
Alternately, can we track the number of dynamic destinations we're writing to? 
(not sure if there's a clean path to doing this)



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java:
##########
@@ -109,6 +109,26 @@ public interface BigQueryOptions
 
   void setNumStorageWriteApiStreamAppendClients(Integer value);
 
+  @Description(
+      "When using the STORAGE_API_AT_LEAST_ONCE write method with multiplexing 
(ie. useStorageApiConnectionPool=true), "
+          + "this option sets the minimum number of connections each pool 
creates. This is on a per worker, per region basis. "
+          + "Note that in practice, the minimum number of connections created 
is the minimum of this value and "
+          + "(numStorageWriteApiStreamAppendClients x num destinations).")
+  @Default.Integer(2)
+  Integer getMinConnectionPoolConnections();
+
+  void setMinConnectionPoolConnections(Integer value);

Review Comment:
   Is there value in exposing this to a customer? When would they want to tune 
this?



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java:
##########
@@ -122,6 +142,9 @@ public interface BigQueryOptions
 
   void setStorageWriteMaxInflightBytes(Long value);
 
+  @Description(
+      "Enables multiplexing mode, where multiple tables can share the same 
connection. Only available when writing with STORAGE_API_AT_LEAST_ONCE"
+          + " mode. For more information, see 
https://cloud.google.com/bigquery/docs/write-api-best-practices#connection_pool_management";)

Review Comment:
   Could we add that
   1) this is recommended when writing to >20 tables - I think this is true 
based on the linked doc, if not maybe we can add other guidance on when to do 
this
   2) Can we add something like "when using multiplexing mode, consider 
limiting `MaxConnectionPoolConnections` as well"? 



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java:
##########
@@ -109,6 +109,26 @@ public interface BigQueryOptions
 
   void setNumStorageWriteApiStreamAppendClients(Integer value);
 
+  @Description(
+      "When using the STORAGE_API_AT_LEAST_ONCE write method with multiplexing 
(ie. useStorageApiConnectionPool=true), "
+          + "this option sets the minimum number of connections each pool 
creates. This is on a per worker, per region basis. "
+          + "Note that in practice, the minimum number of connections created 
is the minimum of this value and "
+          + "(numStorageWriteApiStreamAppendClients x num destinations).")
+  @Default.Integer(2)
+  Integer getMinConnectionPoolConnections();
+
+  void setMinConnectionPoolConnections(Integer value);
+
+  @Description(
+      "When using the STORAGE_API_AT_LEAST_ONCE write method with multiplexing 
(ie. useStorageApiConnectionPool=true), "
+          + "this option sets the maximum number of connections each pool 
creates. This is on a per worker, per region basis. "
+          + "This value should be greater than or equal to the total number of 
dynamic destinations, otherwise a "
+          + "race condition occurs where append operations compete over 
streams.")
+  @Default.Integer(20)

Review Comment:
   > Where does 20 come from?
   
   More specifically - can we make this significantly higher without running 
into quota issues? How many connections are jobs that are running into problems 
using? IIRC it is orders of magnitude higher than this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to