agrawal-siddharth commented on code in PR #31721:
URL: https://github.com/apache/beam/pull/31721#discussion_r1672915640


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java:
##########
@@ -109,6 +109,29 @@ public interface BigQueryOptions
 
   void setNumStorageWriteApiStreamAppendClients(Integer value);
 
+  @Description(
+      "When using the STORAGE_API_AT_LEAST_ONCE write method with multiplexing 
(ie. useStorageApiConnectionPool=true), "
+          + "this option sets the minimum number of connections each pool 
creates. This is on a per worker, per region basis. "

Review Comment:
   "this option sets the minimum number of connections each pool creates before 
any connections are shared."
   
   When you say "per worker" I'm not sure what a worker corresponds to within 
the writeAPI storage library. Does each worker create its own StreamWriter? 
Under the hood, the connection pool is a static map whose KEY is the region, 
and the VALUE is the pool for that region. So all StreamWriter objects within 
the same process will share the same map. This could span many workers if they 
run within the same process.



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java:
##########
@@ -109,6 +109,26 @@ public interface BigQueryOptions
 
   void setNumStorageWriteApiStreamAppendClients(Integer value);
 
+  @Description(
+      "When using the STORAGE_API_AT_LEAST_ONCE write method with multiplexing 
(ie. useStorageApiConnectionPool=true), "
+          + "this option sets the minimum number of connections each pool 
creates. This is on a per worker, per region basis. "
+          + "Note that in practice, the minimum number of connections created 
is the minimum of this value and "
+          + "(numStorageWriteApiStreamAppendClients x num destinations).")
+  @Default.Integer(2)
+  Integer getMinConnectionPoolConnections();
+
+  void setMinConnectionPoolConnections(Integer value);
+
+  @Description(
+      "When using the STORAGE_API_AT_LEAST_ONCE write method with multiplexing 
(ie. useStorageApiConnectionPool=true), "
+          + "this option sets the maximum number of connections each pool 
creates. This is on a per worker, per region basis. "
+          + "This value should be greater than or equal to the total number of 
dynamic destinations, otherwise a "
+          + "race condition occurs where append operations compete over 
streams.")
+  @Default.Integer(20)

Review Comment:
   We have 2 general concerns with too many connections:
   
   (a) the user will hit the concurrent connection limit, and
   
   (b) it is not fruitful to further increase the number of connections because 
a single machine can only send so much traffic over the network. From 
experiments going beyond about 20 connections does not help with increasing 
throughput.
   
   Due to the above, we don't expect it will help much to raise this limit. So 
I think a customer is less likely to want to tune this parameter but we have it 
for completeness.



##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryOptions.java:
##########
@@ -109,6 +109,29 @@ public interface BigQueryOptions
 
   void setNumStorageWriteApiStreamAppendClients(Integer value);
 
+  @Description(
+      "When using the STORAGE_API_AT_LEAST_ONCE write method with multiplexing 
(ie. useStorageApiConnectionPool=true), "
+          + "this option sets the minimum number of connections each pool 
creates. This is on a per worker, per region basis. "
+          + "Note that in practice, the minimum number of connections created 
is the minimum of this value and "
+          + "(numStorageWriteApiStreamAppendClients x num destinations). 
BigQuery will create this many connections at first "

Review Comment:
   This is the minimum number of connections per region. Note that BigQuery 
creates the connections on demand. Up to the minimum, there is no connection 
sharing. Once the minimum has been reached, existing connections will be shared 
until they are saturated. Once all existing connections are saturated then new 
connections, up to the maximum, will be created.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to