This is an automated email from the ASF dual-hosted git repository.

xqhu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new 9057cc29f3f Document BQ Storage API pipeline options (#35259)
9057cc29f3f is described below

commit 9057cc29f3f3ac1464d0351628b8fa26339cca16
Author: Veronica Wasson <[email protected]>
AuthorDate: Sun Jun 15 17:05:26 2025 -0700

    Document BQ Storage API pipeline options (#35259)
    
    * Document BQ Storage API pipeline options
    
    * Fix whitespace
    
    * Fix whitespace
---
 .../documentation/io/built-in/google-bigquery.md   | 109 +++++++++++++++++++++
 1 file changed, 109 insertions(+)

diff --git a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
index d49e9bac949..f53fc5eb72f 100644
--- a/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
+++ b/website/www/site/content/en/documentation/io/built-in/google-bigquery.md
@@ -904,6 +904,115 @@ When using `STORAGE_API_AT_LEAST_ONCE`, the `PCollection` returned by
 
 [`WriteResult.getFailedStorageApiInserts`](https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/bigquery/WriteResult.html#getFailedStorageApiInserts--)
 contains the rows that failed to be written to the Storage Write API sink.
 
+#### Tune the Storage Write API
+
+By default, the BigQueryIO Write transform uses Storage Write API settings that
+are reasonable for most pipelines.
+
+If you see performance issues, such as stuck pipelines, quota limit errors, or
+monotonically increasing backlog, consider tuning the following pipeline
+options when you run the job:
+
+<div class="table-container-wrapper">
+<table class="table table-bordered">
+  <tr>
+    <th>Option (Java/Python)</th>
+    <th>Description</th>
+  </tr>
+  <tr>
+    <td>
+      <p><code>maxConnectionPoolConnections</code></p>
+      <p><code>max_connection_pool_connections</code></p>
+    </td>
+    <td>
+      If the write mode is <code>STORAGE_API_AT_LEAST_ONCE</code> and the
+      <code>useStorageApiConnectionPool</code> option is <code>true</code>, this
+      option sets the maximum number of connections that each pool creates, per
+      worker and region. If your pipeline writes to many dynamic destinations
+      (more than 20) and you see performance issues or append operations
+      competing for streams, consider increasing this value.
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <p><code>minConnectionPoolConnections</code></p>
+      <p><code>min_connection_pool_connections</code></p>
+    </td>
+    <td>
+      <p>If the write mode is <code>STORAGE_API_AT_LEAST_ONCE</code> and the
+      <code>useStorageApiConnectionPool</code> option is <code>true</code>, this
+      option sets the minimum number of connections that each pool creates
+      before any connections are shared, per worker and region.</p>
+      <p>In practice, the minimum number of connections created is the lesser
+      of this option and <code>numStorageWriteApiStreamAppendClients</code> x
+      <em>destination count</em>. BigQuery initially creates that many
+      connections, and only creates more if the current ones are overwhelmed.
+      If you have performance issues, then consider increasing this
+      value.</p></td>
+  </tr>
+  <tr>
+    <td>
+      <p><code>numStorageWriteApiStreamAppendClients</code></p>
+      <p><code>num_storage_write_api_stream_append_clients</code></p>
+    </td>
+    <td>
+      If the write mode is <code>STORAGE_API_AT_LEAST_ONCE</code>, this option
+      sets the number of stream append clients allocated per worker and
+      destination. For high-volume pipelines with a large number of workers,
+      a high value can cause the job to exceed the BigQuery connection quota.
+      For most low- to mid-volume pipelines, the default value is sufficient.
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <p><code>storageApiAppendThresholdBytes</code></p>
+      <p><code>storage_api_append_threshold_bytes</code></p>
+    </td>
+    <td>
+      Maximum size of a single append to the Storage Write API (best effort).
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <p><code>storageApiAppendThresholdRecordCount</code></p>
+      <p><code>storage_api_append_threshold_record_count</code></p>
+    </td>
+    <td>
+      Maximum record count of a single append to the Storage Write API (best
+      effort).
+    </td>
+  </tr>
+  <tr>
+    <td>
+      <p><code>storageWriteMaxInflightRequests</code></p>
+      <p><code>storage_write_max_inflight_requests</code></p>
+    </td>
+    <td>Expected maximum number of inflight messages per connection.</td>
+  </tr>
+  <tr>
+    <td>
+      <p><code>useStorageApiConnectionPool</code></p>
+      <p><code>use_storage_api_connection_pool</code></p>
+    </td>
+    <td>
+      <p>If <code>true</code>, enables multiplexing mode, where multiple tables
+      can share the same connection. This mode is only available when the write
+      mode is <code>STORAGE_API_AT_LEAST_ONCE</code>. Consider enabling
+      multiplexing if your write operation creates 20 or more connections.</p>
+      <p>If you enable multiplexing, consider setting the following options to
+      tune the number of connections created by the connection pool:</p>
+      <ul>
+       <li><code>minConnectionPoolConnections</code></li>
+       <li><code>maxConnectionPoolConnections</code></li>
+      </ul>
+      <p>For more information, see <a
+      href="https://cloud.google.com/bigquery/docs/write-api-best-practices#connection_pool_management">
+      Connection pool management</a> in the BigQuery documentation.</p>
+    </td>
+  </tr>
+</table>
+</div>
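The minimum-connections rule described for `minConnectionPoolConnections` above can be sketched as follows. This is illustrative Python only, not Beam API code; the function name and values are hypothetical:

```python
# Sketch of the rule above: the number of connections BigQuery initially
# creates per worker is the smaller of minConnectionPoolConnections and
# numStorageWriteApiStreamAppendClients * destination count.
def initial_connections(min_pool_connections: int,
                        append_clients_per_destination: int,
                        destination_count: int) -> int:
    return min(min_pool_connections,
               append_clients_per_destination * destination_count)

# Few destinations: the per-destination term is the limit.
print(initial_connections(8, 1, 3))   # 3
# Many destinations: minConnectionPoolConnections is the limit.
print(initial_connections(8, 2, 30))  # 8
```

More connections are created beyond this initial count only when the existing ones are overwhelmed, so raising `minConnectionPoolConnections` mainly helps pipelines that would otherwise start with too few connections.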
+
 #### Quotas
 
 Before using the Storage Write API, be aware of the
