ahmedabu98 commented on PR #31837:
URL: https://github.com/apache/beam/pull/31837#issuecomment-2223926689

   Turns out managing AppendRows quota isn't the last blocker after all. I tried 
writing a much larger load 
([2024-07-11_11_13_57-12204862207935920837](https://pantheon.corp.google.com/dataflow/jobs/us-central1/2024-07-11_11_13_57-12204862207935920837)), 
and the sink handled all the append operations well, but it got stuck at the 
finalize and commit step:
   ```
   RESOURCE_EXHAUSTED: Exceeds quota limit subject: 
bigquerystorage.googleapis.com/write/pending_stream_bytes
   ```
   
   Pending stream bytes is a 
[quota](https://cloud.google.com/bigquery/quotas#write-api-limits) placed on 
PENDING-type streams (which is what we use for batch writes). The limit is 1 TB 
in small regions.
   
   In our finalize and commit step, we finalize each stream one by one, then 
perform a single commit on all of them: 
   
https://github.com/apache/beam/blob/50a3403a4742e1c9e264f57f4411969daeff4642/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiFinalizeWritesDoFn.java#L166
   
   For large writes, the aggregate byte size of all streams can easily exceed 
1 TB. Instead, we should probably break this up into multiple commit operations.
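
   One possible shape for that batching (a minimal sketch, not Beam code: it assumes we can track each stream's finalized byte size, e.g. accumulated during appends, and the `CommitBatcher`/`partitionByBytes` names and the threshold are hypothetical):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   // Hypothetical sketch: group finalized streams into commit batches whose
   // combined byte size stays under the pending_stream_bytes quota, so each
   // batch can be committed in its own request.
   public class CommitBatcher {

     // Illustrative cap; the real quota is ~1 TB in small regions.
     static final long MAX_BATCH_BYTES = 1L << 40;

     // names.get(i) is a finalized stream name; sizes.get(i) is its byte size.
     static List<List<String>> partitionByBytes(
         List<String> names, List<Long> sizes, long maxBytes) {
       List<List<String>> batches = new ArrayList<>();
       List<String> current = new ArrayList<>();
       long currentBytes = 0;
       for (int i = 0; i < names.size(); i++) {
         // Start a new batch when adding this stream would exceed the cap.
         if (!current.isEmpty() && currentBytes + sizes.get(i) > maxBytes) {
           batches.add(current);
           current = new ArrayList<>();
           currentBytes = 0;
         }
         current.add(names.get(i));
         currentBytes += sizes.get(i);
       }
       if (!current.isEmpty()) {
         batches.add(current);
       }
       return batches;
     }

     public static void main(String[] args) {
       // With a 1000-byte cap, streams of 600/500/300 bytes split into two batches.
       System.out.println(
           partitionByBytes(List.of("s1", "s2", "s3"), List.of(600L, 500L, 300L), 1000L));
     }
   }
   ```

   Each batch would then get its own `BatchCommitWriteStreams` call. One tradeoff to note: a single commit applies all streams atomically, so splitting into multiple commits means the table can briefly expose partial results if a later batch fails.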

