ahmedabu98 commented on PR #31837: URL: https://github.com/apache/beam/pull/31837#issuecomment-2223926689
Turns out managing AppendRows quota actually isn't the last blocker. I tried writing with a much bigger load ([2024-07-11_11_13_57-12204862207935920837](https://pantheon.corp.google.com/dataflow/jobs/us-central1/2024-07-11_11_13_57-12204862207935920837)) and the sink handled all the append operations well, but it got stuck at the finalize and commit step:

```
RESOURCE_EXHAUSTED: Exceeds quota limit
subject: bigquerystorage.googleapis.com/write/pending_stream_bytes
```

Pending stream bytes is a [quota](https://cloud.google.com/bigquery/quotas#write-api-limits) placed on PENDING stream types (which is what we use for batch). The limit is 1 TB for small regions.

In our finalize and commit step, we finalize each stream one by one, then perform a single commit on all of them:

https://github.com/apache/beam/blob/50a3403a4742e1c9e264f57f4411969daeff4642/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/StorageApiFinalizeWritesDoFn.java#L166

For large writes, the aggregate byte size of all streams can easily exceed 1 TB. Instead, we should probably break this up into multiple commit operations.
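One way to do that could be to greedily group the finalized streams into batches whose aggregate byte size stays under a per-commit budget, then issue one commit per batch. A minimal sketch of the partitioning step (class and method names are hypothetical, and the 1 TiB budget is an assumption based on the quota mentioned above; this is not the actual Beam implementation):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: split finalized streams into batches so each commit
 * call covers an aggregate byte size under the pending_stream_bytes quota.
 */
public class CommitBatcher {

  // Assumed per-commit budget; the real quota is region-dependent.
  static final long MAX_BATCH_BYTES = 1L << 40; // 1 TiB

  /** Greedily groups stream names so each batch's total bytes stays under the limit. */
  static List<List<String>> partitionByBytes(Map<String, Long> streamBytes, long limit) {
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    long currentBytes = 0;
    for (Map.Entry<String, Long> e : streamBytes.entrySet()) {
      // Start a new batch when adding this stream would exceed the budget.
      if (!current.isEmpty() && currentBytes + e.getValue() > limit) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(e.getKey());
      currentBytes += e.getValue();
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    // Three streams of 500 GB each: a single commit would carry 1.5 TB,
    // so they are split into two batches ([a, b] and [c]).
    Map<String, Long> streams = new LinkedHashMap<>();
    streams.put("stream-a", 500_000_000_000L);
    streams.put("stream-b", 500_000_000_000L);
    streams.put("stream-c", 500_000_000_000L);
    List<List<String>> batches = partitionByBytes(streams, MAX_BATCH_BYTES);
    System.out.println(batches.size()); // 2
  }
}
```

Each resulting batch would then get its own commit call, keeping the bytes covered by any single commit under the quota. Whether the commits should happen as soon as a batch fills up (rather than after all streams are finalized) is a separate design question.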
