ahmedabu98 opened a new issue, #31872:
URL: https://github.com/apache/beam/issues/31872

   ### What needs to happen?
   
   In the BigQuery Storage API batch connector, we use Pending streams to write 
to BigQuery. The final step in the connector is to commit the stream contents 
into the table.
   
   Currently we commit all streams in one single batch commit. There is a 
[quota](https://cloud.google.com/bigquery/quotas#write-api-limits) on the 
number of bytes that can be committed per operation: 10 TB for multi-regions 
and 1 TB for other regions. This effectively caps the total size of any batch 
write job at that limit. Would it be a good idea to break the commit up into 
multiple back-to-back commits?
   
   @Abacn brings up a good point in this 
[comment](https://github.com/apache/beam/pull/31837#issuecomment-2224136803) 
about whether this is done intentionally to avoid partially written data in the 
rare case where the whole pipeline fails between commits (and is unable to 
retry).
   
   However, limiting it to one commit would place a hard restriction on the 
amount of data one can write with this connector.
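
To make the proposal concrete, here is a minimal sketch of how the streams could be grouped into several commits that each stay under the byte quota. It is not the connector's current code: `CommitBatcher`, `partition`, and the idea that the connector has a per-stream byte count available are all assumptions made for illustration. Each returned group would then be passed to its own commit call, one after another.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Beam: groups pending streams into commit-sized batches. */
final class CommitBatcher {
  private CommitBatcher() {}

  /**
   * Greedily groups stream names, in iteration order, so that the total byte
   * count of each group does not exceed maxBytesPerCommit.
   *
   * @param streamBytes stream name mapped to bytes written (assumed to be tracked by the caller)
   * @param maxBytesPerCommit per-commit byte limit, e.g. the regional quota
   */
  static List<List<String>> partition(Map<String, Long> streamBytes, long maxBytesPerCommit) {
    if (maxBytesPerCommit <= 0) {
      throw new IllegalArgumentException("maxBytesPerCommit must be positive");
    }
    List<List<String>> batches = new ArrayList<>();
    List<String> current = new ArrayList<>();
    long currentBytes = 0;

    for (Map.Entry<String, Long> entry : streamBytes.entrySet()) {
      long size = entry.getValue();
      if (size > maxBytesPerCommit) {
        // A single stream over the limit cannot be fixed by batching alone.
        throw new IllegalArgumentException("Stream exceeds commit limit: " + entry.getKey());
      }
      // Start a new batch when adding this stream would exceed the limit.
      // Written as a subtraction to avoid overflow on very large limits.
      if (!current.isEmpty() && size > maxBytesPerCommit - currentBytes) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(entry.getKey());
      currentBytes += size;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

Note that this only addresses the quota. It does not address the concern above: if the pipeline fails after some groups are committed and cannot retry, the table would hold partially written data.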
   
   
   ### Issue Priority
   
   Priority: 2 (default / most normal work should be filed as P2)
   
   ### Issue Components
   
   - [ ] Component: Python SDK
   - [X] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam YAML
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [ ] Component: Google Cloud Dataflow Runner

