Akshatsharma2205 commented on code in PR #38776:
URL: https://github.com/apache/beam/pull/38776#discussion_r3350246411


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/providers/BigQueryWriteConfiguration.java:
##########
@@ -101,6 +101,12 @@ public void validate() {
 
     Boolean autoSharding = getAutoSharding();
     Integer numStreams = getNumStreams();
+    if (numStreams != null) {
+      checkArgument(
+          numStreams >= 0,
+          invalidConfigMessage + "numStreams must be non-negative, but was: 
%s",
+          numStreams);
+    }

Review Comment:
   Thanks for the suggestion i looked into the existing BigQueryIO semantics 
and i think this validation would be broader than the scope of this PR.
   
   `numStorageWriteApiStreams` currently uses `0` as the default/unset sentinel 
in BigQueryIO, and `withNumStorageWriteApiStreams(0)` is documented as valid 
behavior: for streaming it lets the runner determine sharding, and for batch it 
avoids inserting a redistribute and keeps the existing pipeline parallelism.
   
   Because `autoSharding` is only applicable to unbounded input, requiring 
`numStreams == 0` to also have `autoSharding == true` would reject or 
complicate valid/default batch behavior and would be a behavior change beyond 
this PR’s narrow goal of allowing positive fixed stream counts for bounded 
Storage Write API writes.
   
   Given that Id prefer not to add this validation unless a reviewer wants to 
intentionally change the accepted schema transform semantics for explicit 
`numStreams: 0`.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to