Akshatsharma2205 commented on code in PR #38776:
URL: https://github.com/apache/beam/pull/38776#discussion_r3350246411
##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/providers/BigQueryWriteConfiguration.java:
##########
@@ -101,6 +101,12 @@ public void validate() {
Boolean autoSharding = getAutoSharding();
Integer numStreams = getNumStreams();
+ if (numStreams != null) {
+ checkArgument(
+ numStreams >= 0,
+ invalidConfigMessage + "numStreams must be non-negative, but was:
%s",
+ numStreams);
+ }
Review Comment:
Thanks for the suggestion i looked into the existing BigQueryIO semantics
and i think this validation would be broader than the scope of this PR.
`numStorageWriteApiStreams` currently uses `0` as the default/unset sentinel
in BigQueryIO, and `withNumStorageWriteApiStreams(0)` is documented as valid
behavior: for streaming it lets the runner determine sharding, and for batch it
avoids inserting a redistribute and keeps the existing pipeline parallelism.
Because `autoSharding` is only applicable to unbounded input, requiring
`numStreams == 0` to also have `autoSharding == true` would reject or
complicate valid/default batch behavior and would be a behavior change beyond
this PR’s narrow goal of allowing positive fixed stream counts for bounded
Storage Write API writes.
Given that Id prefer not to add this validation unless a reviewer wants to
intentionally change the accepted schema transform semantics for explicit
`numStreams: 0`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]