Polber commented on code in PR #30186:
URL: https://github.com/apache/beam/pull/30186#discussion_r1474719397


##########
sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/providers/BigQueryStorageWriteApiSchemaTransformProvider.java:
##########
@@ -383,13 +383,6 @@ public PCollectionRowTuple expand(PCollectionRowTuple input) {
         Boolean autoSharding = configuration.getAutoSharding();
         int numStreams = configuration.getNumStreams() == null ? 0 : configuration.getNumStreams();
 
-        // TODO(https://github.com/apache/beam/issues/30058): remove once Dataflow supports multiple
-        // DoFn's per fused step.
-        if (numStreams < 1) {
-          throw new IllegalStateException(
-              "numStreams must be set to a positive integer when input data is unbounded.");
-        }

Review Comment:
   This check was added because any Python SDK (or Beam YAML) user who writes unbounded data to BigQuery with the Storage Write API transform on Dataflow (at least) will see no data written unless `numStreams` is set to a positive integer. Digging through the logs reveals warnings about a crash loop caused by two stateful DoFns that cannot be fused into one step, but that is not a clear indicator of the underlying issue. This check prevents the job from launching in the first place, to avoid user frustration.
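
   To illustrate the behavior the removed check enforced, here is a minimal standalone sketch (not the actual Beam code; `resolveNumStreams` and `validateForUnbounded` are hypothetical helpers mirroring the diff above). It shows how an unset `numStreams` defaults to 0 and is then rejected for unbounded input:

```java
public class NumStreamsValidation {
    // Mirrors the diff: an unset (null) numStreams in the configuration defaults to 0.
    static int resolveNumStreams(Integer configured) {
        return configured == null ? 0 : configured;
    }

    // Mirrors the removed guard: unbounded input requires a positive numStreams.
    static void validateForUnbounded(int numStreams) {
        if (numStreams < 1) {
            throw new IllegalStateException(
                "numStreams must be set to a positive integer when input data is unbounded.");
        }
    }

    public static void main(String[] args) {
        int unset = resolveNumStreams(null);
        System.out.println("unset resolves to: " + unset);
        try {
            validateForUnbounded(unset);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        validateForUnbounded(resolveNumStreams(3));
        System.out.println("numStreams=3 accepted");
    }
}
```

   With the check in place, a misconfigured job fails at construction time instead of silently writing nothing and crash-looping at runtime.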


