nehsyc commented on a change in pull request #14450:
URL: https://github.com/apache/beam/pull/14450#discussion_r609038210
##########
File path: sdks/java/core/src/main/java/org/apache/beam/sdk/io/WriteFiles.java
##########
@@ -349,15 +326,14 @@ public void validate(PipelineOptions options) {
getWindowedWrites(),
"Must use windowed writes when applying %s to an unbounded
PCollection",
WriteFiles.class.getSimpleName());
- // The reason for this is https://issues.apache.org/jira/browse/BEAM-1438
- // and similar behavior in other runners. Runners can choose to ignore
this check and perform
- // runner determined sharding for unbounded data by overriding the option
- // `withRunnerDeterminedShardingUnboundedInternal`.
- if (!getWithRunnerDeterminedShardingUnbounded()) {
+ // Sharding used to be required due to
https://issues.apache.org/jira/browse/BEAM-1438 and
+ // similar behavior in other runners. Some runners may support runner
determined sharding now.
+ // Check merging window here due to
https://issues.apache.org/jira/browse/BEAM-12040.
+ if (input.getWindowingStrategy().needsMerge()) {
Review comment:
There won't be a regression in the sense that runner determined sharding
was not supported for unbounded data before (so no performance comparison to
make). All existing pipelines should have shard explicitly set and will
continue to use the implementation for fixed sharding. New pipelines may
observe different behavior if they omit the shard parameter (which will not get
an error).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]