Abacn commented on code in PR #35730:
URL: https://github.com/apache/beam/pull/35730#discussion_r2286004726

##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java:
##########
@@ -373,6 +373,7 @@ public static Match match() {
   public static MatchAll matchAll() {
     return new AutoValue_FileIO_MatchAll.Builder()
         .setConfiguration(MatchConfiguration.create(EmptyMatchTreatment.ALLOW_IF_WILDCARD))
+        .setOutputParallelization(false)

Review Comment:
   If outputParallelization means adding a Reshuffle, then we should default to true here so that the default behavior stays the same.

##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java:
##########
@@ -723,8 +744,15 @@ public PCollection<MatchResult.Metadata> expand(PCollection<String> input) {
           res = input.apply(createWatchTransform(new ExtractFilenameFn())).apply(Values.create());
         }
       }
-      return res.apply(Reshuffle.viaRandomKey());
+      // Apply Reshuffle conditionally based on the new flag
+      if (getOutputParallelization()) {

Review Comment:
   I think outputParallelization=true means adding a Reshuffle.

##########
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java:
##########
@@ -136,7 +136,7 @@ public void writeThenReadAll() {
     PCollection<String> consolidatedHashcode =
         testFilenames
-            .apply("Match all files", FileIO.matchAll())
+            .apply("Match all files", FileIO.matchAll().withOutputParallelization(true))

Review Comment:
   We don't need to change this once the default is set to true.
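
To illustrate the point about defaults, here is a minimal plain-Java sketch, not the Beam implementation. `MatchAllSketch` and `expandSteps` are hypothetical names invented for this example, and the reshuffle is modelled as a step name rather than a real `Reshuffle.viaRandomKey()` call. It assumes, as the comments above suggest, that outputParallelization=true means the reshuffle is applied, and it shows why defaulting the flag to true leaves existing callers unaffected:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for FileIO.MatchAll; this is NOT the Beam API.
final class MatchAllSketch {
  private final boolean outputParallelization;

  private MatchAllSketch(boolean outputParallelization) {
    this.outputParallelization = outputParallelization;
  }

  // The flag defaults to true, so callers that never set it
  // still get the reshuffle step, as they did before the flag existed.
  static MatchAllSketch matchAll() {
    return new MatchAllSketch(true);
  }

  // Returns a new instance with the flag overridden.
  MatchAllSketch withOutputParallelization(boolean value) {
    return new MatchAllSketch(value);
  }

  boolean getOutputParallelization() {
    return outputParallelization;
  }

  // Lists the steps that would be applied; "reshuffle" stands in
  // for the conditional Reshuffle in expand().
  List<String> expandSteps() {
    List<String> steps = new ArrayList<>();
    steps.add("match");
    if (outputParallelization) {
      steps.add("reshuffle");
    }
    return steps;
  }
}
```

With this shape, only callers that want to skip the reshuffle need to call `withOutputParallelization(false)`; a call site such as the one in TextIOIT can stay as plain `matchAll()`.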