Abacn commented on code in PR #35730:
URL: https://github.com/apache/beam/pull/35730#discussion_r2286004726


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java:
##########
@@ -373,6 +373,7 @@ public static Match match() {
   public static MatchAll matchAll() {
     return new AutoValue_FileIO_MatchAll.Builder()
         .setConfiguration(MatchConfiguration.create(EmptyMatchTreatment.ALLOW_IF_WILDCARD))
+        .setOutputParallelization(false)

Review Comment:
   If outputParallelization=true means adding a Reshuffle, then we should
   default to true here so that the default behavior stays the same.



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java:
##########
@@ -723,8 +744,15 @@ public PCollection<MatchResult.Metadata> expand(PCollection<String> input) {
           res = input.apply(createWatchTransform(new ExtractFilenameFn())).apply(Values.create());
         }
       }
-      return res.apply(Reshuffle.viaRandomKey());
+      // Apply Reshuffle conditionally based on the new flag
+      if (getOutputParallelization()) {

Review Comment:
   I think outputParallelization=true means adding a Reshuffle.
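
   To make the suggested semantics concrete, here is a minimal plain-Java
   sketch, not Beam code. `MatchAllSketch`, `create`, and `plannedSteps` are
   hypothetical names used only for illustration. The sketch assumes the flag
   defaults to true, so callers who never set it still get the Reshuffle, and
   only an explicit `false` skips it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a transform such as FileIO.MatchAll; this is not Beam code.
final class MatchAllSketch {
    private final boolean outputParallelization;

    private MatchAllSketch(boolean outputParallelization) {
        this.outputParallelization = outputParallelization;
    }

    // Assumed default: true, so existing callers keep the redistribution step.
    static MatchAllSketch create() {
        return new MatchAllSketch(true);
    }

    // Returns a new instance with the flag set; the original is left unchanged.
    MatchAllSketch withOutputParallelization(boolean enabled) {
        return new MatchAllSketch(enabled);
    }

    // Lists the steps that would be applied, in order.
    List<String> plannedSteps() {
        List<String> steps = new ArrayList<>();
        steps.add("match");
        if (outputParallelization) {
            // true means "add the reshuffle", mirroring the reading in the comment above.
            steps.add("reshuffle");
        }
        return Collections.unmodifiableList(steps);
    }
}
```

   With this shape, `MatchAllSketch.create()` keeps the reshuffle step, and
   only `withOutputParallelization(false)` drops it.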



##########
sdks/java/io/file-based-io-tests/src/test/java/org/apache/beam/sdk/io/text/TextIOIT.java:
##########
@@ -136,7 +136,7 @@ public void writeThenReadAll() {
 
     PCollection<String> consolidatedHashcode =
         testFilenames
-            .apply("Match all files", FileIO.matchAll())
+            .apply("Match all files", FileIO.matchAll().withOutputParallelization(true))

Review Comment:
   We don't need to change this once the default is set to true.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.