gemini-code-assist[bot] commented on code in PR #35730:
URL: https://github.com/apache/beam/pull/35730#discussion_r2240698776


##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java:
##########
@@ -723,9 +735,16 @@
           res = input.apply(createWatchTransform(new 
ExtractFilenameFn())).apply(Values.create());
         }
       }
-      return res.apply(Reshuffle.viaRandomKey());
+      // Apply Reshuffle conditionally based on the new flag
+      if (getAvoidReshuffle()) {
+        return res; // No reshuffle
+      } else {
+        return res.apply(Reshuffle.viaRandomKey());
+      }
     }
 
+    public abstract boolean getAvoidReshuffle();

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   This new public method should have Javadoc to document its purpose.
   
   ```suggestion
       /** Returns whether to avoid the reshuffle operation. */
       public abstract boolean getAvoidReshuffle();
   ```



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java:
##########
@@ -373,9 +373,11 @@ public static Match match() {
   public static MatchAll matchAll() {
     return new AutoValue_FileIO_MatchAll.Builder()
         
.setConfiguration(MatchConfiguration.create(EmptyMatchTreatment.ALLOW_IF_WILDCARD))
+            .setAvoidReshuffle(false)

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The indentation here is inconsistent with the surrounding code. To improve 
readability and maintain consistency, please align this line with the 
`.setConfiguration` call on the preceding line.
   
   ```suggestion
       .setAvoidReshuffle(false)
   ```



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java:
##########
@@ -677,13 +679,23 @@
       abstract Builder setConfiguration(MatchConfiguration configuration);
 
       abstract MatchAll build();
+
+      abstract Builder setAvoidReshuffle(boolean b);
     }
 
     /** Like {@link Match#withConfiguration}. */
     public MatchAll withConfiguration(MatchConfiguration configuration) {
       return toBuilder().setConfiguration(configuration).build();
     }
 
+    public MatchAll avoidReshuffle() {
+      return toBuilder().setAvoidReshuffle(true).build();
+    }
+
+    public MatchAll withAvoidReshuffle(boolean avoidReshuffle) {
+      return toBuilder().setAvoidReshuffle(avoidReshuffle).build();
+    }

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   These new public methods are part of the API and should have Javadoc to 
explain their purpose and the performance implications of using them. This will 
help users understand when and how to use this new feature.
   
   ```java
       /**
        * Specifies to avoid the reshuffle operation.
        *
        * <p>This is a performance optimization for pipelines that match a 
small number of filepatterns. Disabling reshuffling may cause performance 
degradation if the number of filepatterns is large, as it may cause matcher 
operations to be fused with downstream transforms.
        *
        * <p>By default, a {@link org.apache.beam.sdk.transforms.Reshuffle} is 
applied to the matched files to break fusion and improve parallelism.
        */
       public MatchAll avoidReshuffle() {
         return toBuilder().setAvoidReshuffle(true).build();
       }
   
       /**
        * Specifies whether to avoid the reshuffle operation.
        *
        * <p>This is a performance optimization for pipelines that match a 
small number of filepatterns. Disabling reshuffling may cause performance 
degradation if the number of filepatterns is large, as it may cause matcher 
operations to be fused with downstream transforms.
        *
        * <p>By default, a {@link org.apache.beam.sdk.transforms.Reshuffle} is 
applied to the matched files to break fusion and improve parallelism.
        */
       public MatchAll withAvoidReshuffle(boolean avoidReshuffle) {
         return toBuilder().setAvoidReshuffle(avoidReshuffle).build();
       }
   ```



##########
sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java:
##########
@@ -677,13 +679,23 @@
       abstract Builder setConfiguration(MatchConfiguration configuration);
 
       abstract MatchAll build();
+
+      abstract Builder setAvoidReshuffle(boolean b);

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The parameter name `b` is not descriptive. Please consider using a more 
descriptive name like `avoidReshuffle` to improve code clarity and 
maintainability.
   
   ```suggestion
       abstract Builder setAvoidReshuffle(boolean avoidReshuffle);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@beam.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to