gemini-code-assist[bot] commented on code in PR #35730: URL: https://github.com/apache/beam/pull/35730#discussion_r2240698776
########## sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java: ########## @@ -723,9 +735,16 @@ res = input.apply(createWatchTransform(new ExtractFilenameFn())).apply(Values.create()); } } - return res.apply(Reshuffle.viaRandomKey()); + // Apply Reshuffle conditionally based on the new flag + if (getAvoidReshuffle()) { + return res; // No reshuffle + } else { + return res.apply(Reshuffle.viaRandomKey()); + } } + public abstract boolean getAvoidReshuffle(); Review Comment:  This new public method should have Javadoc to document its purpose. ```suggestion /** Returns whether to avoid the reshuffle operation. */ public abstract boolean getAvoidReshuffle(); ``` ########## sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java: ########## @@ -373,9 +373,11 @@ public static Match match() { public static MatchAll matchAll() { return new AutoValue_FileIO_MatchAll.Builder() .setConfiguration(MatchConfiguration.create(EmptyMatchTreatment.ALLOW_IF_WILDCARD)) + .setAvoidReshuffle(false) Review Comment:  The indentation here is inconsistent with the surrounding code. To improve readability and maintain consistency, please align this line with the `.setConfiguration` call on the preceding line. ```suggestion .setAvoidReshuffle(false) ``` ########## sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java: ########## @@ -677,13 +679,23 @@ abstract Builder setConfiguration(MatchConfiguration configuration); abstract MatchAll build(); + + abstract Builder setAvoidReshuffle(boolean b); } /** Like {@link Match#withConfiguration}. */ public MatchAll withConfiguration(MatchConfiguration configuration) { return toBuilder().setConfiguration(configuration).build(); } + public MatchAll avoidReshuffle() { + return toBuilder().setAvoidReshuffle(true).build(); + } + + public MatchAll withAvoidReshuffle(boolean avoidReshuffle) { + return toBuilder().setAvoidReshuffle(avoidReshuffle).build(); + } Review Comment:  These new public methods are part of the API and should have Javadoc to explain their purpose and the performance implications of using them. This will help users understand when and how to use this new feature. ```java /** * Specifies to avoid the reshuffle operation. * * <p>This is a performance optimization for pipelines that match a small number of filepatterns. Disabling reshuffling may cause performance degradation if the number of filepatterns is large, as it may cause matcher operations to be fused with downstream transforms. * * <p>By default, a {@link org.apache.beam.sdk.transforms.Reshuffle} is applied to the matched files to break fusion and improve parallelism. */ public MatchAll avoidReshuffle() { return toBuilder().setAvoidReshuffle(true).build(); } /** * Specifies whether to avoid the reshuffle operation. * * <p>This is a performance optimization for pipelines that match a small number of filepatterns. Disabling reshuffling may cause performance degradation if the number of filepatterns is large, as it may cause matcher operations to be fused with downstream transforms. * * <p>By default, a {@link org.apache.beam.sdk.transforms.Reshuffle} is applied to the matched files to break fusion and improve parallelism. */ public MatchAll withAvoidReshuffle(boolean avoidReshuffle) { return toBuilder().setAvoidReshuffle(avoidReshuffle).build(); } ``` ########## sdks/java/core/src/main/java/org/apache/beam/sdk/io/FileIO.java: ########## @@ -677,13 +679,23 @@ abstract Builder setConfiguration(MatchConfiguration configuration); abstract MatchAll build(); + + abstract Builder setAvoidReshuffle(boolean b); Review Comment:  The parameter name `b` is not descriptive. Please consider using a more descriptive name like `avoidReshuffle` to improve code clarity and maintainability. ```suggestion abstract Builder setAvoidReshuffle(boolean avoidReshuffle); ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@beam.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org