s3847243 opened a new pull request, #34677:
URL: https://github.com/apache/beam/pull/34677

   
   ### What
   This PR adds `withReshuffle(boolean)` to the `MatchAll` transform returned by `FileIO.matchAll()`, so callers can disable the automatic reshuffle step (`Reshuffle.viaRandomKey()`).
   
   ### Why
   Currently, `FileIO.matchAll()` always applies a `Reshuffle` step after matching. This improves performance for wildcard patterns that expand into many files, but it is a poor fit when the input is already a large static list of file paths (e.g., 1M+). In that case, the reshuffle can block downstream fusion and autoscaling.
   
   This feature allows advanced users to opt out of reshuffling to improve 
performance and fusion behavior.
   
   ### How
   - Added a `getReshuffle()` property to the `MatchAll` AutoValue class, 
defaulting to `true`.
   - Updated the `expand()` method to conditionally apply the reshuffle based 
on the property.
   - Added `withReshuffle(boolean)` for API access.
   - Updated `FileIOTest` to verify the reshuffle toggle behavior.
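
The toggle described above can be illustrated with a minimal, self-contained sketch. This is not the Beam implementation: `MatchAllSketch`, the plain constructor used in place of AutoValue, and the step labels `"match"` and `"reshuffle"` are hypothetical stand-ins chosen for illustration. Only the names `withReshuffle(boolean)`, `getReshuffle()`, `expand()`, and the default of `true` come from this PR.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for MatchAll (hypothetical; not the Beam class). */
final class MatchAllSketch {
  private final boolean reshuffle;

  private MatchAllSketch(boolean reshuffle) {
    this.reshuffle = reshuffle;
  }

  /** Mirrors the described default: reshuffle is enabled. */
  static MatchAllSketch create() {
    return new MatchAllSketch(true);
  }

  /** Returns a new instance with the toggle set; the receiver is unchanged. */
  MatchAllSketch withReshuffle(boolean reshuffle) {
    return new MatchAllSketch(reshuffle);
  }

  boolean getReshuffle() {
    return reshuffle;
  }

  /** Lists the steps that would run, in order (labels are hypothetical). */
  List<String> expand() {
    List<String> steps = new ArrayList<>();
    steps.add("match");
    if (reshuffle) {
      // Applied only when the property is true, as in the conditional expand().
      steps.add("reshuffle");
    }
    return steps;
  }
}

final class MatchAllSketchDemo {
  public static void main(String[] args) {
    System.out.println(MatchAllSketch.create().expand()); // [match, reshuffle]
    System.out.println(MatchAllSketch.create().withReshuffle(false).expand()); // [match]
  }
}
```

Because the default stays `true`, existing pipelines would keep the reshuffle unless they call `withReshuffle(false)` explicitly.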
   
   ### Fixes
   Fixes: #33330
   
   ---
   
   - [x] This addresses an open issue (`fixes #33330`)
   - [x] Tested and verified the updated behavior
   - [ ] I will update `CHANGES.md` upon approval if necessary
   
   


