prashantwason opened a new pull request, #18027: URL: https://github.com/apache/hudi/pull/18027
### Describe the issue this Pull Request addresses

Closes HUDI-2619

When building a rollback plan using markers, data files that were deleted during `finalizeWrite()` can get included in the rollback requests. This results in the metadata table receiving delete operations for non-existent files.

### Summary and Changelog

This PR adds a filter to check if data files actually exist before including them in the marker-based rollback plan.

**Changes:**
- Added file existence check in `MarkerBasedRollbackStrategy.getRollbackRequests()` after collecting rollback requests from markers
- Requests with empty `filesToBeDeleted` (e.g., APPEND operations) are passed through without the check
- Only requests where the file actually exists are included in the final rollback plan
- Updated test expectations in `TestMarkerBasedRollbackStrategy` to reflect the new filtering behavior

### Impact

- No public API changes
- Improves correctness of rollback operations by preventing metadata table from receiving deletes for non-existent files
- Minor performance impact due to additional file existence checks, but this ensures data consistency

### Risk Level

low - The change adds a defensive check that filters out invalid rollback requests. Existing tests validate the behavior.

### Documentation Update

none

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
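
For readers who want a concrete picture of the filtering rule listed under **Changes**, here is a minimal, self-contained sketch. It is not the code from this PR: `RollbackFilterSketch`, `Request`, and `filterExisting` are hypothetical stand-ins, and the existence check is passed in as a `Predicate<String>` so the logic can be exercised without real storage. It also assumes that a request with a non-empty `filesToBeDeleted` is kept when at least one of its listed files exists, which matches the PR description only if each marker-derived request names a single file.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical illustration only; these types are NOT Hudi's actual classes.
final class RollbackFilterSketch {

    // Simplified stand-in for a marker-derived rollback request.
    static final class Request {
        final String partitionPath;
        final List<String> filesToBeDeleted;

        Request(String partitionPath, List<String> filesToBeDeleted) {
            this.partitionPath = partitionPath;
            this.filesToBeDeleted = filesToBeDeleted;
        }
    }

    // Keeps requests that either list no files to delete (e.g. APPEND
    // operations) or list at least one file that still exists.
    // Assumption: fileExists wraps whatever storage-level check is available.
    static List<Request> filterExisting(List<Request> requests,
                                        Predicate<String> fileExists) {
        return requests.stream()
            .filter(r -> r.filesToBeDeleted.isEmpty()
                || r.filesToBeDeleted.stream().anyMatch(fileExists))
            .collect(Collectors.toList());
    }
}
```

In a real deployment the predicate would presumably delegate to the table's storage layer; on a local filesystem something like `p -> java.nio.file.Files.exists(java.nio.file.Paths.get(p))` would serve for experimentation.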
