suryaprasanna opened a new pull request, #18209:
URL: https://github.com/apache/hudi/pull/18209
### Describe the issue this Pull Request addresses
This PR addresses an issue where metadata bootstrap can fail silently or
with unclear errors when encountering 0-byte base files in the table. During
metadata table rebootstrap, especially with record-level index enabled, the
presence of empty files can lead to corruption and unclear error messages. This
change adds early validation to detect and report such issues with clear
context.
### Summary and Changelog
Users will now get a clear, early failure message when metadata bootstrap
encounters a 0-byte file, making debugging easier and preventing silent
corruption.
**Changes:**
- Added validation in `HoodieMetadataPayload.java` to check that each file
being added has a positive size
- Added an assertion whose error message names the specific 0-byte file
- Added test `testRecordIndexRebootstrapWithZeroByteBaseFile`
in `TestRecordLevelIndex.scala` that:
  - Simulates corruption by replacing a base file with an empty file
  - Verifies that metadata rebootstrap with the record index enabled fails
    with an appropriate exception
  - Confirms that the error message contains the corrupted file name for
    easy debugging
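The validation described above amounts to a per-file size check that fails fast and names the offending file. The Java sketch below illustrates that idea in isolation. It is not the code in `HoodieMetadataPayload`; the class name `ZeroByteFileCheck`, the method `validateFilesAdded`, the map of file name to size, and the message format are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Illustrative sketch only: NOT the code from HoodieMetadataPayload.
 * Class name, method name, and message format are hypothetical.
 */
public class ZeroByteFileCheck {

  /**
   * Fails fast if any file being added has a non-positive size,
   * naming the partition and the offending file in the message.
   */
  static void validateFilesAdded(String partition, Map<String, Long> filesAdded) {
    for (Map.Entry<String, Long> entry : filesAdded.entrySet()) {
      long size = entry.getValue();
      if (size <= 0) {
        throw new IllegalStateException(String.format(
            "File %s in partition %s has size %d; a 0-byte base file cannot be"
                + " added to the metadata table",
            entry.getKey(), partition, size));
      }
    }
  }

  public static void main(String[] args) {
    Map<String, Long> filesAdded = new LinkedHashMap<>();
    filesAdded.put("file-a.parquet", 4096L);
    filesAdded.put("file-b.parquet", 0L); // simulated empty base file
    try {
      validateFilesAdded("2024/01/01", filesAdded);
      System.out.println("all files have a positive size");
    } catch (IllegalStateException e) {
      // The message carries the file name, so the corrupt file can be located directly.
      System.out.println("bootstrap aborted: " + e.getMessage());
    }
  }
}
```

Each check is constant time, so the added cost grows only with the number of files already being processed, and throwing (rather than logging and continuing) matches the fail-fast intent of this change.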
### Impact
**User-facing impact:** Users will receive clearer error messages when
metadata operations encounter 0-byte files, making debugging significantly
easier.
**API changes:** None
**Performance impact:** Minimal - adds a constant-time size check for each
file added during metadata bootstrap.
### Risk Level
**Low** - This change only adds validation to fail fast on an already
corrupt or invalid state. It does not change any successful code paths. The
validation prevents proceeding with corrupt data that would otherwise fail
later with unclear errors.
**Verification:**
- Added a unit test that specifically exercises the new validation path
- The test confirms that the error message names the offending file
- Existing tests continue to pass
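The added test is written in Scala and runs against a real Hudi table with the record-level index enabled. The Java sketch below only mirrors its shape on a local temporary directory, assuming that file sizes come from a file-system listing: write two base files, overwrite one with an empty file, run a size check, and assert that the error names the emptied file. It does not call Hudi or trigger a rebootstrap; every class, method, and file name in it is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Illustrative sketch only: mimics the shape of the Scala test on a local
 * directory. It does not use Hudi; all names here are hypothetical.
 */
public class ZeroByteRebootstrapCheck {

  // Hypothetical stand-in for the size validation run during bootstrap.
  static void validateFilesAdded(String partition, Map<String, Long> filesAdded) {
    filesAdded.forEach((name, size) -> {
      if (size <= 0) {
        throw new IllegalStateException(
            "0-byte file " + name + " found in partition " + partition);
      }
    });
  }

  // Lists the regular files in a partition directory with their on-disk sizes.
  static Map<String, Long> listFileSizes(Path partitionDir) throws IOException {
    List<Path> paths;
    try (Stream<Path> stream = Files.list(partitionDir)) {
      paths = stream.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
    }
    Map<String, Long> sizes = new LinkedHashMap<>();
    for (Path p : paths) {
      sizes.put(p.getFileName().toString(), Files.size(p));
    }
    return sizes;
  }

  /** Returns the validation error message, or null if validation passed. */
  static String simulateZeroByteBaseFile() throws IOException {
    Path partition = Files.createTempDirectory("partition");
    Path healthy = Files.write(partition.resolve("base-1.parquet"), new byte[] {1, 2, 3});
    Path corrupted = Files.write(partition.resolve("base-2.parquet"), new byte[] {4, 5, 6});
    try {
      // Simulate corruption: replace the base file's contents with an empty file.
      Files.write(corrupted, new byte[0]);
      validateFilesAdded("p1", listFileSizes(partition));
      return null;
    } catch (IllegalStateException e) {
      return e.getMessage();
    } finally {
      Files.deleteIfExists(healthy);
      Files.deleteIfExists(corrupted);
      Files.deleteIfExists(partition);
    }
  }

  public static void main(String[] args) throws IOException {
    String message = simulateZeroByteBaseFile();
    if (message == null || !message.contains("base-2.parquet")) {
      throw new AssertionError("expected a failure naming base-2.parquet, got: " + message);
    }
    System.out.println("validation failed as expected: " + message);
  }
}
```

Keeping the corruption setup separate from the assertion makes it easy to confirm both that the failure happens and that it points at the right file.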
### Documentation Update
None - this is an internal validation improvement that doesn't introduce new
configs or user-facing features.
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
--
This is an automated message from the Apache Git Service.