suryaprasanna opened a new pull request, #18209:
URL: https://github.com/apache/hudi/pull/18209

   ### Describe the issue this Pull Request addresses
   
   This PR addresses an issue where metadata bootstrap can fail silently or 
with unclear errors when it encounters 0-byte base files in the table. During 
metadata table rebootstrap, especially with the record-level index enabled, 
empty files can lead to corruption or surface only as unclear errors later on. 
This change adds early validation that detects such files and reports them with 
clear context.
   
   ### Summary and Changelog
   
   Users will now get a clear, early failure message when metadata bootstrap 
encounters a 0-byte file, making debugging easier and preventing silent 
corruption.
   
   **Changes:**
   - Added validation in `HoodieMetadataPayload.java` to check that files being 
added have a positive size
   - Added an assertion with a descriptive error message that names the 
specific 0-byte file
   - Added a test, `testRecordIndexRebootstrapWithZeroByteBaseFile`, in 
`TestRecordLevelIndex.scala` that:
     - Simulates corruption by replacing a base file with an empty file
     - Verifies that metadata rebootstrap with record index fails with an 
appropriate exception
     - Confirms that the error message contains the corrupted file name for 
easy debugging
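
For illustration, the kind of fail-fast size check described above could look roughly like the sketch below. It is not the code from this PR: the class `ZeroByteFileCheck`, the helper `validateAddedFiles`, the exception type, and the message wording are all hypothetical, and it uses a plain `Map` of file name to size rather than any Hudi type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ZeroByteFileCheck {

    // Hypothetical helper (not the actual Hudi implementation): rejects any
    // added file whose recorded size is missing or not strictly positive.
    static void validateAddedFiles(String partition, Map<String, Long> filesAdded) {
        for (Map.Entry<String, Long> entry : filesAdded.entrySet()) {
            Long size = entry.getValue();
            if (size == null || size <= 0) {
                // Name the offending file so the failure is easy to trace.
                throw new IllegalStateException(
                    "Invalid size for file '" + entry.getKey()
                        + "' in partition '" + partition + "': " + size);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Long> files = new LinkedHashMap<>();
        files.put("base-file-1.parquet", 4096L);
        files.put("base-file-2.parquet", 0L); // simulated 0-byte base file

        try {
            validateAddedFiles("2024/01/01", files);
        } catch (IllegalStateException e) {
            // The message includes the name of the 0-byte file.
            System.out.println(e.getMessage());
        }
    }
}
```

Failing at the point where file sizes are recorded keeps the error close to its cause, rather than letting an empty file surface later as an unrelated read failure.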
   
   ### Impact
   
   **User-facing impact:** Users will receive clearer error messages when 
metadata operations encounter 0-byte files, making debugging significantly 
easier.
   
   **API changes:** None
   
   **Performance impact:** Minimal - adds a single size check during metadata 
bootstrap.
   
   ### Risk Level
   
   **Low** - This change only adds validation to fail fast on an already 
corrupt or invalid state. It does not change any successful code paths. The 
validation prevents proceeding with corrupt data that would otherwise fail 
later with unclear errors.
   
   Verification:
   - Added a unit test that specifically exercises the new validation path
   - The test confirms that an error message containing the file name is 
surfaced
   - Existing tests continue to pass
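
As a rough sketch of the corruption scenario the test exercises, the snippet below overwrites a file with empty content and then scans for 0-byte files. It assumes a local temporary directory and only `java.nio.file` calls; the class `ZeroByteFileSimulation`, the helper `findZeroByteFiles`, and the file names are hypothetical, and the actual test in `TestRecordLevelIndex.scala` operates on a real Hudi table instead.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ZeroByteFileSimulation {

    // Returns the names of regular files directly under dir whose size is 0 bytes.
    static List<String> findZeroByteFiles(Path dir) throws IOException {
        List<Path> entries;
        try (Stream<Path> paths = Files.list(dir)) {
            entries = paths.sorted().collect(Collectors.toList());
        }
        List<String> empty = new ArrayList<>();
        for (Path p : entries) {
            if (Files.isRegularFile(p) && Files.size(p) == 0) {
                empty.add(p.getFileName().toString());
            }
        }
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a table partition; the file contents are placeholders.
        Path dir = Files.createTempDirectory("table-partition");
        Files.write(dir.resolve("file-a.parquet"), new byte[] {1, 2, 3});
        Path corrupted = Files.write(dir.resolve("file-b.parquet"), new byte[] {4, 5, 6});

        // Simulate corruption: Files.write truncates by default, so writing
        // an empty array leaves a 0-byte file in place of the original.
        Files.write(corrupted, new byte[0]);

        System.out.println("0-byte files: " + findZeroByteFiles(dir));
    }
}
```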
   
   ### Documentation Update
   
   None - this is an internal validation improvement that doesn't introduce new 
configs or user-facing features.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable

