prashantwason opened a new pull request, #17465:
URL: https://github.com/apache/hudi/pull/17465

   ### Describe the issue this Pull Request addresses
   
   During insert operations, `HoodieCreateHandle` checks whether a record is 
deleted by calling `payload.getInsertValue().isPresent()`. This forces the 
insert value to be generated even though only its presence is needed, which can 
be computationally expensive for some payload implementations.
   
   This PR introduces a performance optimization to avoid this overhead.
   
   ### Summary and Changelog
   
   **Summary:** 
   This PR introduces a new method `hasInsertValue()` to the 
`HoodieRecordPayload` interface that allows payload implementations to 
efficiently check if they have a valid insert value without actually generating 
it.
   
   **Changelog:**
   - Added new `hasInsertValue(Schema schema, Properties properties)` method to 
`HoodieRecordPayload` interface with a default implementation that maintains 
backward compatibility
   - Updated `HoodieAvroRecord.checkIsDelete()` to use the new 
`hasInsertValue()` method instead of `getInsertValue().isPresent()`
   - Implemented optimized `hasInsertValue()` in `HoodieMetadataPayload` - 
checks internal flags without deserializing
   - Implemented optimized `hasInsertValue()` in `RawTripTestPayload` - checks 
`isDeleted` flag directly
   - Added helper methods `validatePayload()` and `getNestedFieldValue()` in 
`HoodieMetadataPayload`
   - Added utility methods in test payload classes for improved testing
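
The core of the changelog can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `SketchPayload`, `FlaggedPayload`, and `DeleteCheckSketch` are hypothetical stand-ins, `java.util.Optional` replaces Hudi's own option type, a `String` replaces the Avro record, and the `Schema` parameter is dropped so the example compiles without Hudi or Avro on the classpath.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical, simplified stand-in for the HoodieRecordPayload interface.
interface SketchPayload {
    // Potentially expensive: builds the value that would be written.
    Optional<String> getInsertValue(Properties props);

    // Default keeps the previous behavior: generate the value, then check presence.
    default boolean hasInsertValue(Properties props) {
        return getInsertValue(props).isPresent();
    }
}

// Hypothetical payload that tracks deletion with a flag, in the spirit of the
// isDeleted check described for RawTripTestPayload.
final class FlaggedPayload implements SketchPayload {
    private final String data;
    private final boolean isDeleted;
    int generateCalls = 0; // counts how often the expensive path runs

    FlaggedPayload(String data, boolean isDeleted) {
        this.data = data;
        this.isDeleted = isDeleted;
    }

    @Override
    public Optional<String> getInsertValue(Properties props) {
        generateCalls++;
        // Stand-in for costly deserialization or record building.
        return isDeleted ? Optional.empty() : Optional.of(data.toUpperCase());
    }

    @Override
    public boolean hasInsertValue(Properties props) {
        // Answers from the flag alone; no value is generated.
        return !isDeleted;
    }
}

final class DeleteCheckSketch {
    // Mirrors the idea behind the updated delete check: ask the payload
    // whether a value exists instead of generating it.
    static boolean checkIsDelete(SketchPayload payload, Properties props) {
        return !payload.hasInsertValue(props);
    }
}
```

With this shape, calling `DeleteCheckSketch.checkIsDelete(...)` on a `FlaggedPayload` never increments `generateCalls`, which is the overhead the optimization is meant to avoid.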
   
   ### Impact
   
   **Performance Impact:** This change removes insert value generation from the 
deleted record check on the write path. The benefit is largest for payloads 
where generating the insert value is expensive: such payloads can implement 
custom logic in `hasInsertValue()` to skip that work. Payloads that keep the 
default implementation behave as before.
   
   **Public API Change:** A new method `hasInsertValue()` is added to the 
`HoodieRecordPayload` interface. The default implementation ensures backward 
compatibility by delegating to the existing `getInsertValue()` method.
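
To illustrate the compatibility claim, here is a small sketch of a payload that predates the new method. The names `CompatPayload`, `LegacyPayload`, and `CompatDemo` are hypothetical, and `Optional<String>` again stands in for the real return type. Because the payload does not override `hasInsertValue()`, the default delegates to `getInsertValue()` and the delete check gives the same answer as before.

```java
import java.util.Optional;
import java.util.Properties;

// Hypothetical stand-in for the payload interface after this change.
interface CompatPayload {
    Optional<String> getInsertValue(Properties props);

    // Default added for backward compatibility: delegate to getInsertValue().
    default boolean hasInsertValue(Properties props) {
        return getInsertValue(props).isPresent();
    }
}

// A payload written before hasInsertValue() existed: it only implements
// getInsertValue() and needs no modification.
final class LegacyPayload implements CompatPayload {
    private final String data; // null models a deleted record

    LegacyPayload(String data) {
        this.data = data;
    }

    @Override
    public Optional<String> getInsertValue(Properties props) {
        return Optional.ofNullable(data);
    }
}

final class CompatDemo {
    static boolean isDelete(CompatPayload payload) {
        return !payload.hasInsertValue(new Properties());
    }
}
```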
   
   ### Risk Level
   
   **Low**
   
   The change is backward compatible as the new method has a default 
implementation that preserves existing behavior. Existing payload 
implementations will continue to work without modification. The risk is 
mitigated by:
   - Default implementation maintains current behavior
   - Extensive testing with various payload types
   - No breaking changes to existing APIs or configuration (the new interface 
method is additive)
   - Internal optimization only, no storage format changes
   
   ### Documentation Update
   
   None required. This is an internal performance optimization that does not 
introduce new configs, user-facing features, or changes to existing behavior 
from a user perspective.
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Enough context is provided in the sections above
   - [x] Adequate tests were added if applicable


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

