prashantwason opened a new pull request, #17465: URL: https://github.com/apache/hudi/pull/17465
### Describe the issue this Pull Request addresses

During insert operations, `HoodieCreateHandle` checks whether a record is deleted by calling `payload.getInsertValue().isPresent()`. This forces the insert value to be generated even though only its presence is needed, which can be computationally expensive for certain payload implementations. This PR introduces a performance optimization to avoid this overhead.

### Summary and Changelog

**Summary:** This PR introduces a new method `hasInsertValue()` to the `HoodieRecordPayload` interface that allows payload implementations to efficiently check if they have a valid insert value without actually generating it.

**Changelog:**

- Added new `hasInsertValue(Schema schema, Properties properties)` method to `HoodieRecordPayload` interface with a default implementation that maintains backward compatibility
- Updated `HoodieAvroRecord.checkIsDelete()` to use the new `hasInsertValue()` method instead of `getInsertValue().isPresent()`
- Implemented optimized `hasInsertValue()` in `HoodieMetadataPayload`: checks internal flags without deserializing
- Implemented optimized `hasInsertValue()` in `RawTripTestPayload`: checks `isDeleted` flag directly
- Added helper methods `validatePayload()` and `getNestedFieldValue()` in `HoodieMetadataPayload`
- Added utility methods in test payload classes for improved testing

### Impact

**Performance Impact:** This change provides a significant performance improvement for write operations, particularly when dealing with payloads where generating the insert value is expensive. Payload implementations can now bypass the costly insert value generation during the deleted record check by implementing custom logic in `hasInsertValue()`.

**Public API Change:** A new method `hasInsertValue()` is added to the `HoodieRecordPayload` interface. The default implementation ensures backward compatibility by delegating to the existing `getInsertValue()` method.
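
As a rough illustration of the default-method pattern described above, here is a minimal Java sketch. It does not use the real Hudi types: `SketchPayload`, `LegacyPayload`, `FlagPayload`, `HasInsertValueSketch`, the `String` record value, and the `materializations` counter are all hypothetical stand-ins, and the real `hasInsertValue(Schema, Properties)` signature is simplified away. It only aims to show how a default method can preserve existing behavior while letting an implementation answer the delete check from a flag.

```java
import java.util.Optional;

// Hypothetical, simplified stand-in for HoodieRecordPayload (not the real interface).
interface SketchPayload {
    // Potentially expensive: a real payload may deserialize or rebuild the record here.
    Optional<String> getInsertValue();

    // Default delegates to getInsertValue(), so existing implementations keep working unchanged.
    default boolean hasInsertValue() {
        return getInsertValue().isPresent();
    }
}

// A payload that does not override hasInsertValue() and therefore pays the full cost.
final class LegacyPayload implements SketchPayload {
    int materializations = 0;      // counts how often the value was generated
    private final String data;     // null models a deleted record

    LegacyPayload(String data) {
        this.data = data;
    }

    @Override
    public Optional<String> getInsertValue() {
        materializations++;
        return Optional.ofNullable(data);
    }
}

// A payload that answers the presence check from a flag, without generating the value.
final class FlagPayload implements SketchPayload {
    int materializations = 0;
    private final String data;
    private final boolean isDeleted;

    FlagPayload(String data, boolean isDeleted) {
        this.data = data;
        this.isDeleted = isDeleted;
    }

    @Override
    public Optional<String> getInsertValue() {
        materializations++;
        return isDeleted ? Optional.empty() : Optional.ofNullable(data);
    }

    @Override
    public boolean hasInsertValue() {
        return !isDeleted;
    }
}

final class HasInsertValueSketch {
    // Loosely mirrors the delete check this PR describes for HoodieAvroRecord.checkIsDelete().
    static boolean checkIsDelete(SketchPayload payload) {
        return !payload.hasInsertValue();
    }

    public static void main(String[] args) {
        LegacyPayload legacy = new LegacyPayload("row-1");
        FlagPayload flagged = new FlagPayload("row-2", true);
        System.out.println("legacy deleted: " + checkIsDelete(legacy)
                + ", materializations: " + legacy.materializations);
        System.out.println("flagged deleted: " + checkIsDelete(flagged)
                + ", materializations: " + flagged.materializations);
    }
}
```

In this sketch the delete check on `LegacyPayload` still generates the value once through the default method, while the check on `FlagPayload` never calls `getInsertValue()`. Whether an actual payload can answer from a flag alone depends on its internal state, which is why the optimization is opt-in per implementation.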

### Risk Level

**Low**

The change is backward compatible as the new method has a default implementation that preserves existing behavior. Existing payload implementations will continue to work without modification.

The risk is mitigated by:

- Default implementation maintains current behavior
- Extensive testing with various payload types
- No changes to external APIs or configuration
- Internal optimization only, no storage format changes

### Documentation Update

None required. This is an internal performance optimization that does not introduce new configs, user-facing features, or changes to existing behavior from a user perspective.

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
