nsivabalan opened a new pull request, #18387:
URL: https://github.com/apache/hudi/pull/18387
### Describe the issue this Pull Request addresses
Optimized TableSchemaResolver.getTableInternalSchemaFromCommitMetadata() to
use short-circuit evaluation when searching for the most recent schema-updating
instant. The previous implementation filtered the entire timeline and then
called lastInstant(), which required processing all instants. The new
implementation uses getReverseOrderedInstants().filter(...).findFirst() to stop
as soon as the first (most recent) matching instant is found.
### Summary and Changelog
Summary:
Users with tables that have long timelines will experience faster internal
schema lookups, especially when recent commits contain non-schema-updating
operations (CLUSTER, COMPACT, INDEX, LOG_COMPACT).
Changelog:
- Refactored TableSchemaResolver.getTableInternalSchemaFromCommitMetadata()
to use getReverseOrderedInstants().filter(...).findFirst() instead of
filter(...).lastInstant()
- This enables short-circuit evaluation - the method stops immediately
upon finding the first (most recent) schema-updating instant
- Added 4 comprehensive unit tests to validate correctness and verify the
short-circuit behavior
- Added inline documentation explaining the optimization
Technical details:
- Before: completedInstants.filter(predicate) → creates filtered timeline
→ lastInstant() → processes all instants
- After: completedInstants.getReverseOrderedInstants().filter(predicate).findFirst()
→ stops at first match
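
The difference between the two call chains can be sketched outside of Hudi. The snippet below is a minimal, illustrative model only: `Instant`, `NON_SCHEMA_OPS`, `countingPredicate`, `sampleTimeline`, and the `UPSERT`/`INSERT` operation names are hypothetical stand-ins and not Hudi classes or APIs. It counts how many times the predicate runs under each approach, assuming the predicate is the expensive step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

// Illustrative only: every name in this class is a hypothetical stand-in, not a Hudi API.
class ShortCircuitSketch {

  // Minimal model of a completed timeline instant.
  static final class Instant {
    final String timestamp;
    final String operation;

    Instant(String timestamp, String operation) {
      this.timestamp = timestamp;
      this.operation = operation;
    }
  }

  // Operations the description lists as non-schema-updating.
  static final Set<String> NON_SCHEMA_OPS =
      new HashSet<>(Arrays.asList("CLUSTER", "COMPACT", "INDEX", "LOG_COMPACT"));

  // Predicate that records how many instants it had to inspect.
  static Predicate<Instant> countingPredicate(AtomicInteger evaluations) {
    return instant -> {
      evaluations.incrementAndGet();
      return !NON_SCHEMA_OPS.contains(instant.operation);
    };
  }

  // "Before" shape: evaluate the predicate on every instant, then take the last match.
  static Optional<Instant> filterThenLast(List<Instant> oldestFirst, Predicate<Instant> p) {
    List<Instant> matches = new ArrayList<>();
    for (Instant instant : oldestFirst) {
      if (p.test(instant)) {
        matches.add(instant);
      }
    }
    return matches.isEmpty()
        ? Optional.empty()
        : Optional.of(matches.get(matches.size() - 1));
  }

  // "After" shape: walk newest-first and stop at the first match.
  static Optional<Instant> reverseThenFindFirst(List<Instant> oldestFirst, Predicate<Instant> p) {
    List<Instant> newestFirst = new ArrayList<>(oldestFirst);
    Collections.reverse(newestFirst);
    return newestFirst.stream().filter(p).findFirst();
  }

  // Two schema-updating writes followed by two table-service operations.
  static List<Instant> sampleTimeline() {
    return Arrays.asList(
        new Instant("001", "UPSERT"),
        new Instant("002", "INSERT"),
        new Instant("003", "CLUSTER"),
        new Instant("004", "COMPACT"));
  }

  public static void main(String[] args) {
    AtomicInteger before = new AtomicInteger();
    AtomicInteger after = new AtomicInteger();
    Instant a = filterThenLast(sampleTimeline(), countingPredicate(before)).get();
    Instant b = reverseThenFindFirst(sampleTimeline(), countingPredicate(after)).get();
    // Both approaches return instant 002; only the evaluation counts differ (4 vs 3).
    System.out.println(a.timestamp + " after " + before.get() + " predicate evaluations");
    System.out.println(b.timestamp + " after " + after.get() + " predicate evaluations");
  }
}
```

In this model the saving grows with the distance between the newest matching instant and the start of the timeline: the reverse walk inspects only the instants newer than the match, plus the match itself.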
### Impact
Performance improvement with no behavioral changes:
- Reduces the number of commit metadata reads required, especially
beneficial for:
  - Tables with long timelines (hundreds or thousands of commits)
  - Scenarios where recent commits are non-schema-updating operations
### Risk Level
low
### Documentation Update
None required.
### Contributor's checklist
- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Enough context is provided in the sections above
- [ ] Adequate tests were added if applicable