suryaprasanna opened a new pull request, #17520:
URL: https://github.com/apache/hudi/pull/17520
### Describe the issue this Pull Request addresses
This PR removes the deprecated `ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN`
configuration and routes all log scanning operations through the
`ScanV2Internal` path unconditionally. This simplifies the codebase by
removing the dual-path scanning logic that was kept only for backward
compatibility.
### Summary and Changelog
Users no longer need to configure `ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN`, because
optimized log scanning is now the only behavior. This change streamlines the
log reading path and removes approximately 436 lines of legacy code.
**Changes:**
- Removed `ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN` configuration from
`HoodieReaderConfig` and `HoodieCompactionConfig`
- Removed conditional logic using `enableOptimizedLogBlocksScan` across
log scanning components
- Simplified `AbstractHoodieLogRecordScanner` and
`BaseHoodieLogRecordReader` by removing legacy scan path
- Updated `HoodieMergedLogRecordReader`, `HoodieMergedLogRecordScanner`,
and `HoodieUnMergedLogRecordScanner` to use ScanV2Internal exclusively
- Cleaned up references in metadata writers, clustering strategies, and
test utilities
- Updated Hive integration components to remove deprecated configuration
checks
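The core simplification can be pictured with a minimal, hypothetical Java sketch. The class `ScanPathSketch` and the methods `scanBefore`, `scanAfter`, `scanV2`, and `scanLegacy` are illustrative names only, not Hudi classes or signatures; log blocks are modeled as plain strings and both scan placeholders simply copy their input. It shows only the shape of the refactor: a boolean flag threaded through callers to choose between two paths, replaced by one unconditional path.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the control-flow change only. These are NOT Hudi
 * classes or signatures; log blocks are modeled as plain strings.
 */
public class ScanPathSketch {

    /** Before: callers threaded a boolean flag and the scanner kept two paths. */
    static List<String> scanBefore(List<String> logBlocks, boolean enableOptimizedLogBlocksScan) {
        if (enableOptimizedLogBlocksScan) {
            return scanV2(logBlocks);     // stands in for the ScanV2Internal path
        }
        return scanLegacy(logBlocks);     // stands in for the legacy path this PR removes
    }

    /** After: no flag, a single code path for every caller. */
    static List<String> scanAfter(List<String> logBlocks) {
        return scanV2(logBlocks);
    }

    /** Placeholder for the optimized scan; here it only returns an unmodifiable copy. */
    static List<String> scanV2(List<String> logBlocks) {
        return Collections.unmodifiableList(new ArrayList<>(logBlocks));
    }

    /** Placeholder for the removed legacy scan; same placeholder behavior. */
    static List<String> scanLegacy(List<String> logBlocks) {
        return Collections.unmodifiableList(new ArrayList<>(logBlocks));
    }

    public static void main(String[] args) {
        List<String> blocks = List.of("data-block-1", "delete-block-2");
        // The single-path version behaves like the old version with the flag enabled.
        System.out.println(scanAfter(blocks).equals(scanBefore(blocks, true))); // prints "true"
    }
}
```

Because the legacy branch was reachable only with the flag disabled, collapsing to the single path matches the old behavior with the flag enabled, which is consistent with the statement that the new default is equivalent to having the config enabled.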
### Impact
**Breaking Change:** The `ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN` configuration
option has been removed. Users who set it explicitly should remove it from
their configurations. The new behavior is equivalent to having this config
enabled.
**Performance:** No performance change is expected for users who already had
the config enabled, since `ScanV2Internal` was already the recommended,
optimized path. Users who had the config disabled should see log scanning
performance improve.
### Risk Level
**Low** - The ScanV2Internal API has been available and tested for several
releases. This change only removes the legacy fallback path. All existing tests
pass with the new default behavior.
### Documentation Update
- Configuration documentation needs to be updated to remove references to
`ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN`
- Release notes should highlight this as a breaking change for users who
explicitly disabled the optimization
### Contributor's checklist
- [x] Read through [contributor's
guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable