suryaprasanna opened a new pull request, #18412: URL: https://github.com/apache/hudi/pull/18412
### Describe the issue this Pull Request addresses

Ports the logging and monitoring improvements from the earlier `AbstractHoodieLogRecordReader` changes into the current `BaseHoodieLogRecordReader` implementation. The current reader was missing block-level scan visibility and downstream propagation of the resulting metrics, which made it harder to understand log scanning behavior during compaction and related read paths.

### Summary and Changelog

Adds scan-level and block-level observability to `BaseHoodieLogRecordReader` and wires the resulting metrics into the modern reader pipeline.

- add log block scan metrics to `BaseHoodieLogRecordReader`
- add block scan duration, scanned block size, valid block count, and per-block stats
- add additional logging for block scanning, rollback handling, compaction mapping, and valid instants
- fix rollback accounting for removed blocks during scan
- propagate reader metrics through `LogScanningRecordBufferLoader` into `HoodieReadStats`
- use propagated stats in file-group append and merge handles for log block count and compacted log size
- add test coverage for the extended `HoodieReadStats`

### Impact

Improves observability for log block scanning and makes compaction/write stats reflect scanned valid log blocks and scanned block bytes instead of relying only on aggregate log file size. There is no public API or config change.

### Risk Level

medium

This changes how internal log scan statistics are computed and propagated into write stats.

Verification done:

- added unit coverage for the new `HoodieReadStats` fields
- ran targeted `hudi-common` tests
- compiled `hudi-client-common` and dependent modules successfully

### Documentation Update

none

### Contributor's checklist

- [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [x] Enough context is provided in the sections above
- [x] Adequate tests were added if applicable
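To illustrate the kind of bookkeeping the changelog describes (per-block stats, valid block count, scanned bytes, and rollback accounting), here is a minimal, self-contained sketch. It is not the PR's code: `LogScanStatsSketch`, `BlockStat`, `ScanSummary`, and all of their methods are hypothetical names chosen for this example, and the actual `BaseHoodieLogRecordReader` and `HoodieReadStats` APIs may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names come from Hudi.
final class LogScanStatsSketch {

    /** Stats for one scanned block: owning instant, size in bytes, scan time in ms. */
    static final class BlockStat {
        final String instantTime;
        final long sizeBytes;
        final long scanMillis;

        BlockStat(String instantTime, long sizeBytes, long scanMillis) {
            this.instantTime = instantTime;
            this.sizeBytes = sizeBytes;
            this.scanMillis = scanMillis;
        }
    }

    /** Immutable snapshot that a reader could hand to downstream write stats. */
    static final class ScanSummary {
        final int validBlocks;
        final long validBytes;
        final long totalScanMillis;
        final int rolledBackBlocks;

        ScanSummary(int validBlocks, long validBytes, long totalScanMillis, int rolledBackBlocks) {
            this.validBlocks = validBlocks;
            this.validBytes = validBytes;
            this.totalScanMillis = totalScanMillis;
            this.rolledBackBlocks = rolledBackBlocks;
        }
    }

    private final List<BlockStat> validBlocks = new ArrayList<>();
    private long totalScanMillis = 0L;
    private int rolledBackBlocks = 0;

    /** Record a block as scanned and (for now) valid. */
    void onBlockScanned(String instantTime, long sizeBytes, long scanMillis) {
        validBlocks.add(new BlockStat(instantTime, sizeBytes, scanMillis));
        totalScanMillis += scanMillis;
    }

    /**
     * Handle a rollback of the given instant: its blocks stop counting as valid.
     * The time already spent scanning them stays in the total duration.
     * Returns how many blocks were removed.
     */
    int onRollback(String targetInstant) {
        int before = validBlocks.size();
        validBlocks.removeIf(b -> b.instantTime.equals(targetInstant));
        int removed = before - validBlocks.size();
        rolledBackBlocks += removed;
        return removed;
    }

    /** Aggregate the surviving blocks into a summary for downstream consumers. */
    ScanSummary summarize() {
        long bytes = 0L;
        for (BlockStat b : validBlocks) {
            bytes += b.sizeBytes;
        }
        return new ScanSummary(validBlocks.size(), bytes, totalScanMillis, rolledBackBlocks);
    }

    public static void main(String[] args) {
        LogScanStatsSketch stats = new LogScanStatsSketch();
        stats.onBlockScanned("001", 1024L, 3L);
        stats.onBlockScanned("002", 2048L, 5L);
        stats.onBlockScanned("002", 512L, 2L);
        stats.onRollback("002"); // both blocks of instant "002" are dropped
        ScanSummary s = stats.summarize();
        System.out.println("valid blocks=" + s.validBlocks + ", valid bytes=" + s.validBytes);
    }
}
```

The design point this sketch tries to capture is that the valid block count and byte total are derived from the blocks that survive rollback handling, rather than from the aggregate log file size.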
