yihua opened a new pull request, #11935: URL: https://github.com/apache/hudi/pull/11935
### Change Logs

The shared HFile reader in `HoodieNativeAvroHFileReader` uses a non-trivial amount of memory (see the screenshots below) and is kept open only for reading meta info from the HFile. This PR removes the shared HFile reader and instead loads the meta info once and caches it, so no reference to the reader is retained and memory usage is reduced.

This PR also re-enables the native HFile reader (`_hoodie.hfile.use.native.reader`) by default. It was turned off by default in #11488 due to OOM in CI.

Screenshots of memory usage of `HoodieNativeAvroHFileReader`:

<img width="1868" alt="Screenshot 2024-06-21 at 17 53 07" src="https://github.com/user-attachments/assets/38f249c6-a467-471b-9bd9-5d3afa46d1bc">
<img width="1752" alt="Screenshot 2024-09-12 at 08 18 25" src="https://github.com/user-attachments/assets/8a9ad95f-5e5c-466f-8274-cca8f063ba75">

### Impact

Reduces memory usage.

### Risk level

low

### Documentation Update

N/A

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
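
To illustrate the approach described in the Change Logs (load the meta info once, then release the reader instead of keeping it open), here is a minimal Java sketch. Every name in it (`MetaInfoCachingSketch`, `HeavyReader`, `CachedMeta`, `LazyMetaHolder`, `FakeReader`) is hypothetical and chosen for illustration; none of it is taken from `HoodieNativeAvroHFileReader` or from the PR diff, and the real change may be structured differently.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Illustrative sketch only: hypothetical names, not Hudi's actual classes.
final class MetaInfoCachingSketch {

  /** Stand-in for a heavyweight file reader that holds buffers while open. */
  interface HeavyReader extends AutoCloseable {
    Map<String, String> readMetaInfo();

    long readEntryCount();

    @Override
    void close(); // narrowed: no checked exception, so try-with-resources needs no catch
  }

  /** Immutable snapshot of the meta info; small enough to keep on the heap. */
  static final class CachedMeta {
    final Map<String, String> metaInfo;
    final long entryCount;

    CachedMeta(Map<String, String> metaInfo, long entryCount) {
      // Defensive copy so the snapshot does not reference reader-owned state.
      this.metaInfo = Collections.unmodifiableMap(new HashMap<>(metaInfo));
      this.entryCount = entryCount;
    }
  }

  /** Loads the meta info at most once and never retains the reader. */
  static final class LazyMetaHolder {
    private final Supplier<HeavyReader> readerFactory;
    private CachedMeta cached; // guarded by "this"

    LazyMetaHolder(Supplier<HeavyReader> readerFactory) {
      this.readerFactory = readerFactory;
    }

    synchronized CachedMeta get() {
      if (cached == null) {
        // Open a short-lived reader, copy what is needed, close it right away.
        try (HeavyReader reader = readerFactory.get()) {
          cached = new CachedMeta(reader.readMetaInfo(), reader.readEntryCount());
        }
      }
      return cached;
    }
  }

  /** Demo reader that counts how often it is opened and closed. */
  static final class FakeReader implements HeavyReader {
    private final AtomicInteger closes;

    FakeReader(AtomicInteger opens, AtomicInteger closes) {
      opens.incrementAndGet();
      this.closes = closes;
    }

    @Override
    public Map<String, String> readMetaInfo() {
      Map<String, String> info = new HashMap<>();
      info.put("schema", "demo");
      return info;
    }

    @Override
    public long readEntryCount() {
      return 42L;
    }

    @Override
    public void close() {
      closes.incrementAndGet();
    }
  }

  public static void main(String[] args) {
    AtomicInteger opens = new AtomicInteger();
    AtomicInteger closes = new AtomicInteger();
    LazyMetaHolder holder = new LazyMetaHolder(() -> new FakeReader(opens, closes));
    holder.get(); // opens a reader, snapshots the meta info, closes the reader
    holder.get(); // served from the cached snapshot; no reader is opened
    System.out.println(opens.get() + " open(s), " + closes.get() + " close(s)");
  }
}
```

In this sketch the only long-lived object is the small `CachedMeta` snapshot; the reader itself becomes unreachable as soon as the first `get()` returns, which is the memory-saving idea the description refers to.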
