yihua opened a new pull request, #11935:
URL: https://github.com/apache/hudi/pull/11935

   ### Change Logs
   
   The shared HFile reader in `HoodieNativeAvroHFileReader` uses a non-trivial 
amount of memory (see the screenshots below) and is kept open for reading meta 
info from the HFile.  This PR removes the shared HFile reader and instead loads 
the meta info once and caches it, so no reference to the reader is retained and 
memory usage is reduced.
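
The load-once-and-cache idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `MetaSource`, `CachedMeta`, `LazyMetaReader`, and the `"minKey"`/`"maxKey"` lookups are hypothetical names standing in for the real reader and its meta info.

```java
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical stand-in for an HFile reader that holds large buffers while open. */
interface MetaSource extends AutoCloseable {
    long numEntries();

    Optional<String> metaInfo(String key);

    @Override
    void close();
}

/** Small immutable snapshot of the meta info; this is the only state kept around. */
final class CachedMeta {
    final long numEntries;
    final Optional<String> minKey;
    final Optional<String> maxKey;

    CachedMeta(long numEntries, Optional<String> minKey, Optional<String> maxKey) {
        this.numEntries = numEntries;
        this.minKey = minKey;
        this.maxKey = maxKey;
    }
}

/** Opens a reader only on first use, copies the meta info out, then closes it. */
final class LazyMetaReader {
    private final Supplier<MetaSource> readerFactory;
    private CachedMeta cached; // null until the first call to meta()

    LazyMetaReader(Supplier<MetaSource> readerFactory) {
        this.readerFactory = readerFactory;
    }

    synchronized CachedMeta meta() {
        if (cached == null) {
            // try-with-resources closes the reader before returning, so no
            // reference to it (or its buffers) outlives this block.
            try (MetaSource reader = readerFactory.get()) {
                cached = new CachedMeta(
                    reader.numEntries(),
                    reader.metaInfo("minKey"),
                    reader.metaInfo("maxKey"));
            }
        }
        return cached;
    }
}
```

The trade-off in this sketch is that only the small snapshot stays reachable after the first call, at the cost of opening a fresh reader whenever actual record reads are needed.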
   
   This PR also re-enables the native HFile reader 
(`_hoodie.hfile.use.native.reader`) by default.  It was previously turned off 
by default in #11488 due to OOM in CI.
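
For anyone who wants to set the flag explicitly instead of relying on the default, a minimal sketch is shown here. The config key is the one named above; the helper class `NativeHFileReaderFlag` is hypothetical, and how the resulting properties are handed to Hudi depends on the engine and writer in use, which this sketch does not cover.

```java
import java.util.Properties;

/** Hypothetical helper that builds a properties object carrying the reader flag. */
final class NativeHFileReaderFlag {
    // Config key as quoted in this PR description.
    static final String USE_NATIVE_HFILE_READER = "_hoodie.hfile.use.native.reader";

    static Properties withNativeReader(boolean enabled) {
        Properties props = new Properties();
        props.setProperty(USE_NATIVE_HFILE_READER, Boolean.toString(enabled));
        return props;
    }
}
```

Calling `NativeHFileReaderFlag.withNativeReader(false)` yields properties that turn the flag off explicitly.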
   
   Screenshots of memory usage of `HoodieNativeAvroHFileReader`:
   <img width="1868" alt="Screenshot 2024-06-21 at 17 53 07" 
src="https://github.com/user-attachments/assets/38f249c6-a467-471b-9bd9-5d3afa46d1bc">
   <img width="1752" alt="Screenshot 2024-09-12 at 08 18 25" 
src="https://github.com/user-attachments/assets/8a9ad95f-5e5c-466f-8274-cca8f063ba75">
   
   ### Impact
   
   Reduces the memory usage of `HoodieNativeAvroHFileReader`.
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.