lokeshj1703 opened a new pull request, #18076:
URL: https://github.com/apache/hudi/pull/18076

   ### Describe the issue this Pull Request addresses
   
   Issue https://github.com/apache/hudi/issues/18075
   
   Currently, the cloud incremental source only limits the number of bytes
read from the source. With very small files, the number of files read can
still grow drastically, and holding the metadata for all of those files in the
driver can lead to an OOM. The linked issue proposes also limiting the number
of rows read by the source, to reduce memory overhead on the driver.
   
   ### Summary and Changelog
   
   Adds a new configuration that limits the number of rows read by the hoodie
incremental source.
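
   As an illustration of why a row limit helps alongside a byte limit, here is
a minimal sketch. It is not the code in this PR: `FileMeta`, `select_batch`,
`max_bytes`, and `max_rows` are hypothetical names, and it assumes each
metadata row describes exactly one source file.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileMeta:
    """One metadata row describing a single source file (hypothetical shape)."""
    path: str
    size_bytes: int


def select_batch(rows: Iterable[FileMeta], max_bytes: int, max_rows: int) -> List[FileMeta]:
    """Take rows in order until either the byte cap or the row cap is reached."""
    batch: List[FileMeta] = []
    total_bytes = 0
    for row in rows:
        if len(batch) >= max_rows:
            break
        # Always admit the first row so a single oversized file cannot stall progress.
        if batch and total_bytes + row.size_bytes > max_bytes:
            break
        batch.append(row)
        total_bytes += row.size_bytes
    return batch


# 100,000 tiny files: the byte cap alone would admit every one of them.
tiny_files = [FileMeta(f"f{i}", 10) for i in range(100_000)]
bytes_only = select_batch(tiny_files, max_bytes=10_000_000, max_rows=len(tiny_files))
with_row_cap = select_batch(tiny_files, max_bytes=10_000_000, max_rows=5_000)
print(len(bytes_only), len(with_row_cap))
```

   With the byte cap alone, all 100,000 entries end up in one batch; adding the
row cap bounds the batch at 5,000 entries regardless of file size.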
   
   ### Impact
   
   Helps reduce memory overhead on the driver when using the cloud incremental
source by limiting the number of files read.
   
   ### Risk Level
   
   Low
   
   ### Documentation Update
   
   NA
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Enough context is provided in the sections above
   - [ ] Adequate tests were added if applicable
   

