lokeshj1703 opened a new issue, #18075: URL: https://github.com/apache/hudi/issues/18075
### Bug Description

**What happened:**

Currently the cloud incremental source enforces a limit on the number of bytes read from the source. With very small files, however, the number of files read per sync can grow drastically, and holding all the file metadata in the driver can lead to OOM errors or memory contention. This issue proposes additionally adding a limit on the number of rows read by the source, to reduce the memory overhead on the driver.

**What you expected:**

**Steps to reproduce:**

1.
2.
3.

### Environment

**Hudi version:**

**Query engine:** (Spark/Flink/Trino etc)

**Relevant configs:**

### Logs and Stack Trace

_No response_
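To illustrate the idea, here is a minimal sketch of dual-budget batch selection: admit files into a sync round until either a byte budget or a row budget is exhausted. This is not Hudi's actual API; the `CloudFile` class, `selectBatch` method, and budget values are hypothetical, and in practice per-file row counts might come from object metadata or a size-based estimate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pick the next batch of cloud files for one sync
// round, stopping when either a byte cap or a row cap would be exceeded.
public class FileBatchSelector {
    static class CloudFile {
        final String path;
        final long sizeBytes;
        final long rowCount; // e.g. from object metadata, or estimated from size
        CloudFile(String path, long sizeBytes, long rowCount) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.rowCount = rowCount;
        }
    }

    // Returns the prefix of `files` that fits both budgets. Always admits at
    // least one file so the source makes progress even if a single file
    // exceeds a cap on its own.
    static List<CloudFile> selectBatch(List<CloudFile> files, long maxBytes, long maxRows) {
        List<CloudFile> batch = new ArrayList<>();
        long bytes = 0;
        long rows = 0;
        for (CloudFile f : files) {
            if (!batch.isEmpty() && (bytes + f.sizeBytes > maxBytes || rows + f.rowCount > maxRows)) {
                break;
            }
            batch.add(f);
            bytes += f.sizeBytes;
            rows += f.rowCount;
        }
        return batch;
    }

    public static void main(String[] args) {
        // Many tiny files: a byte cap alone would admit all of them.
        List<CloudFile> files = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            files.add(new CloudFile("s3://bucket/f" + i, 1_000, 100));
        }
        // Byte cap of 100 KB admits all 10; a row cap of 350 stops at 3.
        List<CloudFile> batch = selectBatch(files, 100_000, 350);
        System.out.println(batch.size()); // prints 3
    }
}
```

With only the existing byte limit, the tiny-file case above would pull all ten files (and their metadata) into the driver; the row cap bounds the batch independently of file sizes.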
