nsivabalan commented on issue #8085:
URL: https://github.com/apache/hudi/issues/8085#issuecomment-1453904397

   Hi @tatiana-rackspace,
   Deltastreamer, as you may know, is a streaming ingestion tool, and it 
consumes at most a configured source limit in each batch. 
   For Kafka, the limit is a number of messages; for DFS-based sources, it is 
a number of bytes.
   
   You can configure the source limit using `--source-limit`. More info can be 
found here: https://hudi.apache.org/docs/hoodie_deltastreamer 
   
   The batch size also depends on how much data was available when sync() was 
called. Let's say you have configured the minimum sync interval to 30 minutes 
(`--min-sync-interval-seconds`): Deltastreamer will then try to fetch data from 
the source and sync it to Hudi once every 30 minutes. 
   So at t0 it consumes from the source up to the limit you have configured, 
and 30 minutes later it consumes again, starting from the last checkpoint and 
again adhering to the source limit. 
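
   To make the interaction between the source limit and the sync interval 
concrete, here is a rough toy model in Python. It is not Hudi code: the names 
`simulate_syncs`, `SyncResult` and `arrivals` are made up for illustration, and 
it assumes each sync finishes instantly and the next one starts exactly one 
interval later.

```python
from dataclasses import dataclass


@dataclass
class SyncResult:
    start_time_s: int  # when this sync round started (seconds after t0)
    consumed: int      # records read from the source in this round
    checkpoint: int    # position to resume from in the next round


def simulate_syncs(arrivals, source_limit, min_sync_interval_s, rounds):
    """Toy model of repeated syncs (hypothetical helper, not a Hudi API).

    arrivals(t) returns the total number of records that have landed in the
    source by time t (seconds after t0).
    """
    checkpoint = 0
    results = []
    for i in range(rounds):
        t = i * min_sync_interval_s
        # Only records past the last checkpoint are candidates for this batch.
        available = arrivals(t) - checkpoint
        # Each batch is capped by the source limit.
        consumed = max(0, min(available, source_limit))
        checkpoint += consumed
        results.append(SyncResult(t, consumed, checkpoint))
    return results


# Assumed workload: 2500 records waiting at t0, then one new record every 2 s.
results = simulate_syncs(
    arrivals=lambda t: 2500 + t // 2,
    source_limit=2000,
    min_sync_interval_s=1800,  # 30 minutes
    rounds=3,
)
for r in results:
    print(r)
```

   In this made-up workload the first batch is capped by the source limit, 
while the later batches are smaller because less data was available when the 
sync ran.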
   
   Let me know if this clarifies things. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

