chenjunjiedada opened a new pull request, #4943:
URL: https://github.com/apache/iceberg/pull/4943

   This adds an option to control how many snapshots to monitor at once when 
using an Iceberg table as a Flink source. 
   
   Currently, the monitor operator generates file splits for every snapshot 
between the last consumed snapshot and the latest one, which can cause 
backpressure when the consumer lags behind, as the following image shows. 
Reducing the checkpoint lock scope (https://github.com/apache/iceberg/pull/4911) 
or increasing the network buffer mitigates the situation, but the problem 
cannot be completely avoided because the number of splits is not known in 
advance, especially when starting a consumer for the first time.
   
![image](https://user-images.githubusercontent.com/3960228/171612916-bfacf692-3e08-4161-9937-aa1fc93a602f.png)
   
   With the option, the user can tune the monitoring flow according to 
backpressure and busy metrics. 
   
![image](https://user-images.githubusercontent.com/3960228/171615504-4d210b51-235a-4071-a547-8d463dc77385.png)
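
   As a rough illustration of the idea (not the code in this PR), the cap can 
be thought of as limiting how many pending snapshots one monitoring round turns 
into splits. The class name `SnapshotWindow`, the method `nextBatch`, and the 
parameter `maxSnapshotsPerRound` are hypothetical names chosen for this sketch; 
snapshots are modeled as plain ids in commit order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Iceberg API.
public final class SnapshotWindow {
    private SnapshotWindow() {}

    /**
     * Picks the snapshots to plan in one monitoring round.
     *
     * @param pendingSnapshotIds   ids of unconsumed snapshots, oldest first
     * @param maxSnapshotsPerRound upper bound on snapshots handled per round (assumed > 0)
     * @return the oldest pending ids, at most maxSnapshotsPerRound of them
     */
    public static List<Long> nextBatch(List<Long> pendingSnapshotIds, int maxSnapshotsPerRound) {
        if (maxSnapshotsPerRound <= 0) {
            throw new IllegalArgumentException("maxSnapshotsPerRound must be positive");
        }
        // Without a cap the whole pending list would be planned at once;
        // with a cap, only a bounded prefix is planned and the rest waits.
        int end = Math.min(pendingSnapshotIds.size(), maxSnapshotsPerRound);
        return new ArrayList<>(pendingSnapshotIds.subList(0, end));
    }
}
```

   In this sketch, a consumer that is five snapshots behind with a cap of two 
would plan the two oldest snapshots first and pick up the remainder in later 
rounds, so the number of splits emitted per round stays bounded.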
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

