stevenzwu commented on PR #4943:
URL: https://github.com/apache/iceberg/pull/4943#issuecomment-1146851010

   In general, I think this is the right direction. It is not mutually exclusive 
with PR #4911 (reduce the checkpoint lock scope); we should have both. This PR 
focuses on making the plan smaller and faster, while PR #4911 avoids holding 
the lock longer than necessary.
   
   For the new FLIP-27 source, I have been thinking about something very 
similar. There is no point in eagerly discovering all splits/snapshots if the 
Flink job is falling too far behind. We need to throttle split discovery. 
In addition to limiting the number of snapshots per discovery cycle, I am also 
thinking that we should pause/skip split discovery if the number of 
pending splits is over a certain threshold. This acts like a backpressure 
mechanism and can help control the memory footprint. This won't be in the MVP 
version of the FLIP-27 source. We can follow up on the optimization after the 
MVP version is merged.
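
   To make the throttling idea concrete, here is a minimal sketch of what I 
have in mind. It is not the actual FLIP-27 source code: the class 
`ThrottledSplitDiscovery`, its methods, and the two limits 
(`maxSnapshotsPerCycle`, `maxPendingSplits`) are hypothetical names used only 
for illustration, and splits are modeled as plain strings.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch of throttled split discovery with backpressure.
 * None of these names come from the Iceberg or Flink APIs.
 */
public class ThrottledSplitDiscovery {
    // Assumed knob: cap on snapshots planned in one discovery cycle.
    private final int maxSnapshotsPerCycle;
    // Assumed knob: skip discovery while this many splits are still pending.
    private final int maxPendingSplits;

    private final Deque<Long> undiscoveredSnapshots = new ArrayDeque<>();
    private final Deque<String> pendingSplits = new ArrayDeque<>();

    public ThrottledSplitDiscovery(int maxSnapshotsPerCycle, int maxPendingSplits) {
        this.maxSnapshotsPerCycle = maxSnapshotsPerCycle;
        this.maxPendingSplits = maxPendingSplits;
    }

    /** Records a newly committed snapshot that has not been planned yet. */
    public void snapshotCommitted(long snapshotId) {
        undiscoveredSnapshots.addLast(snapshotId);
    }

    /**
     * Runs one discovery cycle and returns the number of snapshots planned.
     * splitsPerSnapshot stands in for real scan planning.
     */
    public int discoverOnce(int splitsPerSnapshot) {
        // Backpressure: readers are behind, so skip this cycle entirely.
        if (pendingSplits.size() >= maxPendingSplits) {
            return 0;
        }
        int planned = 0;
        // Throttle: plan at most maxSnapshotsPerCycle snapshots per cycle.
        while (planned < maxSnapshotsPerCycle && !undiscoveredSnapshots.isEmpty()) {
            long snapshotId = undiscoveredSnapshots.pollFirst();
            for (int i = 0; i < splitsPerSnapshot; i++) {
                pendingSplits.addLast(snapshotId + "-" + i);
            }
            planned++;
        }
        return planned;
    }

    /** Hands the next pending split to a reader, or null if none is pending. */
    public String pollSplit() {
        return pendingSplits.pollFirst();
    }

    public int pendingSplitCount() {
        return pendingSplits.size();
    }
}
```

   Note that the threshold is checked only at the start of a cycle, so the 
pending count can overshoot it by up to one cycle's worth of splits. That keeps 
the sketch simple; a stricter bound would need a check inside the loop as well.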
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

