stevenzwu commented on PR #4943: URL: https://github.com/apache/iceberg/pull/4943#issuecomment-1146851010
In general, I think this is the right direction. It is not mutually exclusive with PR #4911 (reduce the checkpoint lock scope); we should have both. This PR focuses on making the plan smaller and faster, while PR #4911 avoids holding the lock longer than is actually necessary.

For the new FLIP-27 source, I have been thinking about something very similar. There is no point in eagerly discovering all splits/snapshots if the Flink job is falling too far behind. We need to throttle split discovery. In addition to limiting the number of snapshots per discovery cycle, I am also thinking that we should pause/skip split discovery when the number of pending splits is over a certain threshold. This acts like a backpressure mechanism and can help control the memory footprint.

This won't be in the MVP version of the FLIP-27 source. We can follow up on the optimization after the MVP version is merged.
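
As a rough illustration of the throttling idea described above (a sketch only, not the actual FLIP-27 source code): the class `ThrottledSplitDiscovery` and the parameters `maxSnapshotsPerCycle` and `maxPendingSplits` are hypothetical names. The sketch also assumes, purely for simplicity, that snapshots are numbered sequentially and that each snapshot yields exactly one split, which is not how Iceberg snapshot IDs or split planning really work.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical sketch of throttled split discovery; not part of Iceberg or Flink. */
class ThrottledSplitDiscovery {
    private final int maxSnapshotsPerCycle; // cap on snapshots planned in one cycle
    private final int maxPendingSplits;     // backpressure threshold on queued splits
    private final Queue<String> pendingSplits = new ArrayDeque<>();
    private long lastPlannedSnapshot;       // last snapshot already turned into splits

    ThrottledSplitDiscovery(int maxSnapshotsPerCycle, int maxPendingSplits, long startSnapshot) {
        this.maxSnapshotsPerCycle = maxSnapshotsPerCycle;
        this.maxPendingSplits = maxPendingSplits;
        this.lastPlannedSnapshot = startSnapshot;
    }

    /** Runs one discovery cycle and returns the number of snapshots planned. */
    int discover(long latestSnapshot) {
        // Backpressure: skip this cycle while too many splits are still pending.
        if (pendingSplits.size() > maxPendingSplits) {
            return 0;
        }
        // Throttle: plan at most maxSnapshotsPerCycle snapshots per cycle.
        long end = Math.min(latestSnapshot, lastPlannedSnapshot + maxSnapshotsPerCycle);
        int planned = 0;
        for (long s = lastPlannedSnapshot + 1; s <= end; s++) {
            // Simplifying assumption: one split per snapshot.
            pendingSplits.add("split-for-snapshot-" + s);
            planned++;
        }
        lastPlannedSnapshot = Math.max(lastPlannedSnapshot, end);
        return planned;
    }

    /** Hands the next pending split to a reader, or null if none is queued. */
    String pollSplit() {
        return pendingSplits.poll();
    }

    int pendingCount() {
        return pendingSplits.size();
    }
}
```

With this shape, a job that falls behind stops accumulating splits in memory: discovery resumes only once readers have drained the queue back to the threshold or below, and each resumed cycle still plans a bounded number of snapshots.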
