[GitHub] [iceberg] stevenzwu opened a new issue, #5613: Flink: throttle FLIP-27 source enumerator for split discovery when Flink job is falling behind in streaming execution

GitBox Mon, 22 Aug 2022 11:49:43 -0700


stevenzwu opened a new issue, #5613:
URL: https://github.com/apache/iceberg/issues/5613


   ### Feature Request / Improvement
   
   Right now, the FLIP-27 source eagerly discovers all available splits from the 
Iceberg table using an incremental append scan. If the Flink job is falling 
behind with a lot of snapshots and data files, eagerly discovering all available 
splits can overwhelm the enumerator with too many pending splits, which 
increases memory pressure and enumerator checkpoint size. There is no real 
benefit to eagerly discovering all splits into memory. It is better to throttle 
split discovery once a certain (configurable) number of splits is already 
pending. 
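
   The throttling idea could look roughly like the sketch below. It is only an 
illustration and not the Iceberg enumerator API: `ThrottledSplitDiscovery`, 
`SplitPlanner`, `maxPendingSplits`, and the use of plain strings as splits are 
all hypothetical names chosen for this example.

```java
import java.util.ArrayDeque;
import java.util.List;
import java.util.Queue;

/** Hypothetical sketch of throttled split discovery; not the actual Iceberg enumerator. */
class ThrottledSplitDiscovery {

    /** Hypothetical stand-in for one incremental append scan that returns new splits. */
    interface SplitPlanner {
        List<String> planNextBatch();
    }

    private final SplitPlanner planner;
    // Assumed configurable threshold: discovery is skipped while this many splits are pending.
    private final int maxPendingSplits;
    private final Queue<String> pending = new ArrayDeque<>();

    ThrottledSplitDiscovery(SplitPlanner planner, int maxPendingSplits) {
        this.planner = planner;
        this.maxPendingSplits = maxPendingSplits;
    }

    /**
     * Intended to be called periodically. Returns true if a discovery scan ran,
     * false if it was skipped because enough splits are already pending.
     */
    boolean maybeDiscover() {
        if (pending.size() >= maxPendingSplits) {
            return false; // throttled: do not plan more splits into memory
        }
        pending.addAll(planner.planNextBatch());
        return true;
    }

    /** Hands the next pending split to a reader, or returns null if none is pending. */
    String pollSplit() {
        return pending.poll();
    }

    int pendingCount() {
        return pending.size();
    }
}
```

   In this sketch the threshold is checked only before a scan, so one scan can 
still push the pending count above `maxPendingSplits`. Bounding the number of 
snapshots covered by a single scan would be a separate, complementary limit.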
   
   PR #4943 (on the pre-FLIP-27 source) tries to prevent a single incremental 
scan from covering too many snapshots. That can help, but it doesn't throttle 
or pause split discovery when necessary.
   
   ### Query engine
   
   Flink


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


