stevenzwu commented on issue #1383: URL: https://github.com/apache/iceberg/issues/1383#issuecomment-706771662
Regarding `TableScan.appendsBetween`, we might need more flexibility or finer-grained control. E.g., if the Flink job is lagging behind or bootstraps from an old snapshot, we probably don't want to eagerly plan all the unconsumed `FileScanTask`s. That could blow up the Flink checkpoint state if the enumerated list of `FileScanTask`s is too big. I am thinking about two levels of enumeration to keep the enumerator's memory footprint in check:

* first, enumerate the list of unconsumed `DataOperations.APPEND` snapshots. This list is cheap to track and checkpoint.
* second, enumerate `FileScanTask`s for up to a configurable number (e.g. 6) of the oldest snapshots from the first step.

If the job is keeping up with ingestion, there should be only one unconsumed snapshot.
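A minimal sketch of the two-level enumeration, assuming simplified stand-in types. `TwoLevelEnumerator`, `SnapshotInfo`, `ScanTask`, `discover`, `planNextBatch`, `checkpointState` and `maxSnapshotsPerPlan` are all hypothetical names for illustration, not Iceberg or Flink APIs. A real enumerator would read snapshot metadata from the Iceberg table and produce actual `FileScanTask`s. The point is only that the first level (snapshot backlog) is stored as cheap ids, while the second level (planned tasks) is capped by the number of snapshots planned at once.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical two-level enumerator; the types below are stand-ins, not Iceberg classes. */
public class TwoLevelEnumerator {

    /** Operation name used to filter append snapshots (mirrors the "append" operation). */
    public static final String APPEND = "append";

    /** Stand-in for a snapshot: its id, operation, and the data files it added. */
    public record SnapshotInfo(long snapshotId, String operation, List<String> addedFiles) {}

    /** Stand-in for a FileScanTask: one data file added by one snapshot. */
    public record ScanTask(long snapshotId, String file) {}

    // Level 1: unconsumed append snapshots, oldest first.
    private final Deque<SnapshotInfo> pendingSnapshots = new ArrayDeque<>();
    // Upper bound on how many snapshots are planned into tasks at once (e.g. 6).
    private final int maxSnapshotsPerPlan;

    public TwoLevelEnumerator(int maxSnapshotsPerPlan) {
        if (maxSnapshotsPerPlan <= 0) {
            throw new IllegalArgumentException("maxSnapshotsPerPlan must be positive");
        }
        this.maxSnapshotsPerPlan = maxSnapshotsPerPlan;
    }

    /** Level 1: queue newly discovered snapshots, keeping only appends. */
    public void discover(List<SnapshotInfo> newSnapshots) {
        for (SnapshotInfo s : newSnapshots) {
            if (APPEND.equals(s.operation())) {
                pendingSnapshots.addLast(s);
            }
        }
    }

    /** Level 2: plan tasks only for the oldest maxSnapshotsPerPlan pending snapshots. */
    public List<ScanTask> planNextBatch() {
        List<ScanTask> tasks = new ArrayList<>();
        for (int i = 0; i < maxSnapshotsPerPlan && !pendingSnapshots.isEmpty(); i++) {
            SnapshotInfo s = pendingSnapshots.pollFirst();
            for (String file : s.addedFiles()) {
                tasks.add(new ScanTask(s.snapshotId(), file));
            }
        }
        return tasks;
    }

    /**
     * Checkpointed backlog: snapshot ids only, not planned tasks. A real enumerator
     * would also checkpoint its unassigned tasks, which this design keeps bounded.
     */
    public List<Long> checkpointState() {
        List<Long> ids = new ArrayList<>();
        for (SnapshotInfo s : pendingSnapshots) {
            ids.add(s.snapshotId());
        }
        return ids;
    }

    public static void main(String[] args) {
        TwoLevelEnumerator e = new TwoLevelEnumerator(2);
        e.discover(List.of(
                new SnapshotInfo(1L, APPEND, List.of("a.parquet", "b.parquet")),
                new SnapshotInfo(2L, "delete", List.of()),
                new SnapshotInfo(3L, APPEND, List.of("c.parquet")),
                new SnapshotInfo(4L, APPEND, List.of("d.parquet"))));
        System.out.println(e.checkpointState());      // [1, 3, 4]
        System.out.println(e.planNextBatch().size()); // 3 (files from snapshots 1 and 3)
        System.out.println(e.checkpointState());      // [4]
    }
}
```

When the job is caught up, the backlog holds a single snapshot and each batch plans just that one; when it is far behind, the backlog grows only by one id per snapshot rather than by every file those snapshots added.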
