stevenzwu commented on issue #1383:
URL: https://github.com/apache/iceberg/issues/1383#issuecomment-706771662


Regarding `TableScan.appendsBetween`, we might need finer-grained control. For
example, if the Flink job is lagging behind or bootstraps from an old snapshot,
we probably don't want to eagerly plan all of the unconsumed `FileScanTask`s.
If the enumerated list of `FileScanTask`s is too big, it could blow up the
Flink checkpoint state.
   
I am thinking about two levels of enumeration to keep the enumerator's memory
footprint in check:
* First, enumerate the list of unconsumed `DataOperations.APPEND` snapshots.
This list is cheap to track and checkpoint.
* Second, enumerate `FileScanTask`s only for a configurable number of the
oldest snapshots (e.g. 6) from the first step.
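The two-level scheme above could look roughly like the following minimal Java sketch. It assumes hypothetical types `SnapshotInfo`, `FileSplit`, and `TwoLevelEnumerator`, plus a hypothetical `addedFilesOf` lookup that stands in for reading a snapshot's manifests. None of these are Iceberg or Flink APIs. The `"append"` string mirrors the value of `DataOperations.APPEND`.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-in for snapshot metadata (not org.apache.iceberg.Snapshot).
record SnapshotInfo(long snapshotId, String operation) {}

// Hypothetical split type: one added data file from one snapshot.
record FileSplit(long snapshotId, String path) {}

// Sketch of two-level enumeration. Level 1 keeps only IDs of unconsumed append
// snapshots; level 2 expands file-level splits for a bounded number of the oldest ones.
class TwoLevelEnumerator {
    private final int maxSnapshotsPerPlan;
    // Hypothetical lookup standing in for reading a snapshot's added data files from its manifests.
    private final Function<Long, List<String>> addedFilesOf;
    // Level-1 state: unconsumed append snapshot IDs, oldest first. Only longs, so cheap to checkpoint.
    private final Deque<Long> pendingSnapshotIds = new ArrayDeque<>();

    TwoLevelEnumerator(int maxSnapshotsPerPlan, Function<Long, List<String>> addedFilesOf) {
        if (maxSnapshotsPerPlan <= 0) {
            throw new IllegalArgumentException("maxSnapshotsPerPlan must be positive");
        }
        this.maxSnapshotsPerPlan = maxSnapshotsPerPlan;
        this.addedFilesOf = addedFilesOf;
    }

    // Level 1: remember newly discovered snapshots, skipping non-append operations.
    void discover(List<SnapshotInfo> newSnapshots) {
        for (SnapshotInfo s : newSnapshots) {
            if ("append".equals(s.operation())) {
                pendingSnapshotIds.addLast(s.snapshotId());
            }
        }
    }

    // Level 2: plan file splits for at most maxSnapshotsPerPlan of the oldest pending snapshots.
    // Simplification: planned snapshots are dropped from level-1 state immediately.
    List<FileSplit> planNextBatch() {
        List<FileSplit> splits = new ArrayList<>();
        for (int i = 0; i < maxSnapshotsPerPlan && !pendingSnapshotIds.isEmpty(); i++) {
            long id = pendingSnapshotIds.pollFirst();
            for (String path : addedFilesOf.apply(id)) {
                splits.add(new FileSplit(id, path));
            }
        }
        return splits;
    }

    // The level-1 state a checkpoint would persist: a list of snapshot IDs, not file tasks.
    List<Long> checkpointState() {
        return new ArrayList<>(pendingSnapshotIds);
    }
}
```

With 10 pending append snapshots and a limit of 6, the first `planNextBatch()` expands files only for the six oldest snapshots, and the level-1 state shrinks to four snapshot IDs. This bounds enumerator state by the number of lagging snapshots rather than the number of data files. A real enumerator would also need to checkpoint splits that were planned but not yet finished; this sketch leaves that out.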
   
If the job is keeping up with ingestion, there should be only one unconsumed
snapshot.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.



