rdblue commented on pull request #1515:
URL: https://github.com/apache/iceberg/pull/1515#issuecomment-707870292


   Thanks for the context, everyone. It sounds like we have two main issues to 
address. First, how do we ensure people are using checkpoints reliably given 
that batch doesn't need them (check `!batch && !checkpointing`?). And second, 
what guarantees do we want to make for a sink like this? From @JingsongLi's 
comment, I think we should target at-least-once but we should all agree on 
whether that is the goal.
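   
   For the first issue, here is a minimal sketch of the kind of guard I have in mind (the `isBatchJob` flag and the `IcebergSinkValidation` class name are made up for illustration; the checkpoint check uses Flink's `CheckpointConfig`):
   
   ```java
   import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
   
   public class IcebergSinkValidation {  // hypothetical helper, not part of this PR
     /**
      * Fail fast when a streaming job writes to Iceberg without checkpointing,
      * since commits are tied to checkpoint completion. Batch jobs are exempt.
      */
     public static void validate(StreamExecutionEnvironment env, boolean isBatchJob) {
       boolean checkpointing = env.getCheckpointConfig().isCheckpointingEnabled();
       if (!isBatchJob && !checkpointing) {
         throw new IllegalStateException(
             "Streaming writes to Iceberg require checkpointing to be enabled");
       }
     }
   }
   ```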
   
   > I think for most Flink cases, if we don't enable checkpointing, it is an at-most-once guarantee, because the state generated before a job failover will be lost unless we manually read data from the beginning.
   
   When we have situations like this, we read all of the available data in the 
topic to get at-least-once behavior, assuming that the job hasn't been paused 
for so long that the data has been removed. It sounds like this is what other 
people do as well?
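   
   To make that concrete, here is a rough sketch using Flink's Kafka connector (topic name, group id, and broker address are placeholders):
   
   ```java
   import java.util.Properties;
   import org.apache.flink.api.common.serialization.SimpleStringSchema;
   import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
   
   // When there is no checkpoint or savepoint to restore from, start from the
   // earliest retained offsets so the job re-reads everything still in the topic,
   // trading duplicates for at-least-once rather than dropping to at-most-once.
   Properties props = new Properties();
   props.setProperty("bootstrap.servers", "kafka:9092");  // placeholder address
   props.setProperty("group.id", "iceberg-writer");       // placeholder group id
   
   FlinkKafkaConsumer<String> source =
       new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props);
   source.setStartFromEarliest();  // ignored when restoring from checkpoint state
   ```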
   
   > Your idea is to make IcebergWriter recover the state by finding the last successful commit, am I right?
   
   Yes. The purpose is to reduce duplication under at-least-once by recovering the offsets when we can. I'm not sure whether Flink allows you to construct custom start offsets like this, though. I'll defer to the expertise of @openinx and @JingsongLi.
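   
   Roughly what I mean, sketched against the Iceberg API (the `kafka.offsets` summary key and the serialization of the offsets are made up for illustration; whether Flink can seed a source from offsets recovered this way is exactly the open question):
   
   ```java
   import java.util.Map;
   import org.apache.iceberg.AppendFiles;
   import org.apache.iceberg.DataFile;
   import org.apache.iceberg.Snapshot;
   import org.apache.iceberg.Table;
   
   class OffsetRecoverySketch {  // illustration only
     // On commit: record the source offsets that produced this snapshot in its summary.
     static void commitWithOffsets(Table table, DataFile file, String serializedOffsets) {
       AppendFiles append = table.newAppend();
       append.appendFile(file);
       append.set("kafka.offsets", serializedOffsets);  // hypothetical summary property
       append.commit();
     }
   
     // On restart without checkpoint state: read the offsets back from the last
     // successful commit; null means fall back to the earliest retained data.
     static String recoverOffsets(Table table) {
       Snapshot current = table.currentSnapshot();
       if (current == null) {
         return null;  // empty table, nothing was ever committed
       }
       Map<String, String> summary = current.summary();
       return summary.get("kafka.offsets");
     }
   }
   ```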

