openinx commented on pull request #1515:
URL: https://github.com/apache/iceberg/pull/1515#issuecomment-708196162


   We had some discussion in our team. For the first question, how do we 
distinguish a batch job from a streaming job without checkpoint state: in the 
current Flink 1.11, there's no way to indicate that a job is a batch job, so 
we can only extend the `BoundedOneInput` interface to do the Iceberg 
transaction commit. In theory, we shouldn't break the big transaction into 
several small transactions in batch mode, because users expect the job to 
either commit successfully or roll back atomically. For now, we could set a 
property on the Iceberg Flink sink to indicate explicitly whether it's a batch 
or a streaming job. Future Flink releases will provide methods to accomplish 
this.
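
   To illustrate the idea, here is a minimal, hedged sketch of a committer 
that commits exactly one transaction when the bounded input ends. The 
`BoundedOneInput` interface below is a simplified stand-in for Flink's 
`org.apache.flink.streaming.api.operators.BoundedOneInput`, and 
`BatchIcebergCommitter`, `collect`, and `isCommitted` are hypothetical names, 
not the actual Iceberg sink classes:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Flink's BoundedOneInput: endInput() is invoked once
// when a bounded (batch-style) input reaches its end.
interface BoundedOneInput {
    void endInput() throws Exception;
}

// Hypothetical committer: buffers completed data files and commits them
// all in ONE transaction at end of input, so the batch result is atomic.
class BatchIcebergCommitter implements BoundedOneInput {
    private final List<String> pendingDataFiles = new ArrayList<>();
    private boolean committed = false;

    void collect(String dataFile) {
        // Buffer per-file results; do NOT commit per record or per file.
        pendingDataFiles.add(dataFile);
    }

    @Override
    public void endInput() {
        // Single atomic commit: either all buffered files become
        // visible in the table, or (on failure) none do.
        committed = true;
        pendingDataFiles.clear();
    }

    boolean isCommitted() {
        return committed;
    }
}
```

   The point of the sketch is only the shape: all output is buffered, and 
`endInput()` is the single commit point for the whole bounded job.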
   
   For the second question, at-least-once or at-most-once? If the Kafka 
source retains enough data for the Flink job to restart from, then we don't 
lose any data in the source operator, so we have the at-least-once guarantee. 
To reduce duplication when recovering, I don't think there's a Flink interface 
for keeping the latest successfully consumed offset in the Iceberg sink; if 
someone really wants to do that, they could use the system timestamp or a 
user-defined field persisted in the Iceberg table properties.
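
   A rough sketch of that dedup idea, assuming a user-defined marker (here a 
checkpoint id) persisted in the table properties. `TableProps`, 
`DedupCommitter`, and the property key name are all hypothetical stand-ins, 
not actual Iceberg APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table's mutable key/value properties.
class TableProps {
    private final Map<String, String> props = new HashMap<>();
    String get(String key, String dflt) { return props.getOrDefault(key, dflt); }
    void set(String key, String value) { props.put(key, value); }
}

// On recovery, replayed commits carry a checkpoint id we may have already
// committed; comparing against the persisted marker skips the duplicates.
class DedupCommitter {
    static final String MARKER = "flink.max-committed-checkpoint-id"; // assumed key
    private final TableProps table;

    DedupCommitter(TableProps table) { this.table = table; }

    /** Returns true if applied, false if this checkpoint was already committed. */
    boolean commit(long checkpointId) {
        long lastCommitted = Long.parseLong(table.get(MARKER, "-1"));
        if (checkpointId <= lastCommitted) {
            return false; // duplicate replay after failover: skip it
        }
        // ... commit the data files for this checkpoint, then record the marker
        // atomically in the same transaction so the two can't diverge.
        table.set(MARKER, Long.toString(checkpointId));
        return true;
    }
}
```

   Note that the marker only helps if it is written in the same transaction 
as the data commit; otherwise a failure between the two reintroduces the 
duplication.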
   
   For now, I totally agree with @rdblue that we should add a check which 
throws an exception, since Iceberg doesn't support streaming jobs with 
checkpointing disabled.
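
   That guard could look roughly like this; `validate` and its parameters are 
illustrative names, not the actual sink builder API:

```java
// Fail fast when a streaming job disables checkpointing: the sink only
// commits on checkpoints, so nothing would ever become visible.
class SinkBuilderCheck {
    static void validate(boolean isStreaming, boolean checkpointingEnabled) {
        if (isStreaming && !checkpointingEnabled) {
            throw new IllegalStateException(
                "Iceberg sink requires checkpointing for streaming jobs; "
                    + "enable it on the StreamExecutionEnvironment.");
        }
        // Batch jobs are fine without checkpointing: they commit via endInput().
    }
}
```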


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


