[ 
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15754312#comment-15754312
 ] 

ASF GitHub Bot commented on FLINK-3257:
---------------------------------------

Github user senorcarbone commented on the issue:

    https://github.com/apache/flink/pull/1668
  
    Working on it atm . I decided to make the following optimisations but want 
to very quickly make sure that async checkpointing works the way I believe it 
does:
    - Most importantly, I am changing the iteration head to always forward 
records. Their effects are not present in any in-progress snapshot anyway so 
that I should had done from the very beginning. :)
    - If `ListState` is checkpointed asynchronously, depending on the backend I 
suppose, then the current version of it, during the snapshot, will be persisted 
as a copy, which means that we can apply mutations right away and therefore 
reset it right after invoking the snapshot to the beginning of the next 
in-progress snapshot (some indexing involved). That way we do not need to open 
new ListStates in the first place. Does this make sense?
    
    @StephanEwen Please correct me if I am wrong, regarding the second point. I 
am just not very familiar with async snapshotting for `ListState` (this is not 
clear in the documentation for me). Mind also that I do not use the 
`CheckpointedAsychronously` interface, it seems to be heading towards 
deprecation. Thanks!


> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> -------------------------------------------------------------------
>
>                 Key: FLINK-3257
>                 URL: https://issues.apache.org/jira/browse/FLINK-3257
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Paris Carbone
>            Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution 
> graph. An alternative scheme can potentially include records in-transit 
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the 
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as 
> follows along the lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start 
> block output and start upstream backup of all records forwarded from the 
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch 
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource 
> should finalize the snapshot, unblock its output and emit all records 
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected 
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but 
> this can be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to