[
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943105#comment-15943105
]
ASF GitHub Bot commented on FLINK-3257:
---------------------------------------
Github user senorcarbone commented on the issue:
https://github.com/apache/flink/pull/1668
Thanks for the review @gyfora and @StephanEwen , these are very good points.
@StephanEwen makes sense to not really index/keep metadata of individual
records in log slices, it is extra overhead. Writing raw operator state makes
sense, so I will do that once @StefanRRichter gives me some pointers, that
would be great.
Any redistribution of the checkpoint slices would violate causality so I
hope the "list redistribution pattern" actually keeps the set of registered
operator states per instance intact. The garbage collection issue still remains
but maybe (if @StefanRRichter approves) I can add an `unregister` functionality
to the `OperatorStateStore`.
I can also add preconfigured operators (not that they will be reused
anywhere). It is more clean but I really need to see how can I get full control
of the `task` checkpointing behaviour from the `operator` level (since the
default task checkpointing behaviour is altered at the task-level).
> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> -------------------------------------------------------------------
>
> Key: FLINK-3257
> URL: https://issues.apache.org/jira/browse/FLINK-3257
> Project: Flink
> Issue Type: Improvement
> Components: DataStream API
> Reporter: Paris Carbone
> Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution
> graph. An alternative scheme can potentially include records in-transit
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as
> follows along the lines:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager start
> block output and start upstream backup of all records forwarded from the
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource
> should finalize the snapshot, unblock its output and emit all records
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations can be potentially achieved but
> this can be the initial implementation take.
> [1] http://arxiv.org/abs/1506.08603
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)