[
https://issues.apache.org/jira/browse/FLINK-3257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15744648#comment-15744648
]
ASF GitHub Bot commented on FLINK-3257:
---------------------------------------
Github user senorcarbone commented on the issue:
https://github.com/apache/flink/pull/1668
These are some good points @StephanEwen, thanks for checking it.
How about the following, regarding each issue:
- `Concurrent Checkpoints`: Looks like an improvement, but I can certainly do it
in this PR if it is a crucial one. Can you elaborate a bit more, or point me
to other examples of operator state with concurrent checkpointing, so I get an
idea of how you want to do it?
- `Reconfiguration`: Sounds interesting, but I am not really aware of it
from the dev list. If it is simple enough I could add support for it here.
Otherwise I would suggest we address this in a separate JIRA and PR as an
improvement. Is there a design document somewhere on how we plan to achieve
reconfiguration and repartitioning for operator state specifically?
- `At-most-once blocking queue`: It is obvious from my previous comments
that I do not approve of this part, but that is something we already got rid
of in
[FLIP-15](https://cwiki.apache.org/confluence/display/FLINK/FLIP-15+Scoped+Loops+and+Job+Termination)
([this](https://github.com/FouadMA/flink/commit/9adaac435bcaf3552afe564c739d4e8fd79c433b)
commit). How about we address this together with the deadlocks in FLIP-15?
- `Deadlocks`: I like the elastic spilling channel idea to resolve
deadlocks. I need time to dig a bit more into this and make sure we actually
solve deadlocks rather than just mitigate them. Is it ok with you if we
address that in
[FLIP-15](https://cwiki.apache.org/confluence/display/FLINK/FLIP-15+Scoped+Loops+and+Job+Termination)?
I need more time for this part. Plus, we need to combine the absence of
expiring queues with a proper termination algorithm (otherwise we just solve
the deadlocks and the jobs never terminate).
What do you think?
> Add Exactly-Once Processing Guarantees in Iterative DataStream Jobs
> -------------------------------------------------------------------
>
> Key: FLINK-3257
> URL: https://issues.apache.org/jira/browse/FLINK-3257
> Project: Flink
> Issue Type: Improvement
> Reporter: Paris Carbone
> Assignee: Paris Carbone
>
> The current snapshotting algorithm cannot support cycles in the execution
> graph. An alternative scheme can potentially include records in-transit
> through the back-edges of a cyclic execution graph (ABS [1]) to achieve the
> same guarantees.
> One straightforward implementation of ABS for cyclic graphs can work as
> follows:
> 1) Upon triggering a barrier in an IterationHead from the TaskManager, block
> its output and start an upstream backup of all records forwarded from the
> respective IterationSink.
> 2) The IterationSink should eventually forward the current snapshotting epoch
> barrier to the IterationSource.
> 3) Upon receiving a barrier from the IterationSink, the IterationSource
> should finalize the snapshot, unblock its output and emit all records
> in-transit in FIFO order and continue the usual execution.
> --
> Upon restart the IterationSource should emit all records from the injected
> snapshot first and then continue its usual execution.
> Several optimisations and slight variations are possible, but this can
> serve as the initial implementation.
> [1] http://arxiv.org/abs/1506.08603
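
For reference, steps 1) to 3) and the restart rule from the issue description can be sketched as a small, self-contained toy model. This is only an illustration under assumptions: the class `IterationHeadSketch` and all of its methods are hypothetical names, records are plain strings, and none of this is Flink's actual API or the code in the PR.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of the iteration head/source logic; hypothetical, not Flink code. */
class IterationHeadSketch {
    private final Deque<String> backup = new ArrayDeque<>();     // in-transit records logged during a snapshot
    private final List<String> output = new ArrayList<>();       // stands in for the downstream channel
    private final List<String> lastSnapshot = new ArrayList<>(); // state of the last finalized snapshot
    private boolean blocked = false;
    private long currentEpoch = -1L;

    /** Step 1: a barrier is triggered; block output and start backing up feedback records. */
    void triggerBarrier(long epoch) {
        blocked = true;
        currentEpoch = epoch;
        backup.clear();
    }

    /** A record arrives over the back-edge (forwarded by the iteration sink). */
    void onFeedbackRecord(String record) {
        if (blocked) {
            backup.add(record);   // held back and logged as part of the snapshot
        } else {
            output.add(record);   // usual execution
        }
    }

    /** Steps 2 and 3: the barrier comes back over the back-edge; finalize and unblock. */
    void onFeedbackBarrier(long epoch) {
        if (!blocked || epoch != currentEpoch) {
            return;               // barrier for an epoch we are not snapshotting
        }
        lastSnapshot.clear();
        lastSnapshot.addAll(backup);      // finalize the snapshot
        while (!backup.isEmpty()) {
            output.add(backup.poll());    // emit in-transit records in FIFO order
        }
        blocked = false;
    }

    /** Restart: emit all records from the injected snapshot first. */
    void restore(List<String> snapshot) {
        output.addAll(snapshot);
    }

    List<String> output() { return output; }

    List<String> snapshot() { return new ArrayList<>(lastSnapshot); }

    public static void main(String[] args) {
        IterationHeadSketch head = new IterationHeadSketch();
        head.onFeedbackRecord("a");
        head.triggerBarrier(1L);
        head.onFeedbackRecord("b");
        head.onFeedbackRecord("c");
        head.onFeedbackBarrier(1L);
        System.out.println("snapshot = " + head.snapshot());
        System.out.println("output   = " + head.output());
    }
}
```

The sketch handles a single checkpoint at a time, so it does not cover the concurrent checkpoints, spilling, or termination questions raised in the comment above.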
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)