[
https://issues.apache.org/jira/browse/FLINK-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040602#comment-14040602
]
Markus Holzemer commented on FLINK-909:
---------------------------------------
I have also run into this issue a few times. Since I am currently refactoring
the iterations runtime, I will have a look at it.
It should be possible to add a barrier at the start of each superstep and wait
for an explicit OK message from the iteration head task (which manages a
single iteration instance on one taskmanager) before the next superstep can
start.
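
A rough sketch of the barrier idea, to make the proposal concrete. This is not
the iterations runtime code: SuperstepGate, approve, terminate, awaitStart and
run are hypothetical names, plain threads stand in for tasks, and the main
thread stands in for the iteration head. Each worker blocks before every
superstep until the head either approves it or terminates the iteration.

```java
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

public class SuperstepGateDemo {

    /** Hypothetical gate: workers may only start a superstep the head has approved. */
    static final class SuperstepGate {
        private int approvedSuperstep = 0;   // highest superstep the head has approved
        private boolean terminated = false;  // set once the head ends the iteration

        /** Called by the head: the explicit OK for the given superstep. */
        synchronized void approve(int superstep) {
            approvedSuperstep = superstep;
            notifyAll();
        }

        /** Called by the head: no further supersteps will be approved. */
        synchronized void terminate() {
            terminated = true;
            notifyAll();
        }

        /** Called by a worker: blocks until a decision exists; true means "run it". */
        synchronized boolean awaitStart(int superstep) throws InterruptedException {
            while (!terminated && approvedSuperstep < superstep) {
                wait();
            }
            return approvedSuperstep >= superstep;
        }
    }

    /** Runs the toy iteration and returns how many supersteps each worker executed. */
    static int[] run(int maxIterations, int numWorkers) throws InterruptedException {
        SuperstepGate gate = new SuperstepGate();
        // One latch per superstep so the head knows when all workers are done with it.
        CountDownLatch[] finished = new CountDownLatch[maxIterations + 1];
        for (int i = 1; i <= maxIterations; i++) {
            finished[i] = new CountDownLatch(numWorkers);
        }

        int[] executed = new int[numWorkers];
        Thread[] workers = new Thread[numWorkers];
        for (int w = 0; w < numWorkers; w++) {
            final int id = w;
            workers[w] = new Thread(() -> {
                try {
                    int step = 1;
                    // The barrier: no user code runs before the head says OK.
                    while (gate.awaitStart(step)) {
                        executed[id]++;              // stands in for the real superstep work
                        finished[step].countDown();
                        step++;
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[w].start();
        }

        // Head: approve each superstep, wait for it to finish, then terminate.
        for (int step = 1; step <= maxIterations; step++) {
            gate.approve(step);
            finished[step].await();
        }
        gate.terminate();

        for (Thread t : workers) {
            t.join();
        }
        return executed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(run(3, 4))); // prints [3, 3, 3, 3]
    }
}
```

With a maximum of 3 iterations, every worker executes exactly 3 supersteps.
The call to awaitStart(4) returns false, so no task starts on an additional
superstep with only its static input available.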
> Pitfall due to additional superstep after the iteration has stopped
> -------------------------------------------------------------------
>
> Key: FLINK-909
> URL: https://issues.apache.org/jira/browse/FLINK-909
> Project: Flink
> Issue Type: Bug
> Reporter: GitHub Import
> Labels: github-import
> Fix For: pre-apache
>
>
> Currently, after an iteration has exceeded the maximum number of iterations,
> all tasks are started again for an additional superstep during which they are
> stopped. This works if a task only waits for dynamic input. However, if a
> task, e.g. a coGroup operation, gets both dynamic and static input, its
> execution is not blocked. This can then lead to erroneous
> behaviour which the user is not aware of.
> I had this problem implementing ALS. Here one has a loop which gets matrix
> columns as dynamic input and matrix entries as static input. The columns
> and the entries are used to construct a matrix which represents a system of
> linear equations. If the set of columns is empty, then the matrix is
> singular and the system is thus not solvable. During the additional
> superstep the task won't receive any columns but will still try to solve
> the now singular system.
> It would be good to finish the iteration without initiating this additional
> superstep.
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/909
> Created by: [tillrohrmann|https://github.com/tillrohrmann]
> Labels:
> Created at: Thu Jun 05 17:50:17 CEST 2014
> State: open
--
This message was sent by Atlassian JIRA
(v6.2#6252)