[ https://issues.apache.org/jira/browse/FLINK-7595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155312#comment-16155312 ]

Chesnay Schepler commented on FLINK-7595:
-----------------------------------------

I think this bug was introduced around 1.3, when we refactored state to be
stored per operator instead of per task. We've always checked that all stateful
tasks could be mapped to the new program, and failed otherwise unless the
--allowNonRestoredState flag was set. Before the refactoring, the savepoint only
contained entries for tasks that actually had some state; now it has an entry
for every operator, regardless of whether any state was stored.

We could amend the else block in {{SavepointLoader}} at L122:
{code}
} else {
    String msg = String.format("Failed to rollback to savepoint %s. " +
            "Cannot map savepoint state for operator %s to the new program, " +
            "because the operator is not available in the new program. If " +
            "you want to allow to skip this, you can set the --allowNonRestoredState " +
            "option on the CLI.",
            savepointPath, operatorState.getOperatorID());

    throw new IllegalStateException(msg);
}
{code}
to additionally check whether the state is non-empty by iterating over the 
contained {{OperatorSubtaskStates}} and calling 
{{OperatorSubtaskState#hasState()}}.
Note that we can't use {{OperatorState#getStateSize()}} and compare it to zero,
as its documentation says that a size of zero may also signify an unknown state size.
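To illustrate the amended check, here is a minimal, self-contained sketch. It assumes simplified stand-in types: {{NonRestoredStateCheck}}, {{OperatorEntry}}, {{SubtaskState}}, {{hasAnyState}} and {{checkUnmappedOperator}} are hypothetical names that only mimic {{OperatorState}} / {{OperatorSubtaskState}}; this is not Flink's actual {{SavepointLoader}} code. An operator that is missing from the new program only fails the restore if at least one of its subtasks reports {{hasState()}}; no size value is consulted.

{code:java}
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of the proposed SavepointLoader change. The nested types
 * are simplified stand-ins that only mimic OperatorState / OperatorSubtaskState;
 * they are NOT Flink's real classes.
 */
final class NonRestoredStateCheck {

    private NonRestoredStateCheck() {}

    /** Stand-in for OperatorSubtaskState: only hasState() is needed here. */
    interface SubtaskState {
        boolean hasState();
    }

    /** Stand-in for OperatorState: an operator ID plus its per-subtask states. */
    static final class OperatorEntry {
        final String operatorId;
        final List<SubtaskState> subtaskStates;

        OperatorEntry(String operatorId, SubtaskState... subtaskStates) {
            this.operatorId = operatorId;
            this.subtaskStates = Collections.unmodifiableList(Arrays.asList(subtaskStates));
        }
    }

    /**
     * True if at least one subtask actually stored state. Deliberately ignores
     * any total state size, because a size of 0 may also mean "unknown".
     */
    static boolean hasAnyState(OperatorEntry entry) {
        for (SubtaskState subtask : entry.subtaskStates) {
            if (subtask != null && subtask.hasState()) {
                return true; // one stateful subtask is enough
            }
        }
        return false;
    }

    /**
     * Models the branch where the operator is not part of the new program:
     * skip it if --allowNonRestoredState was given or if it holds no state,
     * otherwise fail the restore as before.
     */
    static void checkUnmappedOperator(
            String savepointPath, OperatorEntry entry, boolean allowNonRestoredState) {
        if (allowNonRestoredState || !hasAnyState(entry)) {
            return;
        }
        // Message abbreviated from the original one in SavepointLoader.
        String msg = String.format(
                "Failed to rollback to savepoint %s. Cannot map savepoint state for operator %s "
                        + "to the new program, because the operator is not available in the new program.",
                savepointPath, entry.operatorId);
        throw new IllegalStateException(msg);
    }
}
{code}

Returning on the first subtask that has state keeps the loop cheap for operators with high parallelism; a stateless operator costs one {{hasState()}} call per subtask.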

> Removing stateless task from task chain breaks savepoint restore
> ----------------------------------------------------------------
>
>                 Key: FLINK-7595
>                 URL: https://issues.apache.org/jira/browse/FLINK-7595
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>            Reporter: Ufuk Celebi
>            Assignee: Chesnay Schepler
>         Attachments: ChainedTaskRemoveTest.java
>
>
> Removing a stateless operator from a 2-task chain whose head operator is
> stateful breaks savepoint restore with
> {code}
> Caused by: java.lang.IllegalStateException: Failed to rollback to savepoint /var/folders/py/s_1l8vln6f19ygc77m8c4zhr0000gn/T/junit1167397515334838028/junit8006766303945373008/savepoint-cb0bcf-3cfa67865ac0. Cannot map savepoint state...
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
