[ https://issues.apache.org/jira/browse/FLINK-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062282#comment-16062282 ]

ASF GitHub Bot commented on FLINK-6742:
---------------------------------------

Github user gyfora commented on a diff in the pull request:

    https://github.com/apache/flink/pull/4083#discussion_r123895607
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/savepoint/SavepointV2.java ---
    @@ -168,10 +168,27 @@ public static Savepoint convertToOperatorStateSavepointV2(
                                expandedToLegacyIds = true;
                        }
     
    +                   if (jobVertex == null) {
    +                           throw new IllegalStateException(
    +                                   "Could not find task for state with ID " + taskState.getJobVertexID() + ". " +
    +                                   "When migrating a savepoint from a version < 1.3 please make sure that the topology was not " +
    +                                   "changed through removal of a stateful operator or modification of a chain containing a stateful " +
    +                                   "operator.");
    +                   }
    +
                        List<OperatorID> operatorIDs = jobVertex.getOperatorIDs();
     
                        for (int subtaskIndex = 0; subtaskIndex < jobVertex.getParallelism(); subtaskIndex++) {
    -                           SubtaskState subtaskState = taskState.getState(subtaskIndex);
    +                           SubtaskState subtaskState;
    +                           try {
    +                                   subtaskState = taskState.getState(subtaskIndex);
    --- End diff --
    
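    Sorry for commenting late on this, but I have had some major migration issues in the last few days :D
    I think we should explicitly compare the parallelism instead of relying on the error from getState:
    if (taskState.getStates().size() != jobVertex.getParallelism()) --> error
    
    Otherwise this will not fail when the job's parallelism is lower than the savepoint's: the loop only runs up to jobVertex.getParallelism(), so it never reaches the extra subtask states.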
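
    A minimal sketch of the explicit comparison suggested above, not the actual patch. It deliberately avoids Flink's classes: ParallelismCheck, checkParallelism and the error message wording are hypothetical stand-ins, and a plain Collection plays the role of taskState.getStates().

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical stand-in for the suggested check; it does not use Flink's classes.
final class ParallelismCheck {

    /**
     * Fails fast when the number of subtask states stored in the savepoint
     * differs from the parallelism of the job vertex being restored.
     * Both a lower and a higher job parallelism are rejected.
     */
    static void checkParallelism(String jobVertexId, Collection<?> subtaskStates, int jobParallelism) {
        if (subtaskStates.size() != jobParallelism) {
            throw new IllegalStateException(
                "Parallelism mismatch for task " + jobVertexId + ": savepoint contains "
                    + subtaskStates.size() + " subtask states, but the job vertex has parallelism "
                    + jobParallelism + ".");
        }
    }

    public static void main(String[] args) {
        // Pretend the savepoint was taken with parallelism 4.
        Collection<String> states = Arrays.asList("s0", "s1", "s2", "s3");

        // Same parallelism: the check passes silently.
        checkParallelism("vertex-a", states, 4);

        // Lower parallelism: an index-based loop alone would not notice this case.
        try {
            checkParallelism("vertex-a", states, 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

    Doing the comparison once, before the per-subtask loop, catches the mismatch in both directions instead of only when an index runs past the stored states.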


> Improve error message when savepoint migration fails due to task removal
> ------------------------------------------------------------------------
>
>                 Key: FLINK-6742
>                 URL: https://issues.apache.org/jira/browse/FLINK-6742
>             Project: Flink
>          Issue Type: Bug
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.0
>            Reporter: Gyula Fora
>            Assignee: Chesnay Schepler
>            Priority: Minor
>              Labels: flink-rel-1.3.1-blockers
>
> Caused by: java.lang.NullPointerException
>       at org.apache.flink.runtime.checkpoint.savepoint.SavepointV2.convertToOperatorStateSavepointV2(SavepointV2.java:171)
>       at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:75)
>       at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1090)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
