[ https://issues.apache.org/jira/browse/FLINK-6742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062282#comment-16062282 ]
ASF GitHub Bot commented on FLINK-6742:
---------------------------------------
Github user gyfora commented on a diff in the pull request:
https://github.com/apache/flink/pull/4083#discussion_r123895607
--- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/savepoint/SavepointV2.java ---
@@ -168,10 +168,27 @@ public static Savepoint convertToOperatorStateSavepointV2(
     expandedToLegacyIds = true;
   }
+  if (jobVertex == null) {
+    throw new IllegalStateException(
+      "Could not find task for state with ID " + taskState.getJobVertexID() + ". " +
+      "When migrating a savepoint from a version < 1.3 please make sure that the topology was not " +
+      "changed through removal of a stateful operator or modification of a chain containing a stateful " +
+      "operator.");
+  }
+
   List<OperatorID> operatorIDs = jobVertex.getOperatorIDs();
   for (int subtaskIndex = 0; subtaskIndex < jobVertex.getParallelism(); subtaskIndex++) {
-    SubtaskState subtaskState = taskState.getState(subtaskIndex);
+    SubtaskState subtaskState;
+    try {
+      subtaskState = taskState.getState(subtaskIndex);
--- End diff --
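Sorry for commenting late on this, but I have had some major migration
issues in the last few days :D
I think we should explicitly compare the parallelism instead of relying on
the error thrown inside the loop:
if (taskState.getStates().size() != jobVertex.getParallelism()) --> error
Otherwise this will not fail when the job vertex has a lower parallelism
than the savepoint: the loop only runs up to jobVertex.getParallelism(), so
it never asks for a subtask index that is out of range.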
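To illustrate the point, here is a minimal standalone sketch, not the actual
Flink code. It assumes the savepoint's subtask states can be modelled as a
plain List; the class ParallelismCheckSketch and the methods checkParallelism
and countVisitedStates are hypothetical names made up for this example. It
shows that a loop bounded by the vertex parallelism finishes quietly when the
parallelism is lower, while an explicit size comparison fails.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the logic discussed above; none of these names exist in Flink.
class ParallelismCheckSketch {

    // Explicit comparison, as proposed in the comment: fail fast on any mismatch.
    static void checkParallelism(String jobVertexId, int savepointSubtaskCount, int vertexParallelism) {
        if (savepointSubtaskCount != vertexParallelism) {
            throw new IllegalStateException(
                "Parallelism mismatch for task " + jobVertexId + ": the savepoint holds "
                    + savepointSubtaskCount + " subtask states, but the job vertex has parallelism "
                    + vertexParallelism + ".");
        }
    }

    // Mimics the loop in the diff: it only visits indices below the vertex parallelism.
    static int countVisitedStates(List<String> savepointStates, int vertexParallelism) {
        int visited = 0;
        for (int subtaskIndex = 0; subtaskIndex < vertexParallelism; subtaskIndex++) {
            // Throws only if subtaskIndex is out of range, i.e. on a HIGHER parallelism.
            savepointStates.get(subtaskIndex);
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        // Assumed scenario: savepoint taken at parallelism 4, job restored at parallelism 2.
        List<String> states = Arrays.asList("s0", "s1", "s2", "s3");

        // The loop alone completes without an error and leaves two states unvisited.
        System.out.println("visited states: " + countVisitedStates(states, 2));

        // The explicit check reports the mismatch.
        try {
            checkParallelism("vertex-a", states.size(), 2);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a higher vertex parallelism the loop would already fail on the
out-of-range index, so the explicit comparison mainly matters for the
lower-parallelism case.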
> Improve error message when savepoint migration fails due to task removal
> ------------------------------------------------------------------------
>
> Key: FLINK-6742
> URL: https://issues.apache.org/jira/browse/FLINK-6742
> Project: Flink
> Issue Type: Bug
> Components: State Backends, Checkpointing
> Affects Versions: 1.3.0
> Reporter: Gyula Fora
> Assignee: Chesnay Schepler
> Priority: Minor
> Labels: flink-rel-1.3.1-blockers
>
> Caused by: java.lang.NullPointerException
>     at org.apache.flink.runtime.checkpoint.savepoint.SavepointV2.convertToOperatorStateSavepointV2(SavepointV2.java:171)
>     at org.apache.flink.runtime.checkpoint.savepoint.SavepointLoader.loadAndValidateSavepoint(SavepointLoader.java:75)
>     at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreSavepoint(CheckpointCoordinator.java:1090)