dawidwys commented on a change in pull request #16655: URL: https://github.com/apache/flink/pull/16655#discussion_r683172235
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/FinishedOperatorSubtaskState.java ########## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.checkpoint; + +/** + * A specialized {@link OperatorSubtaskState} used to mark the finished subtasks in the snapshot of + * this operator. + */ +public class FinishedOperatorSubtaskState extends OperatorSubtaskState { + + private static final long serialVersionUID = 7206415348825695023L; Review comment: Just use `1L`. Please take a look at our coding guidelines for explanation. ########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/metadata/MetadataV3Serializer.java ########## @@ -114,8 +115,14 @@ protected void serializeOperatorState(OperatorState operatorState, DataOutputStr operatorState.getSubtaskStates(); dos.writeInt(subtaskStateMap.size()); for (Map.Entry<Integer, OperatorSubtaskState> entry : subtaskStateMap.entrySet()) { - dos.writeInt(entry.getKey()); - serializeSubtaskState(entry.getValue(), dos); + if (entry.getValue().isFinished()) { + // We store a negative index for the finished subtask. 
In consideration + // of the index 0, the negative index would start from -1. + dos.writeInt(-(entry.getKey() + 1)); Review comment: That's a bit too much magic for my taste :( It kind of worked for `Operator` where it meant the number of enclosed states. Here I find it too complex, especially with the offsetting logic. Unfortunately, I think we need to adjust the metadata version. ########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/FinishedOperatorSubtaskState.java ########## @@ -0,0 +1,52 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.checkpoint; + +/** + * A specialized {@link OperatorSubtaskState} used to mark the finished subtasks in the snapshot of + * this operator. + */ +public class FinishedOperatorSubtaskState extends OperatorSubtaskState { Review comment: Do we need a separate class? Could we just add a flag to the current class?
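To illustrate the "flag instead of a separate class" question, here is a minimal sketch. It is not the actual `OperatorSubtaskState`: `SubtaskStateSketch`, its `String` payload, and the factory method names are hypothetical stand-ins, and the real class carries state handles rather than a string.

```java
import java.io.Serializable;

/**
 * Hypothetical stand-in for OperatorSubtaskState, for illustration only.
 * The real class holds state handles; a String payload is used here instead.
 */
public class SubtaskStateSketch implements Serializable {

    // Fixed value, in line with the serialVersionUID remark in this review.
    private static final long serialVersionUID = 1L;

    // The flag that would replace the dedicated "finished" subclass.
    private final boolean finished;
    private final String payload;

    private SubtaskStateSketch(boolean finished, String payload) {
        this.finished = finished;
        this.payload = payload;
    }

    /** A subtask that is still running and carries state. */
    public static SubtaskStateSketch running(String payload) {
        return new SubtaskStateSketch(false, payload);
    }

    /** A finished subtask: no payload, only the flag is set. */
    public static SubtaskStateSketch finishedSubtask() {
        return new SubtaskStateSketch(true, null);
    }

    public boolean isFinished() {
        return finished;
    }

    public String getPayload() {
        return payload;
    }

    public static void main(String[] args) {
        System.out.println(running("keyed-state").isFinished()); // prints false
        System.out.println(finishedSubtask().isFinished());      // prints true
    }
}
```

With a flag, callers keep a single type and ask `isFinished()`; whether that fits the existing builder and equality logic of the real class would need to be checked in the PR.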
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/PendingCheckpoint.java ########## @@ -347,6 +347,7 @@ public CompletedCheckpoint finalizeCheckpoint( } fulfillFullyFinishedOperatorStates(); + fulfillSubtaskStateForPartlyFinishedOperators(); Review comment: `Partly` -> `Partially`
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/VertexFinishedStateChecker.java ########## @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.runtime.checkpoint; + +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.runtime.OperatorIDPair; +import org.apache.flink.runtime.executiongraph.ExecutionJobVertex; +import org.apache.flink.runtime.executiongraph.IntermediateResult; +import org.apache.flink.runtime.jobgraph.DistributionPattern; +import org.apache.flink.runtime.jobgraph.JobEdge; +import org.apache.flink.runtime.jobgraph.JobVertexID; +import org.apache.flink.runtime.jobgraph.OperatorID; +import org.apache.flink.util.FlinkRuntimeException; + +import java.util.HashMap; +import java.util.Map; +import java.util.Optional; +import java.util.Set; +import java.util.stream.Collectors; + +/** + * This class encapsulates the operation that checks if there are illegal modification to the + * JobGraph when restoring from a checkpoint with partly or fully finished operator states. + * + * <p>As a whole, it ensures + * + * <ol> + * <li>All the operators inside a JobVertex have the same finished state. + * <li>The predecessors of a fully finished vertex must also be fully finished. + * <li>The processors of a partly finished vertex + * <ul> + * <li>If connected via ALL_TO_ALL edge, the predecessor must be fully finished. + * <li>If connected via POINTWISE edge, the predecessor must be partly finished or fully + * finished. 
+ * </ul> + * </ol> + */ +public class VertexFinishedStateChecker { + + private final Set<ExecutionJobVertex> vertices; + + private final Map<OperatorID, OperatorState> operatorStates; + + public VertexFinishedStateChecker( + Set<ExecutionJobVertex> vertices, Map<OperatorID, OperatorState> operatorStates) { + this.vertices = vertices; + this.operatorStates = operatorStates; + } + + public void validateOperatorsFinishedState() { + VerticesFinishedStatusCache verticesFinishedCache = + new VerticesFinishedStatusCache(operatorStates); + for (ExecutionJobVertex vertex : vertices) { + VertexFinishedState vertexFinishedState = verticesFinishedCache.getOrUpdate(vertex); + + if (vertexFinishedState == VertexFinishedState.FULLY_FINISHED) { + checkProcessorsOfFullyFinishedVertex(vertex, verticesFinishedCache); Review comment: Could you check the naming? What does the `processors` mean? Did you mean `predecessors`? ########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/VertexFinishedStateChecker.java ########## @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.flink.runtime.checkpoint; + +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.runtime.OperatorIDPair; +import org.apache.flink.runtime.executiongraph.ExecutionJobVertex; +import org.apache.flink.runtime.executiongraph.IntermediateResult; +import org.apache.flink.runtime.jobgraph.DistributionPattern; +import org.apache.flink.runtime.jobgraph.JobEdge; +import org.apache.flink.runtime.jobgraph.JobVertexID; +import org.apache.flink.runtime.jobgraph.OperatorID; +import org.apache.flink.util.FlinkRuntimeException; + +import java.util.HashMap; +import java.util.Map; +import java.util.Optional; +import java.util.Set; +import java.util.stream.Collectors; + +/** + * This class encapsulates the operation that checks if there are illegal modification to the + * JobGraph when restoring from a checkpoint with partly or fully finished operator states. + * + * <p>As a whole, it ensures + * + * <ol> + * <li>All the operators inside a JobVertex have the same finished state. + * <li>The predecessors of a fully finished vertex must also be fully finished. + * <li>The processors of a partly finished vertex + * <ul> + * <li>If connected via ALL_TO_ALL edge, the predecessor must be fully finished. + * <li>If connected via POINTWISE edge, the predecessor must be partly finished or fully + * finished. 
+ * </ul> + * </ol> + */ +public class VertexFinishedStateChecker { + + private final Set<ExecutionJobVertex> vertices; + + private final Map<OperatorID, OperatorState> operatorStates; + + public VertexFinishedStateChecker( + Set<ExecutionJobVertex> vertices, Map<OperatorID, OperatorState> operatorStates) { + this.vertices = vertices; + this.operatorStates = operatorStates; + } + + public void validateOperatorsFinishedState() { + VerticesFinishedStatusCache verticesFinishedCache = + new VerticesFinishedStatusCache(operatorStates); + for (ExecutionJobVertex vertex : vertices) { + VertexFinishedState vertexFinishedState = verticesFinishedCache.getOrUpdate(vertex); + + if (vertexFinishedState == VertexFinishedState.FULLY_FINISHED) { + checkProcessorsOfFullyFinishedVertex(vertex, verticesFinishedCache); + } else if (vertexFinishedState == VertexFinishedState.PARTLY_FINISHED) { + checkProcessorsOfPartlyFinishedVertex(vertex, verticesFinishedCache); + } + } + } + + private void checkProcessorsOfFullyFinishedVertex( + ExecutionJobVertex vertex, VerticesFinishedStatusCache verticesFinishedStatusCache) { + boolean allPredecessorsFinished = + vertex.getInputs().stream() + .map(IntermediateResult::getProducer) + .allMatch( + jobVertex -> + verticesFinishedStatusCache.getOrUpdate(jobVertex) + == VertexFinishedState.FULLY_FINISHED); + + if (!allPredecessorsFinished) { + throw new FlinkRuntimeException( + "Illegal JobGraph modification. Cannot run a program with fully finished" + + " vertices predeceased with the ones not fully finished. Task vertex " + + vertex.getName() + + "(" + + vertex.getJobVertexId() + + ")" + + " has a predecessor not fully finished"); + } + } + + private void checkProcessorsOfPartlyFinishedVertex( + ExecutionJobVertex vertex, VerticesFinishedStatusCache verticesFinishedStatusCache) { + // Computes the distribution pattern from the predecessors. If there are multiple edges, + // ALL_TO_ALL edges would have a higher priority. 
+ Map<JobVertexID, DistributionPattern> predecessorDistribution = new HashMap<>(); + for (JobEdge jobEdge : vertex.getJobVertex().getInputs()) { + predecessorDistribution.compute( + jobEdge.getSource().getProducer().getID(), + (k, v) -> + v == DistributionPattern.ALL_TO_ALL + ? v + : jobEdge.getDistributionPattern()); + } + + for (IntermediateResult dataset : vertex.getInputs()) { + ExecutionJobVertex predecessor = dataset.getProducer(); + VertexFinishedState predecessorState = + verticesFinishedStatusCache.getOrUpdate(predecessor); + DistributionPattern distribution = + predecessorDistribution.get(predecessor.getJobVertexId()); + + if (distribution == DistributionPattern.ALL_TO_ALL + && predecessorState != VertexFinishedState.FULLY_FINISHED) { + throw new FlinkRuntimeException( + "Illegal JobGraph modification. Cannot run a program with partly finished" + + " vertices predeceased with running or partly finished ones and" + + " connected via the ALL_TO_ALL edges. Task vertex " + + vertex.getName() + + "(" + + vertex.getJobVertexId() + + ")" + + " has a " + + (predecessorState == VertexFinishedState.ALL_RUNNING + ? "all running" + : "partly finished") + + " predecessor"); + } else if (distribution == DistributionPattern.POINTWISE + && predecessorState == VertexFinishedState.ALL_RUNNING) { + throw new FlinkRuntimeException( + "Illegal JobGraph modification. Cannot run a program with partly finished" + + " vertices predeceased with all running ones. 
Task vertex " + + vertex.getName() + + "(" + + vertex.getJobVertexId() + + ")" + + " has a all running predecessor"); + } + } + } + + @VisibleForTesting + enum VertexFinishedState { + ALL_RUNNING, + PARTLY_FINISHED, + FULLY_FINISHED + } + + private static class VerticesFinishedStatusCache { + private final Map<OperatorID, OperatorState> operatorStates; + private final Map<JobVertexID, VertexFinishedState> finishedCache = new HashMap<>(); + + private VerticesFinishedStatusCache(Map<OperatorID, OperatorState> operatorStates) { + this.operatorStates = operatorStates; + } + + public VertexFinishedState getOrUpdate(ExecutionJobVertex vertex) { + return finishedCache.computeIfAbsent( + vertex.getJobVertexId(), + ignored -> calculateFinishedState(vertex, operatorStates)); + } + + private VertexFinishedState calculateFinishedState( + ExecutionJobVertex vertex, Map<OperatorID, OperatorState> operatorStates) { + Set<VertexFinishedState> operatorFinishedStates = + vertex.getOperatorIDs().stream() + .map(idPair -> checkOperatorFinishedStatus(operatorStates, idPair)) + .collect(Collectors.toSet()); + if (operatorFinishedStates.size() > 1) { Review comment: nit: `!= 1`? ########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/metadata/MetadataV3Serializer.java ########## @@ -114,8 +115,14 @@ protected void serializeOperatorState(OperatorState operatorState, DataOutputStr operatorState.getSubtaskStates(); dos.writeInt(subtaskStateMap.size()); for (Map.Entry<Integer, OperatorSubtaskState> entry : subtaskStateMap.entrySet()) { - dos.writeInt(entry.getKey()); - serializeSubtaskState(entry.getValue(), dos); + if (entry.getValue().isFinished()) { + // We store a negative index for the finished subtask. In consideration + // of the index 0, the negative index would start from -1. + dos.writeInt(-(entry.getKey() + 1)); Review comment: @pnowojski @StephanEwen What do you think about adjusting the metadata format to include the finished flag? 
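For reference, the two encodings discussed for the finished subtask can be sketched side by side. This is only an illustration under assumptions: `FinishedFlagFormatSketch` and all of its method names are hypothetical, and the flag layout (an `int` index followed by a `boolean`) is one possible shape of an adjusted metadata format, not the format the PR or Flink actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical comparison of the two wire encodings; not Flink's serializer. */
public class FinishedFlagFormatSketch {

    // Variant A (as in the diff): finished subtasks get a negative index.
    // The +1 offset is needed because index 0 has no negative counterpart.
    static int encodeIndex(int subtaskIndex, boolean finished) {
        return finished ? -(subtaskIndex + 1) : subtaskIndex;
    }

    static boolean isFinished(int encoded) {
        return encoded < 0;
    }

    static int decodeIndex(int encoded) {
        return encoded < 0 ? -encoded - 1 : encoded;
    }

    // Variant B (assumed layout): plain index followed by an explicit flag.
    static byte[] writeWithFlag(int subtaskIndex, boolean finished) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(bytes)) {
            dos.writeInt(subtaskIndex);
            dos.writeBoolean(finished);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static boolean readFinishedFlag(byte[] data) {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(data))) {
            dis.readInt(); // skip the subtask index
            return dis.readBoolean();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int encoded = encodeIndex(0, true);
        System.out.println(encoded);                                  // prints -1
        System.out.println(decodeIndex(encoded));                     // prints 0
        System.out.println(readFinishedFlag(writeWithFlag(0, true))); // prints true
    }
}
```

Variant A keeps the existing byte layout but makes every reader aware of the offset; variant B costs one extra byte per subtask and changes the layout, which is why it would require a new metadata version.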
########## File path: flink-runtime/src/main/java/org/apache/flink/runtime/checkpoint/VertexFinishedStateChecker.java ########## @@ -0,0 +1,216 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.flink.runtime.checkpoint; + +import org.apache.flink.annotation.VisibleForTesting; +import org.apache.flink.runtime.OperatorIDPair; +import org.apache.flink.runtime.executiongraph.ExecutionJobVertex; +import org.apache.flink.runtime.executiongraph.IntermediateResult; +import org.apache.flink.runtime.jobgraph.DistributionPattern; +import org.apache.flink.runtime.jobgraph.JobEdge; +import org.apache.flink.runtime.jobgraph.JobVertexID; +import org.apache.flink.runtime.jobgraph.OperatorID; +import org.apache.flink.util.FlinkRuntimeException; + +import java.util.HashMap; +import java.util.Map; +import java.util.Optional; +import java.util.Set; +import java.util.stream.Collectors; + +/** + * This class encapsulates the operation that checks if there are illegal modification to the + * JobGraph when restoring from a checkpoint with partly or fully finished operator states. + * + * <p>As a whole, it ensures + * + * <ol> + * <li>All the operators inside a JobVertex have the same finished state. 
+ * <li>The predecessors of a fully finished vertex must also be fully finished. + * <li>The processors of a partly finished vertex + * <ul> + * <li>If connected via ALL_TO_ALL edge, the predecessor must be fully finished. + * <li>If connected via POINTWISE edge, the predecessor must be partly finished or fully + * finished. + * </ul> + * </ol> + */ +public class VertexFinishedStateChecker { + + private final Set<ExecutionJobVertex> vertices; + + private final Map<OperatorID, OperatorState> operatorStates; + + public VertexFinishedStateChecker( + Set<ExecutionJobVertex> vertices, Map<OperatorID, OperatorState> operatorStates) { + this.vertices = vertices; + this.operatorStates = operatorStates; + } + + public void validateOperatorsFinishedState() { + VerticesFinishedStatusCache verticesFinishedCache = + new VerticesFinishedStatusCache(operatorStates); + for (ExecutionJobVertex vertex : vertices) { + VertexFinishedState vertexFinishedState = verticesFinishedCache.getOrUpdate(vertex); + + if (vertexFinishedState == VertexFinishedState.FULLY_FINISHED) { + checkProcessorsOfFullyFinishedVertex(vertex, verticesFinishedCache); + } else if (vertexFinishedState == VertexFinishedState.PARTLY_FINISHED) { + checkProcessorsOfPartlyFinishedVertex(vertex, verticesFinishedCache); + } + } + } + + private void checkProcessorsOfFullyFinishedVertex( + ExecutionJobVertex vertex, VerticesFinishedStatusCache verticesFinishedStatusCache) { + boolean allPredecessorsFinished = + vertex.getInputs().stream() + .map(IntermediateResult::getProducer) + .allMatch( + jobVertex -> + verticesFinishedStatusCache.getOrUpdate(jobVertex) + == VertexFinishedState.FULLY_FINISHED); + + if (!allPredecessorsFinished) { + throw new FlinkRuntimeException( + "Illegal JobGraph modification. Cannot run a program with fully finished" + + " vertices predeceased with the ones not fully finished. 
Task vertex " + + vertex.getName() + + "(" + + vertex.getJobVertexId() + + ")" + + " has a predecessor not fully finished"); + } + } + + private void checkProcessorsOfPartlyFinishedVertex( + ExecutionJobVertex vertex, VerticesFinishedStatusCache verticesFinishedStatusCache) { + // Computes the distribution pattern from the predecessors. If there are multiple edges, + // ALL_TO_ALL edges would have a higher priority. + Map<JobVertexID, DistributionPattern> predecessorDistribution = new HashMap<>(); + for (JobEdge jobEdge : vertex.getJobVertex().getInputs()) { + predecessorDistribution.compute( + jobEdge.getSource().getProducer().getID(), + (k, v) -> + v == DistributionPattern.ALL_TO_ALL Review comment: Is it for cases when a vertex is connected twice to the current vertex? If so, could you add such a comment here? ########## File path: flink-runtime/src/test/java/org/apache/flink/runtime/checkpoint/CheckpointCoordinatorRestoringTest.java ########## @@ -1213,31 +1213,153 @@ public void testRestoringPartiallyFinishedChainsFails() throws Exception { + "anon(" + jobVertexID1 + ")" - + " which contain both finished and unfinished operators"); + + " which contain mixed operator finished state: [ALL_RUNNING, FULLY_FINISHED]"); coord.restoreLatestCheckpointedStateToAll(vertices, false); } @Test public void testAddingRunningOperatorBeforeFinishedOneFails() throws Exception { Review comment: Could we also have a test case for the situation that a single operator is connected via two different distribution patterns? That is quite an uncommon and very specific scenario that in my mind is worth checking.
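A minimal sketch of the double-connection scenario, assuming simplified types: `DistributionMergeSketch`, `Pattern`, and `Edge` are hypothetical stand-ins for the Flink classes, and vertex IDs are plain strings. It mirrors the `compute` lambda quoted in the diff and shows that ALL_TO_ALL wins regardless of the order in which the two edges are visited.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, simplified model of merging edge patterns per predecessor. */
public class DistributionMergeSketch {

    enum Pattern { POINTWISE, ALL_TO_ALL }

    /** Stand-in for a job edge: source vertex id plus distribution pattern. */
    static final class Edge {
        final String sourceId;
        final Pattern pattern;

        Edge(String sourceId, Pattern pattern) {
            this.sourceId = sourceId;
            this.pattern = pattern;
        }
    }

    /** Same rule as the quoted lambda: once ALL_TO_ALL is seen, it is kept. */
    static Map<String, Pattern> mergePatterns(List<Edge> inputs) {
        Map<String, Pattern> merged = new HashMap<>();
        for (Edge edge : inputs) {
            merged.compute(
                    edge.sourceId,
                    (id, current) -> current == Pattern.ALL_TO_ALL ? current : edge.pattern);
        }
        return merged;
    }

    public static void main(String[] args) {
        // The same predecessor "v1" is connected twice, with different patterns.
        List<Edge> pointwiseFirst =
                Arrays.asList(new Edge("v1", Pattern.POINTWISE), new Edge("v1", Pattern.ALL_TO_ALL));
        List<Edge> allToAllFirst =
                Arrays.asList(new Edge("v1", Pattern.ALL_TO_ALL), new Edge("v1", Pattern.POINTWISE));

        System.out.println(mergePatterns(pointwiseFirst).get("v1")); // prints ALL_TO_ALL
        System.out.println(mergePatterns(allToAllFirst).get("v1"));  // prints ALL_TO_ALL
    }
}
```

A test in the PR could follow the same shape: connect one predecessor through both a POINTWISE and an ALL_TO_ALL edge and assert that the stricter ALL_TO_ALL check applies.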
