azagrebin commented on a change in pull request #9250:
[FLINK-13371][coordination] Prevent leaks of blocking partitions
URL: https://github.com/apache/flink/pull/9250#discussion_r308678007
##########
File path:
flink-runtime/src/main/java/org/apache/flink/runtime/executiongraph/Execution.java
##########
@@ -1066,6 +1066,8 @@ void markFinished(Map<String, Accumulator<?, ?>> userAccumulators, IOMetrics met
     else if (current == CANCELING) {
         // we sent a cancel call, and the task manager finished before it arrived. We
         // will never get a CANCELED call back from the job manager
+        // release all partitions because partitions should only be kept if the execution reaches FINISHED
+        sendReleaseIntermediateResultPartitionsRpcCall();
Review comment:
What about the other "not properly finished" branches that follow? Is there no
need for release calls there?
Also, this is a bit misleading:
```
At this point the PartitionTracker is not yet tracking these partitions
(since we never officially reached a state FINISHED in the EG), hence the
execution is sending these through separate RPC logic.
```
From what I see, we start tracking while the execution is being deployed in
`Execution#registerProducedPartitions` and this is why we do:
```
Additionally, the execution no longer issues release calls through the
PartitionTracker if it reached a terminal state, but just removes the
partitions from the tracker.
```
but no release is needed when the cancellation is confirmed normally by the
Task, which does the release internally (maybe simplify and always send it, as
before?).
At the same time, this change partially addresses:
```
Note that a similar issue can occur for pipelined partitions that are
buffered in the producers side before a consumer was actually scheduled.
```
because RPCs are sent for all partitions. Since this will no longer be needed
once task state is coupled with consumer confirmation for pipelined partitions,
I would call `sendReleaseIntermediateResultPartitionsRpcCall` only for pipelined
partitions and use the partition tracker's "removeWithRelease" for blocking ones.
Also, the Jira issue title/description should then be adjusted if we do not
address the previous cancel/suspend of finished partitions here.
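To illustrate the pipelined/blocking split suggested above, here is a rough, self-contained sketch. Every type in it (`PartitionType`, `Partition`, `RecordingTracker`, `RecordingRpc`) is a hypothetical stand-in and not the actual Flink class, and `removeWithRelease` is only the placeholder name used in this comment, so treat it as a picture of the control flow rather than a patch:
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PartitionReleaseSketch {

    // Stand-in for the partition type; only the pipelined/blocking distinction matters here.
    enum PartitionType { PIPELINED, BLOCKING }

    // Hypothetical minimal partition descriptor.
    static final class Partition {
        final String id;
        final PartitionType type;

        Partition(String id, PartitionType type) {
            this.id = id;
            this.type = type;
        }
    }

    // Hypothetical stand-in for the partition tracker; it only records what it was asked to release.
    static final class RecordingTracker {
        final List<String> released = new ArrayList<>();

        void removeWithRelease(String partitionId) {
            released.add(partitionId);
        }
    }

    // Hypothetical stand-in for the direct release RPC sent by the execution.
    static final class RecordingRpc {
        final List<String> released = new ArrayList<>();

        void releasePartitions(List<String> partitionIds) {
            released.addAll(partitionIds);
        }
    }

    // Blocking partitions go through the tracker; pipelined ones go through the separate RPC.
    static void releaseOnUnconfirmedCancel(
            List<Partition> produced, RecordingTracker tracker, RecordingRpc rpc) {
        List<String> pipelined = new ArrayList<>();
        for (Partition p : produced) {
            if (p.type == PartitionType.BLOCKING) {
                tracker.removeWithRelease(p.id);
            } else {
                pipelined.add(p.id);
            }
        }
        if (!pipelined.isEmpty()) {
            rpc.releasePartitions(pipelined);
        }
    }

    public static void main(String[] args) {
        RecordingTracker tracker = new RecordingTracker();
        RecordingRpc rpc = new RecordingRpc();
        releaseOnUnconfirmedCancel(
                Arrays.asList(
                        new Partition("p1", PartitionType.BLOCKING),
                        new Partition("p2", PartitionType.PIPELINED)),
                tracker,
                rpc);
        System.out.println("tracker released: " + tracker.released);
        System.out.println("rpc released: " + rpc.released);
    }
}
```
With this shape, the RPC path could be dropped later without touching the blocking path once pipelined partitions are handled through consumer confirmation.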