zentol opened a new pull request #9250: [FLINK-13371][coordination] Prevent leaks of blocking partitions URL: https://github.com/apache/flink/pull/9250 If force-release-on-consumption is enabled it is possible for blocking partitions to be leaked. For these partitions we rely on the consumer sending notifications for having consumed the partition, however the consumer may never be deployed successfully. In this case the partition is neither released by the task, nor by any other cleanup procedure since they all ignore partitions that are released on consumption. Note that a similar issue can occur for pipelined partitions that are buffered in the producers side before a consumer was actually scheduled. This issue is not addressed by this commit. The TaskExecutor now tracks all blocking partitions, to ensure they are cleaned up when the JM connection terminates or the TE shuts down. The PartitionTracker now tracks all blocking partitions, to ensure they are cleaned up on job termination and vertex resets. The execution now separately issues release calls for all produced partitions in case of a state reconciliation, where an execution was CANCELING but receives the notitification for being FINISHED. Since we arrive in a CANCELED state we release all partitions. At this point the PartitionTracker is not yet tracking these partitions (since we never officially reached a state FINISHED in the EG), hence the execution is sending these through separate RPC logic. Additionally, the execution no longer issues release calls through the PartitionTracker if it reached a terminal state, but just removes the partitions from the tracker.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
