Andrey Zagrebin created FLINK-12890:
---------------------------------------
Summary: Add partition lifecycle related Shuffle API
Key: FLINK-12890
URL: https://issues.apache.org/jira/browse/FLINK-12890
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
Reporter: Andrey Zagrebin
Assignee: Andrey Zagrebin
Fix For: 1.9.0
At the moment we have ShuffleEnvironment.releasePartitions, which is used to
release the resources that a partition occupies locally. The JM can also use it
by calling TaskExecutorGateway.releasePartitions.
To support lifecycle management of partitions (FLINK-12069, relevant mostly for
batch and blocking partitions), we need to extend the Shuffle API:
* ShuffleDescriptor.hasLocalResources() indicates that the partition occupies
local resources on the TM and requires the TM to keep running so that the
produced data can be consumed (e.g. true for the default NettyShuffleEnvironment
and false for externally stored partitions). If a partition needs external
lifecycle management and is not released after the first consumption is done
(i.e. ResultPartitionDeploymentDescriptor.isReleasedOnConsumption() is false),
then the RM/JM should keep the TMs that produce these partitions running for as
long as the partitions still need to be consumed. The connection to these TMs
should also be kept in order to issue the RPC call
TaskExecutorGateway.releasePartitions once a partition is no longer needed.
* ShuffleMaster.removePartitionExternally(): the JM should call this whenever a
partition does not need to be consumed any more. This call releases partition
resources that may be occupied externally, outside of the TM, and should not
depend on ShuffleDescriptor.hasLocalResources.
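
A rough sketch of how the two proposed methods could interact on the JM side is
given below. It is only an illustration under assumptions: the interfaces are
simplified stand-ins rather than the real Flink types, the String partition id,
the parameter of removePartitionExternally and the simplified
releasePartitions signature are assumed, and SimpleDescriptor,
PartitionReleaser and requiresRunningProducer are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Small demo driver; records the order of the release calls.
final class PartitionLifecycleSketch {
    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        PartitionReleaser releaser = new PartitionReleaser(
                d -> calls.add("external:" + d.partitionId()),
                ids -> calls.add("local:" + ids));
        releaser.releasePartition(new SimpleDescriptor("p1", true));
        releaser.releasePartition(new SimpleDescriptor("p2", false));
        System.out.println(calls);
    }
}

// Hypothetical, simplified stand-ins for the types named in this issue.
// They are NOT the real Flink interfaces; signatures are assumptions.
interface ShuffleDescriptor {
    String partitionId();         // assumption: a plain String id
    boolean hasLocalResources();  // method proposed in this issue
}

interface ShuffleMaster {
    // Method proposed in this issue; the parameter is an assumption.
    void removePartitionExternally(ShuffleDescriptor descriptor);
}

interface TaskExecutorGateway {
    // Simplified signature, assumed for illustration only.
    void releasePartitions(List<String> partitionIds);
}

// Hypothetical value class used only by this sketch.
final class SimpleDescriptor implements ShuffleDescriptor {
    private final String id;
    private final boolean local;

    SimpleDescriptor(String id, boolean local) {
        this.id = id;
        this.local = local;
    }

    @Override public String partitionId() { return id; }
    @Override public boolean hasLocalResources() { return local; }
}

// Hypothetical JM-side helper that applies the rules described above.
final class PartitionReleaser {
    private final ShuffleMaster shuffleMaster;
    private final TaskExecutorGateway taskExecutor;

    PartitionReleaser(ShuffleMaster shuffleMaster, TaskExecutorGateway taskExecutor) {
        this.shuffleMaster = shuffleMaster;
        this.taskExecutor = taskExecutor;
    }

    // The producing TM has to stay alive only if the partition lives on it
    // and is not released automatically after the first consumption.
    static boolean requiresRunningProducer(ShuffleDescriptor d, boolean releasedOnConsumption) {
        return d.hasLocalResources() && !releasedOnConsumption;
    }

    // Called once the partition does not need to be consumed any more.
    void releasePartition(ShuffleDescriptor d) {
        if (d.hasLocalResources()) {
            // Local resources are released via the RPC to the producing TM.
            taskExecutor.releasePartitions(Collections.singletonList(d.partitionId()));
        }
        // The external release is issued regardless of hasLocalResources().
        shuffleMaster.removePartitionExternally(d);
    }
}
```

In this sketch the external release is always issued, while the TM RPC is
issued only for partitions that report local resources, which mirrors the
requirement that removePartitionExternally should not depend on
hasLocalResources.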