[ 
https://issues.apache.org/jira/browse/FLINK-12890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Zagrebin updated FLINK-12890:
------------------------------------
    Description: 
At the moment we have ShuffleEnvironment.releasePartitions which is used to 
release locally occupied resources of partition. JM can also use it by calling 
TaskExecutorGateway.releasePartitions.

To support lifecycle management of partitions (FLINK-12069, relevant mostly for 
batch and blocking partitions), we need to extend Shuffle API:
 * ShuffleDescriptor.hasLocalResources() indicates that this partition occupies 
local resources on TM and requires TM running to consume the produced data 
(e.g. true for default NettyShuffleEnviroment and false for externally stored 
partitions). If a partition needs external lifecycle management and is not 
released after the first consumption is done 
(ResultPartitionDeploymentDescriptor.isReleasedOnConsumption()), then RM/JM 
should keep TMs, which produce these partitions, running until partition still 
needs to be consumed. The connection to these TMs should also to be kept to 
issue the RPC call TaskExecutorGateway.releasePartitions once partition is not 
needed any more.

 * ShuffleMaster.removePartitionExternally(): JM should call this whenever the 
partition does not need to be consumed any more. This call releases partition 
resources possibly occupied externally outside of TM and does not depend on 
ShuffleDescriptor.hasLocalResources.

  was:
At the moment we have ShuffleEnvironment.releasePartitions which is used to 
release locally occupied resources of partition. JM can also use it by calling 
TaskExecutorGateway.releasePartitions.

To support lifecycle management of partitions (FLINK-12069, relevant mostly for 
batch and blocking partitions), we need to extend Shuffle API:
 * ShuffleDescriptor.hasLocalResources() indicates that this partition occupies 
local resources on TM and requires TM running to consume the produced data 
(e.g. true for default NettyShuffleEnviroment and false for externally stored 
partitions). If a partition needs external lifecycle management and is not 
released after the first consumption is done 
(ResultPartitionDeploymentDescriptor.isReleasedOnConsumption()), then RM/JM 
should keep TMs, which produce these partitions, running until partition still 
needs to be consumed. The connection to these TMs should also to be kept to 
issue the RPC call TaskExecutorGateway.releasePartitions once partition is not 
needed any more.

 * ShuffleMaster.removePartitionExternally(): JM should call this whenever the 
partition does not need to be consumed any more. This call releases partition 
resources possibly occupied externally outside of TM and should not depend on 
ShuffleDescriptor.hasLocalResources.


> Add partition lifecycle related Shuffle API
> -------------------------------------------
>
>                 Key: FLINK-12890
>                 URL: https://issues.apache.org/jira/browse/FLINK-12890
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Andrey Zagrebin
>            Assignee: Andrey Zagrebin
>            Priority: Major
>             Fix For: 1.9.0
>
>
> At the moment we have ShuffleEnvironment.releasePartitions which is used to 
> release locally occupied resources of partition. JM can also use it by 
> calling TaskExecutorGateway.releasePartitions.
> To support lifecycle management of partitions (FLINK-12069, relevant mostly 
> for batch and blocking partitions), we need to extend Shuffle API:
>  * ShuffleDescriptor.hasLocalResources() indicates that this partition 
> occupies local resources on TM and requires TM running to consume the 
> produced data (e.g. true for default NettyShuffleEnviroment and false for 
> externally stored partitions). If a partition needs external lifecycle 
> management and is not released after the first consumption is done 
> (ResultPartitionDeploymentDescriptor.isReleasedOnConsumption()), then RM/JM 
> should keep TMs, which produce these partitions, running until partition 
> still needs to be consumed. The connection to these TMs should also to be 
> kept to issue the RPC call TaskExecutorGateway.releasePartitions once 
> partition is not needed any more.
>  * ShuffleMaster.removePartitionExternally(): JM should call this whenever 
> the partition does not need to be consumed any more. This call releases 
> partition resources possibly occupied externally outside of TM and does not 
> depend on ShuffleDescriptor.hasLocalResources.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to