[
https://issues.apache.org/jira/browse/FLINK-19761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17220452#comment-17220452
]
Xuannan Su commented on FLINK-19761:
------------------------------------
You are right that not returning all the information needed to consume the
result partition would weaken the functionality when sharing cluster partitions
across clusters. In that case, I agree that we should send the shuffle
descriptor back to the client. However, I feel that something is still missing
without the look-up method. For example, suppose the user is indeed using an
external shuffle service and some partition is promoted so that it remains
after the job has finished. In that case, the external shuffle service should
not rely on the client to keep the information needed to consume the
partition.
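
To make the look-up idea concrete, here is a rough sketch of the bookkeeping an
external shuffle service could do on its own side. It is not Flink's API: the
class and method names (LookupSketch, DescriptorRegistry, register, lookup) are
hypothetical, and plain strings stand in for IntermediateDataSetID and
ShuffleDescriptor so the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustration only: none of these names exist in Flink.
final class LookupSketch {

    /** Hypothetical registry kept by the shuffle service, keyed by data set ID. */
    static final class DescriptorRegistry {
        // Assumption: a String stands in for IntermediateDataSetID (key)
        // and for ShuffleDescriptor (list element).
        private final Map<String, List<String>> byDataSet = new ConcurrentHashMap<>();

        /** Remembers a descriptor when a partition of the data set is registered. */
        void register(String dataSetId, String descriptor) {
            byDataSet
                    .computeIfAbsent(dataSetId, k -> new CopyOnWriteArrayList<>())
                    .add(descriptor);
        }

        /** Returns a copy of all descriptors of the data set, or an empty collection. */
        Collection<String> lookup(String dataSetId) {
            List<String> found = byDataSet.get(dataSetId);
            return found == null ? Collections.emptyList() : new ArrayList<>(found);
        }
    }
}
```

With something along these lines, a second job would only need the data set ID
to find the descriptors, and the client would not have to keep them around.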
> Add lookup method for registered ShuffleDescriptor in ShuffleMaster
> -------------------------------------------------------------------
>
> Key: FLINK-19761
> URL: https://issues.apache.org/jira/browse/FLINK-19761
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Network
> Reporter: Xuannan Su
> Priority: Major
>
> Currently, the ShuffleMaster can register a partition and get the shuffle
> descriptor. However, it lacks the ability to look up the registered
> ShuffleDescriptors that belong to an IntermediateResult by the
> IntermediateDataSetID.
> Adding the lookup method to the ShuffleMaster would make reusing cluster
> partitions easier. For example, we would not have to return the
> ShuffleDescriptor to the client just so that another job can somehow encode
> the ShuffleDescriptor in the JobGraph to consume the cluster partition.
> Instead, we would only need to return the IntermediateDataSetID, which another
> job can use to look up the ShuffleDescriptor.
> With the lookup method in ShuffleMaster, if we have an external shuffle
> service and the lifecycle of the IntermediateResult is not bound to the
> cluster, a job running on another cluster can look up the ShuffleDescriptor
> and reuse the IntermediateResult even after the cluster that produced the
> IntermediateResult has been shut down.