[
https://issues.apache.org/jira/browse/FLINK-29339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17610344#comment-17610344
]
Yun Gao commented on FLINK-29339:
---------------------------------
Hi [~chesnay], I'm a bit concerned about marking this issue as a blocker: the
issue itself should only affect jobs that use cached result partitions.
Besides, since the result is already cached in the
`JobMasterPartitionTrackerImpl`, the actual number of RPC requests is
limited.
But the fix might affect jobs that do not use cached result partitions, so
I'm a bit concerned about the risk of modifying the critical path of job
submission this close to the release.
I think we could postpone the fix until after the release, to the start of the
next version, so that we have more time to ensure the fix does not introduce
other issues. What do you think?
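To illustrate the caching point, here is a minimal, self-contained sketch. It
is not the actual Flink code: `DescriptorCache`, `fetchFromResourceManager`,
`rpcCount`, and the String-typed IDs and descriptors are hypothetical stand-ins
for the tracker, the gateway call, `IntermediateDataSetID`, and
`ShuffleDescriptor`. It only shows that caching per data set bounds the number
of requests, and that returning a future avoids blocking the caller.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for a tracker that caches descriptors per data set. */
public class DescriptorCache {
    // One cached (possibly still in-flight) lookup per hypothetical data set ID.
    private final Map<String, CompletableFuture<List<String>>> cache =
            new ConcurrentHashMap<>();
    // Counts how often the simulated request was actually issued.
    private final AtomicInteger rpcCount = new AtomicInteger();
    // Assumed asynchronous lookup; stands in for the gateway call.
    private final Function<String, CompletableFuture<List<String>>> fetchFromResourceManager;

    public DescriptorCache(
            Function<String, CompletableFuture<List<String>>> fetchFromResourceManager) {
        this.fetchFromResourceManager = fetchFromResourceManager;
    }

    /** Returns a future instead of blocking; issues at most one request per ID. */
    public CompletableFuture<List<String>> getDescriptors(String dataSetId) {
        // Note: a failed future stays cached in this sketch; real code would
        // likely need to evict it so that the lookup can be retried.
        return cache.computeIfAbsent(
                dataSetId,
                id -> {
                    rpcCount.incrementAndGet();
                    return fetchFromResourceManager.apply(id);
                });
    }

    public int rpcCount() {
        return rpcCount.get();
    }

    public static void main(String[] args) {
        DescriptorCache tracker =
                new DescriptorCache(
                        id -> CompletableFuture.supplyAsync(
                                () -> List.of(id + "-descriptor-0")));

        CompletableFuture<List<String>> first = tracker.getDescriptors("dataset-a");
        CompletableFuture<List<String>> second = tracker.getDescriptors("dataset-a");

        // join() is used only to finish the demo; a main-thread caller would
        // chain thenApply/thenCompose on the future instead of waiting.
        System.out.println(first.join());
        System.out.println(second.join());
        System.out.println("RPCs issued: " + tracker.rpcCount());
    }
}
```

In this sketch, two lookups for the same ID trigger a single simulated request,
which is the sense in which the number of RPCs is limited. Whether the real fix
can be this small depends on the callers along the job submission path.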
> JobMasterPartitionTrackerImpl#requestShuffleDescriptorsFromResourceManager
> blocks main thread
> ---------------------------------------------------------------------------------------------
>
> Key: FLINK-29339
> URL: https://issues.apache.org/jira/browse/FLINK-29339
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.16.0
> Reporter: Chesnay Schepler
> Assignee: Xuannan Su
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 1.16.0
>
>
> {code:java}
> private List<ShuffleDescriptor> requestShuffleDescriptorsFromResourceManager(
>         IntermediateDataSetID intermediateDataSetID) {
>     Preconditions.checkNotNull(
>             resourceManagerGateway, "JobMaster is not connected to ResourceManager");
>     try {
>         return this.resourceManagerGateway
>                 .getClusterPartitionsShuffleDescriptors(intermediateDataSetID)
>                 .get(); // <-- there's your problem
>     } catch (Throwable e) {
>         throw new RuntimeException(
>                 String.format(
>                         "Failed to get shuffle descriptors of intermediate dataset %s from ResourceManager",
>                         intermediateDataSetID),
>                 e);
>     }
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)