Jin Xing created FLINK-22677:
--------------------------------
Summary: Scheduler should invoke
ShuffleMaster#registerPartitionWithProducer in a truly asynchronous fashion
Key: FLINK-22677
URL: https://issues.apache.org/jira/browse/FLINK-22677
Project: Flink
Issue Type: Sub-task
Components: Runtime / Coordination
Reporter: Jin Xing
The current scheduler enforces synchronous registration even though
ShuffleMaster#registerPartitionWithProducer returns a CompletableFuture. With a
remote shuffle service, communication between the ShuffleMaster and the remote
cluster tends to be expensive. A synchronous registration can block the main
thread and may cause negative side effects such as heartbeat timeouts.
In addition, expensive synchronous calls to the remote cluster could bottleneck
the throughput of shuffle resource requests, especially for batch jobs with
complicated DAGs.
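
To illustrate the difference, here is a minimal, self-contained sketch that does not use Flink's actual classes. SketchShuffleMaster, slowRemoteMaster, registerBlocking, registerAsync and the single-threaded "main thread" executor are all hypothetical stand-ins invented for this example. The only thing taken from the issue is that the registration call returns a CompletableFuture.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the ShuffleMaster API; NOT Flink's real interface.
interface SketchShuffleMaster {
    CompletableFuture<String> registerPartitionWithProducer(String partitionId);
}

class AsyncRegistrationSketch {

    // Simulates a remote shuffle service whose registration takes delayMillis.
    static SketchShuffleMaster slowRemoteMaster(long delayMillis) {
        return partitionId -> CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return "descriptor-for-" + partitionId;
        });
    }

    // Synchronous style: the caller is blocked until the remote call completes,
    // even though the API itself returns a future.
    static String registerBlocking(SketchShuffleMaster master, String partitionId) {
        return master.registerPartitionWithProducer(partitionId).join();
    }

    // Asynchronous style: returns immediately. The follow-up step (assumed here
    // to be "deploy with the descriptor") runs on the given executor once the
    // registration future completes.
    static CompletableFuture<String> registerAsync(
            SketchShuffleMaster master, String partitionId, Executor mainThreadExecutor) {
        return master.registerPartitionWithProducer(partitionId)
                .thenApplyAsync(descriptor -> "deployed:" + descriptor, mainThreadExecutor);
    }

    public static void main(String[] args) {
        // A single-threaded executor plays the role of the scheduler's main thread.
        ExecutorService mainThread = Executors.newSingleThreadExecutor();
        try {
            SketchShuffleMaster master = slowRemoteMaster(200);
            CompletableFuture<String> pending = registerAsync(master, "p-0", mainThread);
            // The caller is free to do other work (e.g. answer heartbeats) here.
            System.out.println("completed right after the call? " + pending.isDone());
            System.out.println(pending.join());
        } finally {
            mainThread.shutdown();
        }
    }
}
```

In the blocking variant, each slow registration holds up the calling thread for the full round trip. In the asynchronous variant, many registrations can be in flight at once and only the short continuation occupies the main-thread executor.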