[
https://issues.apache.org/jira/browse/FLINK-32045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lijie Wang reassigned FLINK-32045:
----------------------------------
Assignee: Weihua Hu
> optimize task deployment performance for large-scale jobs
> ---------------------------------------------------------
>
> Key: FLINK-32045
> URL: https://issues.apache.org/jira/browse/FLINK-32045
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Coordination
> Reporter: Weihua Hu
> Assignee: Weihua Hu
> Priority: Major
>
> h1. Background
> In FLINK-21110, we cache shuffle descriptors on the job manager side and
> support offloading these descriptors via the blob server in order to reduce
> the cost of task deployment.
> I think there are further improvements we could make for large-scale jobs:
> # The default minimum size for distribution via the blob server
> ("blob.offload.minsize") is 1 MB. But even for a large WordCount job with 20000
> parallelism, the serialized shuffle descriptors are only about 300 KB, so they
> are never offloaded. Users would have to lower "blob.offload.minsize"
> themselves, but a suitable value is hard for them to decide (a config example
> follows this list).
> # The task executor side still needs to load the blob files and deserialize the
> shuffle descriptors for every task. Since these operations run in the main
> thread, they may block other RPCs from the job manager.
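> As an illustration of the current workaround mentioned in the first point
> above, a user would have to set something like the following in
> flink-conf.yaml (the value is only an example, chosen to fall below the
> ~300 KB of serialized shuffle descriptors observed for such a job):
> {code}
> # lower the blob offload threshold from the default 1 MB to 200 KB
> blob.offload.minsize: 204800
> {code}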
> h1. Proposal
> # Enable distributing shuffle descriptors via the blob server automatically.
> This could be decided by the number of edges covered by the current shuffle
> descriptors: blob offload is enabled once the edge count exceeds an internal
> threshold (a decision sketch follows this list).
> # Introduce a cache of deserialized shuffle descriptors on the task executor
> side. This reduces the cost of reading local blob files and of deserialization.
> The cache should have a TTL to avoid occupying too much memory, and it should
> use the same switch mechanism as the blob server offload (a cache sketch
> follows this list).
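> The following is a minimal sketch of the decision in proposal 1. The class,
> method and threshold names are illustrative, not existing Flink APIs; the
> point is only that the offload decision is driven by the edge count rather
> than the serialized size:
> {code:java}
> /**
>  * Hypothetical helper on the job manager side: decide whether the shuffle
>  * descriptors of a consumed result should be offloaded to the blob server.
>  */
> public final class ShuffleDescriptorOffloadDecision {
>
>     // Illustrative internal threshold; the real value would need benchmarking.
>     private static final int OFFLOAD_EDGE_COUNT_THRESHOLD = 1000;
>
>     private ShuffleDescriptorOffloadDecision() {}
>
>     /**
>      * @param numberOfEdges number of upstream partitions (edges) described by
>      *                      the serialized shuffle descriptors
>      * @return true if the descriptors should be distributed via the blob server
>      *         instead of being inlined into the deployment RPC
>      */
>     public static boolean shouldOffloadToBlobServer(int numberOfEdges) {
>         return numberOfEdges >= OFFLOAD_EDGE_COUNT_THRESHOLD;
>     }
> }
> {code}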
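> And a minimal sketch of the task-executor-side cache in proposal 2, using a
> plain (unshaded) Guava cache purely for illustration; the key type, loader and
> TTL handling in the actual implementation may differ:
> {code:java}
> import com.google.common.cache.Cache;
> import com.google.common.cache.CacheBuilder;
>
> import java.time.Duration;
> import java.util.concurrent.ExecutionException;
> import java.util.function.Supplier;
>
> /**
>  * Hypothetical cache of deserialized shuffle descriptors on the task executor,
>  * keyed e.g. by the blob key, so that tasks of the same job deserialize the
>  * descriptors only once instead of once per task.
>  */
> public final class ShuffleDescriptorCache<K, V> {
>
>     private final Cache<K, V> cache;
>
>     public ShuffleDescriptorCache(Duration ttl, long maxEntries) {
>         this.cache =
>                 CacheBuilder.newBuilder()
>                         // TTL bounds how long unused descriptors occupy memory
>                         .expireAfterAccess(ttl)
>                         .maximumSize(maxEntries)
>                         .build();
>     }
>
>     /**
>      * Returns the cached descriptors, invoking the loader (read the local blob
>      * file and deserialize) only on a cache miss.
>      */
>     public V getOrLoad(K blobKey, Supplier<V> loader) {
>         try {
>             return cache.get(blobKey, loader::get);
>         } catch (ExecutionException e) {
>             throw new RuntimeException("Failed to load shuffle descriptors", e);
>         }
>     }
> }
> {code}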
--
This message was sent by Atlassian Jira
(v8.20.10#820010)