[
https://issues.apache.org/jira/browse/FLINK-32201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739293#comment-17739293
]
Lijie Wang commented on FLINK-32201:
------------------------------------
Done via:
master(1.18): 4fe3560015cd9cc076afad470228a9565d557935
> Enable the distribution of shuffle descriptors via the blob server by
> connection number
> ---------------------------------------------------------------------------------------
>
> Key: FLINK-32201
> URL: https://issues.apache.org/jira/browse/FLINK-32201
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Coordination
> Reporter: Weihua Hu
> Assignee: Weihua Hu
> Priority: Major
> Labels: pull-request-available
>
> Flink support distributes shuffle descriptors via the blob server to reduce
> JobManager overhead. But the default threshold to enable it is 1MB, which
> never reaches. Users need to set a proper value for this, but it requires
> advanced knowledge before configuring it.
> I would like to enable this feature by the number of connections of a group
> of shuffle descriptors. For examples, a simple streaming job with two
> operators, each with 10,000 parallelism and connected via all-to-all
> distribution. In this job, we only get one set of shuffle descriptors, and
> this group has 10000 * 10000 connections. This means that JobManager needs to
> send this set of shuffle descriptors to 10000 tasks.
> Since it is also difficult for users to configure, I would like to give it a
> default value. The serialized shuffle descriptors sizes for different
> parallelism are shown below.
> || Producer parallelism || serialized shuffle descriptor size || consumer
> parallelism || total data size that JM needs to send ||
> | 5000 | 100KB | 5000 | 500MB |
> | 10000 | 200KB | 10000 | 2GB |
> | 20000 | 400Kb | 20000 | 8GB |
> So, I would like to set the default value to 10,000 * 10,000.
> Any suggestions or concerns are appreciated.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)