[jira] [Commented] (FLINK-32201) Enable the distribution of shuffle descriptors via the blob server by connection number

Lijie Wang (Jira) Sat, 01 Jul 2023 03:21:05 -0700


    [ https://issues.apache.org/jira/browse/FLINK-32201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739293#comment-17739293 ]


Lijie Wang commented on FLINK-32201:
------------------------------------

Done via:
master(1.18): 4fe3560015cd9cc076afad470228a9565d557935
 

> Enable the distribution of shuffle descriptors via the blob server by connection number
> ---------------------------------------------------------------------------------------
>
>                 Key: FLINK-32201
>                 URL: https://issues.apache.org/jira/browse/FLINK-32201
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Runtime / Coordination
>            Reporter: Weihua Hu
>            Assignee: Weihua Hu
>            Priority: Major
>              Labels: pull-request-available
>
> Flink supports distributing shuffle descriptors via the blob server to reduce
> JobManager overhead. However, the default threshold for enabling it is 1MB,
> which is never reached in practice. Users have to set a suitable value
> themselves, but doing so requires advanced knowledge.
> I would like to enable this feature based on the number of connections in a
> group of shuffle descriptors. For example, consider a simple streaming job
> with two operators, each with a parallelism of 10,000, connected via an
> all-to-all distribution. This job has only one set of shuffle descriptors,
> and this group covers 10,000 * 10,000 connections. Every one of the 10,000
> consumer tasks needs this set, so the JobManager has to send it 10,000 times.
> Since this is also difficult for users to configure, I would like to give it a
> default value. The serialized shuffle descriptor sizes for different
> parallelisms are shown below.
> || Producer parallelism || Serialized shuffle descriptor size || Consumer parallelism || Total data size that JM needs to send ||
> | 5000 | 100KB | 5000 | 500MB |
> | 10000 | 200KB | 10000 | 2GB |
> | 20000 | 400KB | 20000 | 8GB |
> So I would like to set the default threshold to 10,000 * 10,000 connections.
> Any suggestions or concerns are appreciated.
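The arithmetic behind the proposal can be sketched in Python. This is a minimal illustration under stated assumptions, not Flink's implementation. The function and constant names are hypothetical. The edge is assumed to be all-to-all. The 20-byte average per descriptor is inferred from the table (100KB / 5,000 producers). The inclusive `>=` comparison against the threshold is a guess, since the proposal does not specify it.

```python
"""Sketch of the connection-count heuristic proposed in FLINK-32201.

All names are hypothetical; this is not Flink's implementation.
"""

KB, MB, GB = 10**3, 10**6, 10**9  # decimal units, matching the table

# Average bytes per serialized descriptor, inferred from the table
# (100 KB / 5,000 producers = 20 bytes). An estimate, not a measured constant.
BYTES_PER_DESCRIPTOR = 20

# Proposed default: offload once a descriptor group covers 10,000 * 10,000 connections.
DEFAULT_THRESHOLD_CONNECTIONS = 10_000 * 10_000


def all_to_all_connections(producers: int, consumers: int) -> int:
    # On an all-to-all edge, every consumer subtask reads from every producer subtask.
    return producers * consumers


def total_bytes_sent(producers: int, consumers: int) -> int:
    # Without offloading, the JobManager ships the whole descriptor set
    # (one descriptor per producer) to each consumer task.
    return producers * BYTES_PER_DESCRIPTOR * consumers


def should_offload(producers: int, consumers: int,
                   threshold: int = DEFAULT_THRESHOLD_CONNECTIONS) -> bool:
    # Assumption: the comparison is inclusive; the proposal does not specify.
    return all_to_all_connections(producers, consumers) >= threshold


for p in (5_000, 10_000, 20_000):
    print(f"parallelism {p}: {all_to_all_connections(p, p):,} connections, "
          f"{total_bytes_sent(p, p) / GB:g} GB sent, offload={should_offload(p, p)}")
# parallelism 5000: 25,000,000 connections, 0.5 GB sent, offload=False
# parallelism 10000: 100,000,000 connections, 2 GB sent, offload=True
# parallelism 20000: 400,000,000 connections, 8 GB sent, offload=True
```

Counting connections instead of bytes keeps the threshold independent of how compactly descriptors serialize. Under these assumptions, the 10,000 * 10,000 default would offload the 10,000 and 20,000 cases from the table and leave the 5,000 case on the regular RPC path.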



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
