[ 
https://issues.apache.org/jira/browse/FLINK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801669#comment-17801669
 ] 

Weijie Guo edited comment on FLINK-33961 at 1/2/24 6:38 AM:
------------------------------------------------------------

[~Jiang Xin] Thanks for reporting this. Unfortunately, this behavior is 
somewhat by design. To avoid additional overhead, we allow the upstream not to 
calculate the exact backlog, which means that 
{{exclusive-buffers-per-channel}} must not be set to 0. Setting it to 0 can 
greatly affect performance and may even block the job.

For this reason, the documentation includes the following note: When the 
legacy Hybrid shuffle mode is used, decreasing the number of exclusive buffers 
per channel will seriously affect the performance. Therefore, this value should 
not be set to 0. 

That's one of the reasons we introduced the new hybrid shuffle mode (i.e. 
TieredStorage Shuffle). If there are no further questions, I will close this 
issue.
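
A minimal sketch of the corresponding {{flink-conf.yaml}} settings, assuming 
the option keys behind {{NETWORK_HYBRID_SHUFFLE_ENABLE_NEW_MODE}} and 
{{NETWORK_BUFFERS_PER_CHANNEL}} are the ones shown here. Please verify them 
against the configuration reference for your Flink version; the value 2 is 
only illustrative.

{code}
# Assumed key for NETWORK_HYBRID_SHUFFLE_ENABLE_NEW_MODE:
# use the new (TieredStorage) hybrid shuffle instead of the legacy mode.
taskmanager.network.hybrid-shuffle.enable-new-mode: true
# Assumed key for NETWORK_BUFFERS_PER_CHANNEL:
# keep this above 0 whenever the legacy hybrid shuffle mode is used.
taskmanager.network.memory.buffers-per-channel: 2
{code}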



> Hybrid Shuffle may hang when exclusive buffers per channel is set to 0
> ----------------------------------------------------------------------
>
>                 Key: FLINK-33961
>                 URL: https://issues.apache.org/jira/browse/FLINK-33961
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Network
>            Reporter: Jiang Xin
>            Priority: Major
>
> I found that Hybrid Shuffle with the new mode disabled may hang when 
> exclusive-buffers-per-channel is set to 0. It can be reproduced by adding the 
> following test to `HybridShuffleITCase.java` and running it.
> {code:java}
> @RepeatedTest(10)
> void testHybridFullExchangesWithNonBuffersPerChannel() throws Exception {
>     final int numRecordsToSend = 10000;
>     Configuration configuration = configureHybridOptions(getConfiguration(), false);
>     configuration.set(
>             NettyShuffleEnvironmentOptions.NETWORK_HYBRID_SHUFFLE_ENABLE_NEW_MODE, false);
>     configuration.set(NETWORK_BUFFERS_PER_CHANNEL, 0);
>     JobGraph jobGraph = createJobGraph(numRecordsToSend, false, configuration);
>     executeJob(jobGraph, configuration, numRecordsToSend);
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
