[
https://issues.apache.org/jira/browse/FLINK-33961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801669#comment-17801669
]
Weijie Guo edited comment on FLINK-33961 at 1/2/24 6:39 AM:
------------------------------------------------------------
[~Jiang Xin] Thanks for reporting this. Unfortunately, this behavior is
somewhat by design. To avoid additional overhead, we allow the upstream not to
calculate the exact backlog. As a result, {{exclusive-buffers-per-channel}}
must not be set to 0: doing so can greatly affect performance and may even
block the job.
Therefore, we have the following instructions in the documentation: When the
legacy Hybrid shuffle mode is used, decreasing the number of exclusive buffers
per channel will seriously affect the performance. Therefore, this value should
not be set to 0.
That's one of the reasons we introduced the new hybrid shuffle mode (i.e.,
TieredStorage Shuffle). If there are no further questions, I will close this
issue.
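For reference, a configuration for the legacy mode that keeps the exclusive buffers non-zero might look like the sketch below. The keys are assumed to be the ones behind {{NETWORK_HYBRID_SHUFFLE_ENABLE_NEW_MODE}} and {{NETWORK_BUFFERS_PER_CHANNEL}} as of Flink 1.18, and the value 2 is assumed to be the default; please verify both against the documentation of the version in use.
{code:yaml}
# Assumed keys (Flink 1.18); check the configuration docs for your version.
execution.batch-shuffle-mode: ALL_EXCHANGES_HYBRID_FULL
# Legacy hybrid shuffle, i.e. the new mode (TieredStorage Shuffle) is disabled.
taskmanager.network.hybrid-shuffle.enable-new-mode: false
# Keep exclusive buffers per channel above 0 in legacy mode (2 is the assumed default).
taskmanager.network.memory.buffers-per-channel: 2
{code}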
> Hybrid Shuffle may hang when exclusive buffers per channel is set to 0
> ----------------------------------------------------------------------
>
> Key: FLINK-33961
> URL: https://issues.apache.org/jira/browse/FLINK-33961
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Network
> Reporter: Jiang Xin
> Priority: Major
>
> I found that Hybrid Shuffle with the new mode disabled may hang when
> exclusive-buffers-per-channel is set to 0. It can be reproduced by adding the
> following test to `HybridShuffleITCase.java` and running it.
> {code:java}
> @RepeatedTest(10)
> void testHybridFullExchangesWithNonBuffersPerChannel() throws Exception {
>     final int numRecordsToSend = 10000;
>     Configuration configuration = configureHybridOptions(getConfiguration(), false);
>     configuration.set(
>             NettyShuffleEnvironmentOptions.NETWORK_HYBRID_SHUFFLE_ENABLE_NEW_MODE, false);
>     configuration.set(NETWORK_BUFFERS_PER_CHANNEL, 0);
>     JobGraph jobGraph = createJobGraph(numRecordsToSend, false, configuration);
>     executeJob(jobGraph, configuration, numRecordsToSend);
> }
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)