[ https://issues.apache.org/jira/browse/FLINK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521540#comment-17521540 ]
Piotr Nowojski commented on FLINK-24578:
----------------------------------------
As a next step in this ticket, it might be a good idea to double-check whether
the same performance regression as with debloating enabled is also visible after
manually decreasing the buffer size to a value similar to the debloated one for
the given job.
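A minimal flink-conf.yaml sketch of the comparison run, assuming `taskmanager.memory.segment-size` is the setting that controls the network buffer size when debloating is off. The `8kb` value is only a hypothetical placeholder for the size the debloater actually converged to in this job; as far as I know, the segment size has to be a power of two, so the debloated value would need to be rounded.

```yaml
# Baseline run (as in this ticket): debloating on, other settings at defaults.
# taskmanager.network.memory.buffer-debloat.enabled: true

# Comparison run: debloating off, network buffer size lowered by hand.
taskmanager.network.memory.buffer-debloat.enabled: false
# Hypothetical placeholder value; the default segment size is 32kb.
taskmanager.memory.segment-size: 8kb
```

If the comparison run shows a similar ~10% drop, the regression is probably caused by the smaller buffers themselves rather than by the debloating mechanism.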
> Unexpected erratic load shape for channel skew load profile and ~10%
> performance loss with enabled debloating
> -------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-24578
> URL: https://issues.apache.org/jira/browse/FLINK-24578
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.0
> Reporter: Anton Kalashnikov
> Priority: Major
> Attachments: antiphaseBufferSize.png, erraticBufferSize1.png,
> erraticBufferSize2.png
>
>
> given:
> A job with 5 maps (with keyBy).
> All channels are remote. Parallelism is 80.
> The first task produces only two keys, `indexOfThisSubtask` and
> `indexOfThisSubtask + 1`, so every subtask has a constant number of active
> channels (depending on the hash rebalance).
> All records have the same size and take the same time to process.
>
> when:
> Buffer debloating is enabled with the default configuration.
>
> then:
> For some reason, the buffer size synchronizes across all subtasks of the first
> map. The synchronization can be strong, as shown in the erraticBufferSize1
> picture, but usually it is less explicit, as in erraticBufferSize2.
> !erraticBufferSize1.png!
> !erraticBufferSize2.png!
>
> Expected:
> After the stabilization period, the buffer size should be mostly constant with
> small fluctuations, or the different subtasks should be in antiphase to each
> other (when one subtask has a small buffer size, another should have a big
> buffer size), for example as in the antiphaseBufferSize picture:
> !antiphaseBufferSize.png!
>
> Unfortunately, the problem does not reproduce every time, which means it may
> be related to the environment. But at least, it makes sense to try to
> understand why the load shape is so strange when only a few input channels
> are active.
>
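Since the problem does not reproduce every time, it could help to tell a "synchronized" run from an "antiphase" or stable one with a number rather than by eye. A minimal Java sketch, assuming the buffer-size samples of two subtasks have been exported as equal-length arrays from whatever metric the charts above were plotted from; `BufferSizePhase` and `correlation` are hypothetical names, and the series in `main` are synthetic sine waves, not measurements from this job. It computes the Pearson correlation: values near +1 mean the two subtasks move in phase (the erratic case), values near -1 mean antiphase.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Flink: measures whether two subtasks'
// buffer-size series move together or against each other.
class BufferSizePhase {

    // Pearson correlation coefficient of two equally long series.
    // Near +1: in phase (synchronized). Near -1: antiphase.
    static double correlation(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("series must have equal length >= 2");
        }
        double meanX = Arrays.stream(x).average().orElse(0);
        double meanY = Arrays.stream(y).average().orElse(0);
        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        if (varX == 0 || varY == 0) {
            // A constant series has no defined correlation; report 0 (no shared oscillation).
            return 0;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        int n = 200;
        double[] a = new double[n];
        double[] inPhase = new double[n];
        double[] antiPhase = new double[n];
        for (int t = 0; t < n; t++) {
            // Synthetic buffer sizes in bytes oscillating around 16 KiB; not measured data.
            double wave = Math.sin(2 * Math.PI * t / 50.0);
            a[t] = 16384 + 8000 * wave;
            inPhase[t] = 16384 + 6000 * wave;
            antiPhase[t] = 16384 - 8000 * wave;
        }
        System.out.printf("in phase:  %.3f%n", correlation(a, inPhase));
        System.out.printf("antiphase: %.3f%n", correlation(a, antiPhase));
    }
}
```

Averaging this value over all subtask pairs of the first map would give a single number per run, which might make it easier to compare runs where the issue reproduces with runs where it does not.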
--
This message was sent by Atlassian Jira
(v8.20.1#820001)