[
https://issues.apache.org/jira/browse/FLINK-24578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Anton Kalashnikov updated FLINK-24578:
--------------------------------------
Description:
given:
A job with 5 maps (with keyBy).
All channels are remote. Parallelism is 80.
The first task produces only two keys: `indexOfThisSubtask` and
`indexOfThisSubtask + 1`. So every subtask has a constant number of active
channels (the number depends on the hash partitioning).
Every record has the same size and takes the same time to process.
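To make the skew in this setup concrete, here is a minimal Java sketch that counts how many input channels are active on each receiving subtask. It is only an illustration: `ChannelSkewSketch` and `targetChannel` are hypothetical names, and the hash used is a stand-in, not Flink's actual key-group assignment, so the exact counts in a real job will differ.

```java
import java.util.Set;
import java.util.TreeSet;

public class ChannelSkewSketch {
    static final int PARALLELISM = 80;

    // Stand-in hash partitioner (an assumption, NOT Flink's real key-group logic):
    // mixes the key bits and maps the result onto a downstream subtask index.
    static int targetChannel(int key, int parallelism) {
        int h = key * 0x9E3779B9;
        h ^= (h >>> 16);
        return Math.floorMod(h, parallelism);
    }

    // Channels that a sending subtask actually writes to when it emits only
    // the keys indexOfThisSubtask and indexOfThisSubtask + 1.
    static Set<Integer> activeChannels(int indexOfThisSubtask, int parallelism) {
        Set<Integer> channels = new TreeSet<>();
        channels.add(targetChannel(indexOfThisSubtask, parallelism));
        channels.add(targetChannel(indexOfThisSubtask + 1, parallelism));
        return channels;
    }

    // For every receiving subtask, the number of senders that feed it.
    static int[] activeInputsPerReceiver(int parallelism) {
        int[] counts = new int[parallelism];
        for (int sender = 0; sender < parallelism; sender++) {
            for (int channel : activeChannels(sender, parallelism)) {
                counts[channel]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = activeInputsPerReceiver(PARALLELISM);
        for (int receiver = 0; receiver < counts.length; receiver++) {
            System.out.println("receiver " + receiver + ": " + counts[receiver]
                    + " active input channel(s) out of " + PARALLELISM);
        }
    }
}
```

Whatever hash is used, each sender writes to at most two channels, so at most 160 of the 80 x 80 sender-to-receiver channels carry data, and the set of active channels per receiver stays fixed for the whole run.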
when:
Buffer debloating is enabled with the default configuration.
then:
The buffer size synchronizes across all subtasks of the first map for some
reason. The synchronization can be strong, as shown in the erraticBufferSize1
picture, but usually it is less explicit, as in erraticBufferSize2.
!erraticBufferSize1.png!
!erraticBufferSize2.png!
Expected:
After the stabilization period, the buffer size should be mostly constant with
small fluctuations, or the different tasks should be in antiphase to each
other (when one subtask has a small buffer size, another should have a big
buffer size). For example, see the picture antiphaseBufferSize.
!antiphaseBufferSize.png!
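One possible way to tell the two shapes apart from recorded metrics is to correlate the buffer size series of two subtasks: a value near +1 suggests they move in sync, a value near -1 suggests antiphase. The sketch below assumes both series were sampled at the same instants. The class name, method name and sample values are made up for illustration and are not taken from the attached pictures.

```java
public class PhaseCheckSketch {

    // Pearson correlation of two equally long series.
    // Returns NaN if either series is constant (zero variance).
    static double correlation(double[] a, double[] b) {
        if (a.length != b.length || a.length < 2) {
            throw new IllegalArgumentException("series must have equal length >= 2");
        }
        double meanA = 0, meanB = 0;
        for (int i = 0; i < a.length; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= a.length;
        meanB /= b.length;

        double cov = 0, varA = 0, varB = 0;
        for (int i = 0; i < a.length; i++) {
            double da = a[i] - meanA;
            double db = b[i] - meanB;
            cov += da * db;
            varA += da * da;
            varB += db * db;
        }
        return cov / Math.sqrt(varA * varB);
    }

    public static void main(String[] args) {
        // Hypothetical buffer sizes (in KB) of three subtasks over time.
        double[] subtask0 = {32, 16, 8, 16, 32, 16, 8, 16};
        double[] subtask1 = {32, 16, 8, 16, 32, 16, 8, 16}; // moves together with subtask0
        double[] subtask2 = {8, 24, 32, 24, 8, 24, 32, 24}; // mirrored around 20 KB

        System.out.println("subtask0 vs subtask1: " + correlation(subtask0, subtask1));
        System.out.println("subtask0 vs subtask2: " + correlation(subtask0, subtask2));
    }
}
```

A mostly constant buffer size would show up as near-zero variance instead, which is why the constant case is reported as NaN rather than treated as a phase relation.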
Unfortunately, it is not reproduced every time, which means that this problem
may be connected to the environment. But at least it makes sense to try to
understand why we have such a strange load shape when only several input
channels are active.
was:
given:
A job with 5 maps (with keyBy).
All channels are remote. Parallelism is 80.
The first task produces only two keys: `indexOfThisSubtask` and
`indexOfThisSubtask + 1`. So every subtask has a constant number of active
channels (the number depends on the hash partitioning).
Every record has the same size and takes the same time to process.
when:
Buffer debloating is enabled with the default configuration.
then:
The buffer size synchronizes across all subtasks of the first map for some
reason. The synchronization can be strong, as shown in the erraticBufferSize1
picture, but usually it is less explicit, as in erraticBufferSize2.
!erraticBufferSize1.png!
Expected:
After the stabilization period, the buffer size should be mostly constant with
small fluctuations, or the different tasks should be in antiphase to each
other (when one subtask has a small buffer size, another should have a big
buffer size). For example, see the picture antiphaseBufferSize.
!antiphaseBufferSize.png!
Unfortunately, it is not reproduced every time, which means that this problem
may be connected to the environment. But at least it makes sense to try to
understand why we have such a strange load shape when only several input
channels are active.
> Unexpected erratic load shape for channel skew load profile
> -----------------------------------------------------------
>
> Key: FLINK-24578
> URL: https://issues.apache.org/jira/browse/FLINK-24578
> Project: Flink
> Issue Type: Sub-task
> Components: Runtime / Checkpointing
> Affects Versions: 1.14.0
> Reporter: Anton Kalashnikov
> Priority: Major
> Attachments: antiphaseBufferSize.png, erraticBufferSize1.png,
> erraticBufferSize2.png
>
>
> given:
> A job with 5 maps (with keyBy).
> All channels are remote. Parallelism is 80.
> The first task produces only two keys: `indexOfThisSubtask` and
> `indexOfThisSubtask + 1`. So every subtask has a constant number of active
> channels (the number depends on the hash partitioning).
> Every record has the same size and takes the same time to process.
>
> when:
> Buffer debloating is enabled with the default configuration.
>
> then:
> The buffer size synchronizes across all subtasks of the first map for some
> reason. The synchronization can be strong, as shown in the
> erraticBufferSize1 picture, but usually it is less explicit, as in
> erraticBufferSize2.
> !erraticBufferSize1.png!
> !erraticBufferSize2.png!
>
> Expected:
> After the stabilization period, the buffer size should be mostly constant with
> small fluctuations, or the different tasks should be in antiphase to each
> other (when one subtask has a small buffer size, another should have a big
> buffer size). For example, see the picture antiphaseBufferSize.
> !antiphaseBufferSize.png!
>
> Unfortunately, it is not reproduced every time, which means that this problem
> may be connected to the environment. But at least it makes sense to try to
> understand why we have such a strange load shape when only several input
> channels are active.
>