[
https://issues.apache.org/jira/browse/FLINK-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrey Zagrebin closed FLINK-15300.
-----------------------------------
Resolution: Fixed
merged into master by d22fdc39a86496ebfc74914a72916d8a0ea7ab89
merged into 1.10 by a342e418a2d8df52645dd75588f8b9f74a07ad63
> Shuffle memory fraction sanity check does not account for its min/max limit
> ---------------------------------------------------------------------------
>
> Key: FLINK-15300
> URL: https://issues.apache.org/jira/browse/FLINK-15300
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Configuration
> Reporter: Andrey Zagrebin
> Assignee: Andrey Zagrebin
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.10.0
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> If a configuration causes the shuffle memory size to be set to its min or
> max limit rather than to the configured fraction, the TM fails to start.
> During startup the TM parses the generated dynamic properties, and the
> sanity check (TaskExecutorResourceUtils#sanityCheckShuffleMemory) fails
> because it compares the size against the exact fraction even though the
> size was capped to the min/max value.
> For example, start the TM with the following Flink config:
> {code:java}
> taskmanager.memory.total-flink.size: 350m
> taskmanager.memory.framework.heap.size: 16m
> taskmanager.memory.shuffle.fraction: 0.1{code}
> The calculation is then based on the total Flink memory and produces the
> following extra program args:
> {code:java}
> taskmanager.memory.shuffle.max: 67108864b
> taskmanager.memory.framework.off-heap.size: 134217728b
> taskmanager.memory.managed.size: 146800642b
> taskmanager.cpu.cores: 1.0
> taskmanager.memory.task.heap.size: 2097150b
> taskmanager.memory.task.off-heap.size: 0b
> taskmanager.memory.shuffle.min: 67108864b{code}
> Here the size derived from the fraction (0.1 of 350m, i.e. 35mb) is less
> than the shuffle memory min size (64mb), so the shuffle memory was set to
> the min value: 64mb.
> When the TM starts, the calculation is now done for the explicit task heap
> and managed memory, but also with the explicit total Flink memory, and
> TaskExecutorResourceUtils#sanityCheckShuffleMemory throws the following
> exception:
> {code:java}
> org.apache.flink.configuration.IllegalConfigurationException: Derived Shuffle Memory size(64 Mb (67108864 bytes)) does not match configured Shuffle Memory fraction (0.10000000149011612).
>     at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.sanityCheckShuffleMemory(TaskExecutorResourceUtils.java:552)
>     at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveResourceSpecWithExplicitTaskAndManagedMemory(TaskExecutorResourceUtils.java:183)
>     at org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:135)
> {code}
> This can be fixed by having the sanity check take the min/max range into
> account, i.e. not requiring the exact fraction when the size was capped to
> the min or max value.
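
The failure mode and the proposed fix can be illustrated with a small, self-contained sketch. This is not the code from TaskExecutorResourceUtils or from the merged commits; the class name, method names, and signatures below are hypothetical and only mirror the description above. It assumes the shuffle size is derived as a fraction of total Flink memory and then capped to the min/max range.

```java
public class ShuffleMemorySanityCheck {

    /** Derives the shuffle size as a fraction of the base size, capped to [minBytes, maxBytes]. */
    public static long deriveWithFraction(long baseBytes, double fraction, long minBytes, long maxBytes) {
        long derived = (long) (baseBytes * fraction);
        return Math.max(minBytes, Math.min(maxBytes, derived));
    }

    /** Strict check as described in the issue: only the exact fraction is accepted. */
    public static boolean matchesExactFraction(long shuffleBytes, long baseBytes, double fraction) {
        return shuffleBytes == (long) (baseBytes * fraction);
    }

    /** Relaxed check (an assumption about the fix): the fraction result is capped before comparing. */
    public static boolean matchesFractionWithinLimits(
            long shuffleBytes, long baseBytes, double fraction, long minBytes, long maxBytes) {
        return shuffleBytes == deriveWithFraction(baseBytes, fraction, minBytes, maxBytes);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long totalFlink = 350 * mb;  // taskmanager.memory.total-flink.size: 350m
        long min = 67108864L;        // taskmanager.memory.shuffle.min from the dynamic properties
        long max = 67108864L;        // taskmanager.memory.shuffle.max from the dynamic properties

        long shuffle = deriveWithFraction(totalFlink, 0.1, min, max);
        System.out.println("derived shuffle size: " + shuffle + " bytes");
        System.out.println("strict check passes:  " + matchesExactFraction(shuffle, totalFlink, 0.1));
        System.out.println("relaxed check passes: "
                + matchesFractionWithinLimits(shuffle, totalFlink, 0.1, min, max));
    }
}
```

With the values from the example, the strict check rejects the 64mb size because it differs from the uncapped fraction result, while the relaxed check accepts it.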