[ 
https://issues.apache.org/jira/browse/FLINK-15300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrey Zagrebin closed FLINK-15300.
-----------------------------------
    Resolution: Fixed

merged into master by d22fdc39a86496ebfc74914a72916d8a0ea7ab89
merged into 1.10 by a342e418a2d8df52645dd75588f8b9f74a07ad63

> Shuffle memory fraction sanity check does not account for its min/max limit
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-15300
>                 URL: https://issues.apache.org/jira/browse/FLINK-15300
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / Configuration
>            Reporter: Andrey Zagrebin
>            Assignee: Andrey Zagrebin
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 1.10.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> If a configuration results in the shuffle memory size being set to its min
> or max limit rather than to the configured fraction, TM startup fails. The
> starting TM parses the generated dynamic properties, and the sanity check
> (TaskExecutorResourceUtils#sanityCheckShuffleMemory) fails because it
> compares the size against the exact fraction and ignores the min/max limits.
> For example, start a TM with the following Flink config:
> {code:java}
> taskmanager.memory.total-flink.size: 350m
> taskmanager.memory.framework.heap.size: 16m
> taskmanager.memory.shuffle.fraction: 0.1{code}
> The calculation is then based on the total Flink memory and produces the
> following extra program args:
> {code:java}
> taskmanager.memory.shuffle.max: 67108864b
> taskmanager.memory.framework.off-heap.size: 134217728b
> taskmanager.memory.managed.size: 146800642b
> taskmanager.cpu.cores: 1.0
> taskmanager.memory.task.heap.size: 2097150b
> taskmanager.memory.task.off-heap.size: 0b
> taskmanager.memory.shuffle.min: 67108864b{code}
> Here the size derived from the fraction (0.1 of 350m = 35m) is less than the
> shuffle memory min size (64mb), so it was set to the min value: 64mb.
> When the TM starts, the calculation now uses the explicit task heap and
> managed memory sizes together with the explicit total Flink memory, and
> TaskExecutorResourceUtils#sanityCheckShuffleMemory throws the following
> exception:
> {code:java}
> org.apache.flink.configuration.IllegalConfigurationException:
> Derived Shuffle Memory size(64 Mb (67108864 bytes)) does not match configured 
> Shuffle Memory fraction (0.10000000149011612).
> at 
> org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.sanityCheckShuffleMemory(TaskExecutorResourceUtils.java:552)
> at 
> org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.deriveResourceSpecWithExplicitTaskAndManagedMemory(TaskExecutorResourceUtils.java:183)
> at 
> org.apache.flink.runtime.clusterframework.TaskExecutorResourceUtils.resourceSpecFromConfig(TaskExecutorResourceUtils.java:135)
> {code}
> This can be fixed by asserting the exact fraction only when the size it
> yields lies within the min/max range.
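
For illustration only, here is a minimal sketch of the difference between a strict fraction check and one that accounts for the min/max limits, using the numbers from the report (350m total Flink memory, fraction 0.1, min = max = 64mb). The class and method names are hypothetical and are not the merged patch or the actual TaskExecutorResourceUtils code.

```java
/** Hypothetical sketch; names are illustrative and not taken from Flink. */
final class ShuffleMemoryCheckSketch {

    /** Limits a value to the inclusive range [min, max]. */
    static long clamp(long value, long min, long max) {
        return Math.max(min, Math.min(max, value));
    }

    /** Strict check: accepts only the size given by the raw fraction. */
    static boolean matchesExactFraction(
            long shuffleBytes, long totalFlinkBytes, double fraction) {
        return shuffleBytes == (long) (totalFlinkBytes * fraction);
    }

    /** Relaxed check: accepts the size the fraction yields after clamping. */
    static boolean matchesFractionWithinLimits(
            long shuffleBytes, long totalFlinkBytes, double fraction,
            long minBytes, long maxBytes) {
        long fromFraction = (long) (totalFlinkBytes * fraction);
        return shuffleBytes == clamp(fromFraction, minBytes, maxBytes);
    }

    public static void main(String[] args) {
        long total = 350L << 20;   // taskmanager.memory.total-flink.size: 350m
        long limit = 64L << 20;    // shuffle min = max = 67108864b
        long shuffle = limit;      // size the TM ended up with

        // 0.1 of 350m is 35m, below the 64mb minimum, so the strict check fails.
        System.out.println("strict: " + matchesExactFraction(shuffle, total, 0.1));
        // Clamping 35m to [64mb, 64mb] gives 64mb, so the relaxed check passes.
        System.out.println("with limits: "
                + matchesFractionWithinLimits(shuffle, total, 0.1, limit, limit));
    }
}
```

Run as a single file, this prints `strict: false` followed by `with limits: true`, which mirrors the failure described above and the behaviour the proposed fix is after.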



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
