[ 
https://issues.apache.org/jira/browse/FLINK-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16559773#comment-16559773
 ] 

Piotr Nowojski commented on FLINK-9969:
---------------------------------------

Another remark: this might also be related to how Flip6 lazily requests new 
task managers. In {{mode: old}}, tasks are distributed uniformly across all 
nodes, while in Flip6 they are squeezed mostly onto one node. The timeline of 
events is as follows:

12:49:13 - requested new TaskManager

12:49:20 - first TaskManager registered

12:49:20 - all tasks moved from {{CREATED}} to {{SCHEDULED}}; they start being 
executed in batches of 4 (because of 4 slots) on this TM

12:49:20 - requested more TaskManagers

12:49:21 - first TM fails with OOM after processing ~18 tasks

12:49:27 - more TMs registered

Please check the attached [^yarn_logs] for details.
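
To illustrate why late registration of the other TaskManagers can pile work 
onto a single node, here is a minimal sketch in Python. It is not Flink's 
scheduler: the function {{assign_tasks}}, the greedy "earliest free slot" 
rule and the 0.25s task duration are all assumptions made for illustration. 
Only the 20 tasks, 4 slots per TM, 5 TMs and the 7 second gap (12:49:20 vs 
12:49:27) come from the report above. The OOM itself is not modelled.

```python
import heapq
from collections import Counter


def assign_tasks(num_tasks, slots_per_tm, tm_ready_at, task_duration):
    """Greedy toy model (hypothetical, not Flink code).

    Each task is handed to whichever slot becomes free earliest.
    tm_ready_at[i] is the time (seconds) at which TaskManager i registers.
    Returns the number of tasks executed on each TaskManager.
    """
    # One heap entry per slot: (time the slot becomes free, TM index).
    slots = [
        (ready, tm)
        for tm, ready in enumerate(tm_ready_at)
        for _ in range(slots_per_tm)
    ]
    heapq.heapify(slots)

    per_tm = Counter()
    for _ in range(num_tasks):
        free_at, tm = heapq.heappop(slots)
        per_tm[tm] += 1
        # The slot is busy until the task finishes (assumed fixed duration).
        heapq.heappush(slots, (free_at + task_duration, tm))

    return [per_tm[tm] for tm in range(len(tm_ready_at))]


# All 5 TMs available up front: tasks spread evenly.
print(assign_tasks(20, 4, [0, 0, 0, 0, 0], 1.0))     # [4, 4, 4, 4, 4]

# First TM registers at t=0, the rest 7s later; tasks are short (assumed
# 0.25s), so the first TM works through everything in batches of 4.
print(assign_tasks(20, 4, [0, 7, 7, 7, 7], 0.25))    # [20, 0, 0, 0, 0]
```

Under these assumptions, whenever tasks finish faster than new TMs can 
register, every task ends up on the first TM, which matches the shape of the 
timeline above.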

 

> Unreasonable memory requirements to complete examples/batch/WordCount
> ---------------------------------------------------------------------
>
>                 Key: FLINK-9969
>                 URL: https://issues.apache.org/jira/browse/FLINK-9969
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager
>    Affects Versions: 1.5.0, 1.5.1, 1.5.2, 1.6.0
>            Reporter: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.5.2, 1.6.0
>
>         Attachments: yarn_logs
>
>
> setup on AWS EMR:
>  * 5 worker nodes (m4.4xlarge nodes) 
>  * 1 master node (m4.large)
> The following command fails with out-of-memory errors:
> {noformat}
> export HADOOP_CLASSPATH=`hadoop classpath`
> ./bin/flink run -m yarn-cluster -p 20 -yn 5 -ys 4 -ytm 16000 examples/batch/WordCount.jar
> {noformat}
> The example completes only after increasing the memory to over 17.2GB. At the 
> same time, after disabling flip6, the following command succeeds:
> {noformat}
> export HADOOP_CLASSPATH=`hadoop classpath`
> ./bin/flink run -m yarn-cluster -p 20 -yn 5 -ys 4 -ytm 1000 examples/batch/WordCount.jar
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
