Hi Yuichiro,

The way to avoid this is to boost spark.yarn.executor.memoryOverhead until the executors have enough off-heap memory to stay under their limits.
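For example, a minimal sketch of what that looks like on the command line (the overhead value is in MB; 4096 is an illustrative guess, not a recommendation, and the class/jar names are placeholders):

```shell
# Sketch: keep the 16g heap but give each executor 4 GB of off-heap headroom.
# Raise spark.yarn.executor.memoryOverhead until the NodeManager stops
# killing containers; the right value depends on your shuffle volume.
spark-submit \
  --master yarn-cluster \
  --conf spark.executor.memory=16g \
  --conf spark.yarn.executor.memoryOverhead=4096 \
  --class com.example.ALSJob \   # hypothetical driver class
  als-job.jar                    # hypothetical application jar
```

Note that the heap plus the overhead together make up the container size YARN requests, so with 32GB machines a 16g heap plus 4g overhead still leaves room for the OS and other daemons.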
-Sandy

On Tue, Mar 24, 2015 at 11:49 AM, Yuichiro Sakamoto <ks...@muc.biglobe.ne.jp> wrote:
> Hello.
>
> We use ALS (collaborative filtering) from Spark MLlib on YARN.
> The Spark version is 1.2.0, as included in CDH 5.3.1.
>
> We train ALS on 1,000,000,000 records (5,000,000 users and 5,000,000 items).
> With this much data, virtual memory usage grows until the YARN node manager
> kills the Spark worker process. Spark relaunches the process after it is
> killed, but it is killed again, and eventually the whole set of Spark
> processes is terminated.
>
> # When the Spark worker process is killed, it seems that virtual memory
> # usage, driven by shuffle or disk writes, exceeds the YARN threshold.
>
> As a workaround, we set 'yarn.nodemanager.vmem-check-enabled' to false,
> and the job then exits successfully, but this does not seem like the
> appropriate fix. If you know a better way to tune Spark for this case,
> please let me know.
>
> The machines and Spark settings are as follows.
> 1) Six machines, each with 32GB of physical memory.
> 2) Spark settings:
>    - spark.executor.memory=16g
>    - spark.closure.serializer=org.apache.spark.serializer.KryoSerializer
>    - spark.rdd.compress=true
>    - spark.shuffle.memoryFraction=0.4
>
> Thanks,
> Yuichiro Sakamoto
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/How-to-avoid-being-killed-by-YARN-node-manager-tp22199.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
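For reference, the memory budget behind this exchange can be sketched as a back-of-the-envelope calculation. This assumes Spark 1.2's default overhead of max(384 MB, 7% of executor memory) and YARN's default yarn.nodemanager.vmem-pmem-ratio of 2.1; verify both against your cluster's actual configuration.

```python
# Back-of-the-envelope check of the executor memory budget described above.
# ASSUMPTIONS (verify against your cluster): Spark 1.2's default overhead is
# max(384, 0.07 * executor_memory_mb), and YARN's default vmem-pmem-ratio is 2.1.

def container_request_mb(executor_memory_mb, overhead_mb=None):
    """Physical memory YARN reserves for one executor container (heap + overhead)."""
    if overhead_mb is None:
        overhead_mb = max(384, int(0.07 * executor_memory_mb))
    return executor_memory_mb + overhead_mb

def vmem_limit_mb(container_mb, vmem_pmem_ratio=2.1):
    """Virtual-memory ceiling the NodeManager enforces before killing the container."""
    return container_mb * vmem_pmem_ratio

heap = 16 * 1024                                # spark.executor.memory=16g

default = container_request_mb(heap)            # 16384 + 1146 = 17530 MB
boosted = container_request_mb(heap, 4096)      # 16384 + 4096 = 20480 MB

print(default, vmem_limit_mb(default))
print(boosted, vmem_limit_mb(boosted))
```

Boosting the overhead from the ~1.1 GB default to 4 GB raises both the physical reservation and, through the vmem-pmem ratio, the virtual-memory ceiling by several GB, which is why it stops the NodeManager kills without disabling the vmem check.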