How are you submitting/running the job - via spark-submit or as a plain old Java program?
If you are using spark-submit, you can control the memory setting via the
configuration parameter spark.executor.memory in spark-defaults.conf. If you
are running it as a plain Java program, use -Xmx to set the maximum heap size.
(A rough sketch of both options is at the bottom of this mail.)

On Thu, Feb 11, 2016 at 5:46 AM, Nirav Patel <npa...@xactlycorp.com> wrote:

> In YARN we have the following settings enabled so that a job can use virtual
> memory to get capacity beyond physical memory, of course.
>
> <property>
>   <name>yarn.nodemanager.vmem-check-enabled</name>
>   <value>false</value>
> </property>
>
> <property>
>   <name>yarn.nodemanager.pmem-check-enabled</name>
>   <value>false</value>
> </property>
>
> The vmem-to-pmem ratio is 2:1. However, Spark doesn't seem to be able to
> utilize this vmem limit; we are getting the following heap space error,
> which appears to be contained within the Spark executor.
>
> 16/02/09 23:08:06 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
> 16/02/09 23:08:06 ERROR executor.Executor: Exception in task 4.0 in stage 7.6 (TID 22363)
> java.lang.OutOfMemoryError: Java heap space
>         at java.util.IdentityHashMap.resize(IdentityHashMap.java:469)
>         at java.util.IdentityHashMap.put(IdentityHashMap.java:445)
>         at org.apache.spark.util.SizeEstimator$SearchState.enqueue(SizeEstimator.scala:159)
>         at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:203)
>         at org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(SizeEstimator.scala:202)
>         at scala.collection.immutable.List.foreach(List.scala:318)
>         at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:202)
>         at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
>         at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
>         at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
>         at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
>         at org.apache.spark.util.collection.SizeTrackingVector.$plus$eq(SizeTrackingVector.scala:31)
>         at org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:278)
>         at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:171)
>         at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:262)
>         at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
>         at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
>         at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
>         at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>         at org.apache.spark.scheduler.Task.run(Task.scala:88)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:744)
>
> The YARN resource manager doesn't give any indication of whether the container
> ran out of its physical or virtual memory limits.
>
> Also, how can we profile this container's memory usage? We know our data is
> skewed, so some of the executors will have a lot of data (~2M RDD objects) to
> process. I used the following as executorJavaOpts, but it doesn't seem to work:
> -XX:-HeapDumpOnOutOfMemoryError -XX:OnOutOfMemoryError='kill -3 %p'
> -XX:HeapDumpPath=/opt/cores/spark
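For reference, here is a rough sketch of both options. The 4g value and the
jar/class names are placeholders for illustration, not taken from your job, so
adjust them to your setup:

  # On YARN via spark-submit: either pass the executor heap size per job ...
  spark-submit --master yarn \
    --conf spark.executor.memory=4g \
    --class com.example.MyJob my-job.jar   # placeholder class and jar names

  # ... or set the default once in conf/spark-defaults.conf
  spark.executor.memory   4g

  # As a plain Java program, there are no executors; size the single JVM's heap with -Xmx
  java -Xmx4g -cp my-job.jar com.example.MyJob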