I am running Spark in local mode (client). My VM has 16 CPUs and 108 GB of RAM. My configuration is as follows:
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+DisableExplicitGC -XX:MaxPermSize=1024m
spark.daemon.memory=20g
spark.driver.memory=20g
spark.executor.memory=20g

export SPARK_DAEMON_JAVA_OPTS="-XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+UseParallelOldGC -XX:+DisableExplicitGC -XX:MaxPermSize=1024m"

The submit command:

/usr/local/spark-1.1.0/bin/spark-submit --class main.java.MyAppMainProcess --master local[32] MyApp.jar >> myapp.out

The relevant log output:

14/11/03 20:45:43 INFO BlockManager: Removing block broadcast_4
14/11/03 20:45:43 INFO MemoryStore: Block broadcast_4 of size 3872 dropped from memory (free 16669590422)
14/11/03 20:45:43 INFO ContextCleaner: Cleaned broadcast 4
14/11/03 20:46:00 WARN BlockManager: Putting block rdd_19_5 failed
14/11/03 20:46:00 ERROR Executor: Exception in task 5.0 in stage 3.0 (TID 70)
java.lang.OutOfMemoryError: Requested array size exceeds VM limit
        at java.util.Arrays.copyOf(Arrays.java:2271)
        at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:113)
        at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
        at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:140)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
        at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1876)
        at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1785)
        at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1188)
        at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:347)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:42)
        at org.apache.spark.serializer.SerializationStream.writeAll(Serializer.scala:110)
        at org.apache.spark.storage.BlockManager.dataSerializeStream(BlockManager.scala:1047)
        at org.apache.spark.storage.BlockManager.dataSerialize(BlockManager.scala:1056)
        at org.apache.spark.storage.TachyonStore.putIterator(TachyonStore.scala:60)
        at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:743)
        at org.apache.spark.storage.BlockManager.putIterator(BlockManager.scala:594)
        at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:145)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:227)
        at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

It is hard to tell from this output at what stage it fails, but the output step of the job is saving a text file. An individual record (the key and value together) is relatively small, but the number of records in the collection is large. There seems to be a bottleneck that I have run into and can't get past. Any pointers in the right direction would be helpful!
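One thing I am unsure about in this setup: since the master is local[32], I believe the whole job runs inside a single driver JVM, so spark.executor.memory presumably has no effect here, and the usable heap is whatever the driver gets, which has to be fixed before that JVM launches. Would passing the driver memory directly on the spark-submit line, as below, be the more reliable way to make sure the 20g actually applies?

    /usr/local/spark-1.1.0/bin/spark-submit \
      --class main.java.MyAppMainProcess \
      --master local[32] \
      --driver-memory 20g \
      MyApp.jar >> myapp.out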
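Reading the trace more closely: the failure happens while caching rdd_19_5, and the path CacheManager.putInBlockManager -> BlockManager.dataSerialize -> TachyonStore.putIterator appears to serialize an entire partition into a single ByteArrayOutputStream. A Java array cannot exceed roughly Integer.MAX_VALUE elements, so if one partition's serialized form crosses about 2 GB, this error fires no matter how much heap is free; indeed, the MemoryStore still reports about 16 GB free just before the failure. Would splitting the data into more, smaller partitions before persisting be the right direction? A rough sketch of what I mean, against the 1.1.0 Java API; the paths, class name, and the partition count of 512 are placeholders, not my actual code:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.storage.StorageLevel;

    // Placeholder sketch: spread the data over more, smaller partitions so
    // that no single partition serializes to a byte array near the 2 GB cap.
    public class RepartitionSketch {
        public static void main(String[] args) {
            // Master and memory come from spark-submit, as in my command above.
            SparkConf conf = new SparkConf().setAppName("RepartitionSketch");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Placeholder input: many small records, a large collection overall.
            JavaRDD<String> records = sc.textFile("input/path");

            // 512 is an arbitrary guess; the aim is to keep each partition's
            // serialized size well under the ~2 GB Java array limit.
            JavaRDD<String> smaller = records.repartition(512);

            // Persist off-heap, matching the TachyonStore frames in my trace.
            smaller.persist(StorageLevel.OFF_HEAP());

            // The output step the job ends with: saving as a text file.
            smaller.saveAsTextFile("output/path");

            sc.stop();
        }
    }

If that reasoning is wrong, I would also be glad to know what else could make a single serialized block grow past the array limit.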
Thanks,
Ami