We changed the code so that it generates 1000 * 1000 elements, and limited executor and driver memory to 100M:
    def generate = {
      for {
        j <- 1 to 1000
        i <- 1 to 1000
      } yield (j, i)
    }

We submitted it with:

    ~/soft/spark-1.1.0-bin-hadoop2.3/bin/spark-submit --master local \
      --executor-memory 100M --driver-memory 100M \
      --class Spill --num-executors 1 --executor-cores 1 \
      target/scala-2.10/Spill-assembly-1.0.jar

The result:

    14/11/24 14:57:40 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
    java.lang.OutOfMemoryError: GC overhead limit exceeded

We then checked this with a profiler and took this screenshot:
<http://apache-spark-developers-list.1001551.n3.nabble.com/file/n9532/%D0%A1%D0%BA%D1%80%D0%B8%D0%BD%D1%88%D0%BE%D1%82_2014-11-26_11.png>

Each element of the collection takes 48 bytes: each element is a scala.Tuple2 holding two java.lang.Integer objects. But Scala's Tuple2 is "@specialized"
<https://github.com/scala/scala/blob/v2.10.4/src/library/scala/Tuple2.scala#L19>
for the unboxed primitive Int, which takes only 4 bytes. So from this point of view the collection should take about 1000 * 1000 * 2 * 4 = 8 MB plus some overhead, which is roughly five times less than the memory consumption we actually measured.

Why doesn't Spark use primitive (@specialized) types in this case?
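For what it is worth, here is a minimal sketch (Scala 2.10; the object name SpecializationCheck is ours, purely for illustration) of how one can check whether a given pair ends up as the specialized Tuple2 subclass or as the generic one holding boxed java.lang.Integer values, which is the layout the profiler reported:

    object SpecializationCheck {
      def main(args: Array[String]): Unit = {
        // When the element types are statically known Ints, the compiler can
        // instantiate the specialized Tuple2 subclass, which stores primitive ints.
        val direct: (Int, Int) = (1, 2)
        println(direct.getClass.getName)   // expected: scala.Tuple2$mcII$sp

        // When the same pair is built through a generic (unspecialized) method,
        // the element types are erased and the generic Tuple2 is created,
        // holding two boxed java.lang.Integer values.
        def generic[A, B](a: A, b: B): (A, B) = (a, b)
        val boxed = generic(1, 2)
        println(boxed.getClass.getName)    // expected: scala.Tuple2
      }
    }

Run locally this should print scala.Tuple2$mcII$sp for the first pair and scala.Tuple2 for the second, i.e. only pairs created through a generic code path end up with boxed elements, which seems to match what the profiler screenshot shows.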