We changed the code so that it generates 1000 * 1000 elements and lowered the
memory limits to 100M:

def generate = {
  for {
    j <- 1 to 10
    i <- 1 to 1000
  } yield (j, i)
}
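
(For context, a minimal sketch of a driver shaped like this test; the object
name Spill comes from the jar on the command line below, while the use of
sc.parallelize/collect is an assumption, not necessarily the exact code.)

import org.apache.spark.{SparkConf, SparkContext}

// Sketch of a driver that exercises generate; the real Spill code may differ.
object Spill {
  def generate =
    for {
      j <- 1 to 10
      i <- 1 to 1000
    } yield (j, i)

  def main(args: Array[String]): Unit = {
    // Master and memory limits come from spark-submit (see the command below).
    val sc = new SparkContext(new SparkConf().setAppName("Spill"))
    // Round-trip the data through Spark and collect it on the driver,
    // so everything has to fit in the 100M heap at once.
    val result = sc.parallelize(generate).map(identity).collect()
    println(result.length)
    sc.stop()
  }
}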

~/soft/spark-1.1.0-bin-hadoop2.3/bin/spark-submit --master local
--executor-memory 100M --driver-memory 100M --class Spill --num-executors 1
--executor-cores 1 target/scala-2.10/Spill-assembly-1.0.jar
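
(As an aside, the same limits can also be expressed as configuration
properties; this is only a sketch, and since spark.driver.memory set from
code has no effect once the driver JVM is already running, the spark-submit
flags above remain the reliable route.)

import org.apache.spark.SparkConf

// Sketch: the same limits as configuration properties. In local mode the
// driver JVM is already running by the time this executes, so
// spark.driver.memory set here is ignored; pass it to spark-submit instead.
val conf = new SparkConf()
  .setMaster("local")
  .setAppName("Spill")
  .set("spark.executor.memory", "100m")
  .set("spark.driver.memory", "100m")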

The result of this: 
14/11/24 14:57:40 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception
in thread Thread[Executor task launch worker-0,5,main]
java.lang.OutOfMemoryError: GC overhead limit exceeded

We checked this with a profiler and took this screenshot:
<http://apache-spark-developers-list.1001551.n3.nabble.com/file/n9532/%D0%A1%D0%BA%D1%80%D0%B8%D0%BD%D1%88%D0%BE%D1%82_2014-11-26_11.png>
 

Each element of the collection takes 48 bytes; each element is a scala.Tuple2
of two java.lang.Integer objects.
But Scala's Tuple2 is "@specialized"
<https://github.com/scala/scala/blob/v2.10.4/src/library/scala/Tuple2.scala#L19>
for the unboxed primitive Int, which takes only 4 bytes.
So from this point of view the collection should take about 1000 * 1000 * 2
* 4 = 8 MB plus some overhead, about six times less than the observed memory
consumption (48 bytes versus 8 bytes per element); a quick check of these
numbers is sketched below.
Why didn't Spark use primitive (@specialized) types in this case?
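
To make the arithmetic concrete, a quick back-of-the-envelope check in plain
Scala (a sketch; no Spark involved, and the 48-byte figure is simply the
number reported by the profiler above):

// Compare the two per-collection estimates from the reasoning above.
val elements = 1000 * 1000
val primitiveBytes = elements.toLong * 2 * 4   // two unboxed 4-byte Ints per tuple
val observedBytes  = elements.toLong * 48      // 48 bytes per element, from the profiler
println(f"primitive estimate:   ${primitiveBytes / 1e6}%.1f MB")  // ~8.0 MB
println(f"observed by profiler: ${observedBytes / 1e6}%.1f MB")   // ~48.0 MB

// A freshly created (Int, Int) does use the specialized subclass in Scala 2.10,
// so the boxing seen in the profiler must be introduced somewhere after creation.
println(((1, 2): (Int, Int)).getClass.getName)  // prints scala.Tuple2$mcII$sp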




