Dear all, unfortunately I've not gotten any response on the users forum, so I decided to post this question here. We are running into failures on jobs with large amounts of data. For example, an application works perfectly with relatively small data, but once the data roughly doubles in size the application fails.
A simple local test was prepared for this question at https://gist.github.com/copy-of-rezo/6a137e13a1e4f841e7eb. It generates two sets of key-value pairs, joins them, selects the distinct values, and finally counts the data:

import org.apache.spark.{SparkConf, SparkContext}

object Spill {
  def generate = {
    for {
      j <- 1 to 10
      i <- 1 to 200
    } yield (j, i)
  }

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName(getClass.getSimpleName)
    conf.set("spark.shuffle.spill", "true")
    conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)
    println(generate)
    val dataA = sc.parallelize(generate)
    val dataB = sc.parallelize(generate)
    val dst = dataA.join(dataB).distinct().count()
    println(dst)
  }
}

We compiled it locally and ran it 3 times with different memory settings:

1) --executor-memory 10M --driver-memory 10M --num-executors 1 --executor-cores 1

It fails with "java.lang.OutOfMemoryError: GC overhead limit exceeded" at
..... org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:137)

2) --executor-memory 20M --driver-memory 20M --num-executors 1 --executor-cores 1

It works OK.

3) --executor-memory 10M --driver-memory 10M --num-executors 1 --executor-cores 1

But this time we generate less data, changing the upper bound of i from 200 to 100. That halves the input data and reduces the joined data by a factor of 4:

def generate = {
  for {
    j <- 1 to 10
    i <- 1 to 100 // previous value was 200
  } yield (j, i)
}

This code works OK.

We don't understand why 10M is not enough for such a simple operation on approximately 32,000 bytes of ints (2 * 10 * 200 * 2 * 4). 10M of RAM does work once we halve the data volume (2,000 records of (Int, Int) in total). Why doesn't spilling to disk cover this case?

--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-OutOfMemoryError-at-simple-local-test-tp9490.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
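For reference, the size of the intermediate join output (before distinct()) can be counted locally without Spark. The following plain-Scala sketch (object and value names are ours, not from the gist) reproduces the same cardinality: every key j occurs 200 times on each side, so the join emits 200 * 200 matches per key across all 10 keys.

```scala
// Plain Scala (no Spark) sketch of how much the join inflates the data.
object JoinSizeEstimate {
  // Same generator as in the test: 10 keys x 200 values = 2000 pairs.
  def generate: Seq[(Int, Int)] =
    for {
      j <- 1 to 10
      i <- 1 to 200
    } yield (j, i)

  // Local equivalent of dataA.join(dataB): all pairs sharing a key.
  def joined: Seq[(Int, (Int, Int))] =
    for {
      (ka, va) <- generate
      (kb, vb) <- generate
      if ka == kb
    } yield (ka, (va, vb))

  def main(args: Array[String]): Unit = {
    println(s"records per side:   ${generate.size}") // 2000
    println(s"records after join: ${joined.size}")   // 400000
  }
}
```

So the join produces 400,000 (Int, (Int, Int)) records from the 2,000-record inputs, which is why halving i (and thereby quartering the joined data) changes the memory behaviour so noticeably.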
--------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org