[ https://issues.apache.org/jira/browse/SPARK-5499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-5499.
------------------------------
    Resolution: Not a Problem

To narrow this down, I tried:

{code:scala}
sc.setCheckpointDir("/tmp/checkpoint")
var pair = sc.parallelize(Array((1L, 2L)))
for (i <- 1 to 1000) {
  pair.checkpoint()
  pair = pair.map(_.swap)
}
pair.count()
{code}

And it does overflow, but it's due to serializing the long lineage graph that builds up over the 1000 transformations:

{code}
org.apache.spark.SparkException: Job aborted due to stage failure: Task serialization failed: java.lang.StackOverflowError
java.io.ObjectStreamClass$FieldReflector.getPrimFieldValues(ObjectStreamClass.java:1930)
java.io.ObjectStreamClass.getPrimFieldValues(ObjectStreamClass.java:1233)
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1533)
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
scala.collection.immutable.$colon$colon.writeObject(List.scala:379)
sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}

However, this loop body works:

{code:scala}
pair.checkpoint()
pair.count()
pair = pair.map(_.swap)
{code}

(Of course, you can call that every 10 or 100 iterations instead.) (Note that the call to {{checkpoint()}} has to happen before the action that computes the RDD.)

So I think the issue is just that RDDs are lazy, and {{checkpoint()}} only *marks* an RDD for checkpointing. To actually perform the checkpoint, you have to invoke an action on the RDD. {{count()}} is cheap; the cheapest is {{foreachPartition(p => None)}}. (This is another argument for adding a {{materialize()}} method, a la https://issues.apache.org/jira/browse/SPARK-6003.)

So, I'm resolving this because I'm fairly certain the behavior is by design and intended to be consistent with how {{persist()}} works. It does require the formulation above, with an explicit request to materialize, and that could be easier. If anything, the follow-on question is: should persistence methods be eager? But that's a different question.
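For reference, here is a minimal sketch of that pattern applied every 100 iterations, using a hypothetical {{materialize()}} helper along the lines of what SPARK-6003 proposes. The {{MaterializeOps}} wrapper and the {{materialize()}} name are illustrative only, not an existing Spark API; {{sc}} is the spark-shell context as above:

{code:scala}
import org.apache.spark.rdd.RDD

// Hypothetical helper in the spirit of SPARK-6003; not an existing Spark API.
implicit class MaterializeOps[T](rdd: RDD[T]) {
  // Force evaluation with the cheapest available action, then return the RDD.
  def materialize(): RDD[T] = {
    rdd.foreachPartition(_ => ())
    rdd
  }
}

sc.setCheckpointDir("/tmp/checkpoint")
var pair: RDD[(Long, Long)] = sc.parallelize(Array((1L, 2L)))
for (i <- 1 to 1000) {
  pair = pair.map(_.swap)
  if (i % 100 == 0) {
    pair.checkpoint()   // only *marks* the RDD for checkpointing
    pair.materialize()  // the action makes the checkpoint happen, truncating the lineage
  }
}
println("Count = " + pair.count())
{code}

Checkpointing every 100th iteration bounds the lineage at 100 map steps, so task serialization never recurses deeply enough to overflow the stack, while the cost of the extra job is paid only 10 times.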
> iterative computing with 1000 iterations causes stage failure
> --------------------------------------------------------------
>
>                 Key: SPARK-5499
>                 URL: https://issues.apache.org/jira/browse/SPARK-5499
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.2.0
>            Reporter: Tien-Dung LE
>
> I got an error "org.apache.spark.SparkException: Job aborted due to stage
> failure: Task serialization failed: java.lang.StackOverflowError" when
> executing an action with 1000 transformations.
> Here is a code snippet to re-produce the error:
> {code}
> import org.apache.spark.rdd.RDD
>
> var pair: RDD[(Long,Long)] = sc.parallelize(Array((1L,2L)))
> var newPair: RDD[(Long,Long)] = null
> for (i <- 1 to 1000) {
>   newPair = pair.map(_.swap)
>   pair = newPair
> }
> println("Count = " + pair.count())
> {code}