[ 
https://issues.apache.org/jira/browse/SPARK-15060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266601#comment-15266601
 ] 

Zheng Tan commented on SPARK-15060:
-----------------------------------

RDDs are linked by their dependencies: In the above example, RDD[1000] has 
OneToOneDependency on RDD[999], RDD[999] has OneToOneDependency on RDD[998] ... 
When driver wants to serialize RDD[1000], it need serialize RDD[999] and 
recursively serialize RDD[998] .... RDD[1]. These serialization is recursively 
and limited by java stack frame size. 

To avoid stack overflow, we can:
1. save rdd dependency relations as a external dependency map (id -> dep)
2. set each rdd dependency to null. After that we can safely call Java 
Serialization method on those rdds. 
3. broadcast dependency map and rdds binaries to executor
4. deserialize rdds and restore their rdd dependencies using this external map.

Before: RDD[1000] -> RDD[999] -> RDD[998] ... -> RDD[1]
After: RDD[1000], RDD[999], RDD[998], ... RDD[1] & (1000->999, 999->998, .... 
2->1)


> Fix stack overflow when executing long linage transform without checkpoint
> --------------------------------------------------------------------------
>
>                 Key: SPARK-15060
>                 URL: https://issues.apache.org/jira/browse/SPARK-15060
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.5.2, 1.6.1, 2.0.0
>            Reporter: Zheng Tan
>
> When executing long linage rdd transform, it is easy to get stack overflow 
> exception in driver end. This can be reproduced by the following example:
> var rdd = sc.makeRDD(1 to 10, 10)
> for (_ <- 1 to 1000) {
>   rdd = rdd.map(x => x)
> }
> rdd.reduce(_ + _)
> SPARK-5955 solve this problem by checkpointing rdd for every 10~20 rounds. It 
> is not so convenient since it required checkpointing data to HDFS. 
> Another solution is cutting off the recursive rdd dependencies in driver end 
> and re-assembly them in executor end.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to