[ https://issues.apache.org/jira/browse/SPARK-15060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15266832#comment-15266832 ]

Zheng Tan commented on SPARK-15060:
-----------------------------------

1) We can serialize the rdd to binary without its dependency info (save the 
dependencies outside of the rdd and set the rdd's dependencies field to null 
temporarily). This approach would eliminate the stack overflow problem. 
2) Checkpointing can deal with most of the problems caused by deep lineages, 
including this one. But I think we should also give users the choice of whether 
to checkpoint their rdds when executing code like the example above. Users may 
not be happy to dump their data to HDFS in some cases. 
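For reference, a minimal sketch of the checkpoint-based workaround on the example from the issue description (assuming a live SparkContext `sc`; the checkpoint directory path and the interval of 10 are illustrative):

```scala
// Truncate the lineage every N rounds by checkpointing.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")  // illustrative path

var rdd = sc.makeRDD(1 to 10, 10)
for (i <- 1 to 1000) {
  rdd = rdd.map(x => x)
  if (i % 10 == 0) {
    rdd.checkpoint()  // marks the rdd: its lineage is dropped once materialized
    rdd.count()       // force an action so the checkpoint is written now
  }
}
rdd.reduce(_ + _)
```

Note that `rdd.localCheckpoint()` (available since Spark 1.5) truncates the lineage using executor-local storage instead of HDFS, which addresses the "users may not want to dump data to HDFS" concern, at the cost of fault tolerance.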

> Fix stack overflow when executing long lineage transform without checkpoint
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-15060
>                 URL: https://issues.apache.org/jira/browse/SPARK-15060
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.5.2, 1.6.1, 2.0.0
>            Reporter: Zheng Tan
>
> When executing a long-lineage rdd transform, it is easy to get a stack 
> overflow exception on the driver end. This can be reproduced by the following example:
> var rdd = sc.makeRDD(1 to 10, 10)
> for (_ <- 1 to 1000) {
>   rdd = rdd.map(x => x)
> }
> rdd.reduce(_ + _)
> SPARK-5955 solves this problem by checkpointing the rdd every 10~20 rounds. It 
> is not so convenient since it requires checkpointing data to HDFS. 
> Another solution is cutting off the recursive rdd dependencies on the driver end 
> and re-assembling them on the executor end.
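The dependency-cutting idea above can be illustrated with plain Java serialization, outside of Spark (this is only a sketch of the pattern, not Spark's actual internals; the `Node` class is a hypothetical stand-in for an rdd with a recursive parent link): mark the recursive field `@transient` so serialization does not walk the whole chain, and ship the dependencies out-of-band.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Stand-in for an rdd with a recursive parent reference. Marking `parent`
// @transient keeps writeObject from recursing through the whole chain,
// which is what overflows the stack for deep lineages.
class Node(val id: Int, @transient var parent: Node) extends Serializable

object Demo extends App {
  // Build a deep chain iteratively; serializing it with a non-transient
  // parent field would recurse one stack frame per link.
  var head = new Node(0, null)
  for (i <- 1 to 100000) head = new Node(i, head)

  val bytes = new ByteArrayOutputStream()
  new ObjectOutputStream(bytes).writeObject(head) // only `head` is walked
  println(s"serialized ${bytes.size} bytes")      // small; no StackOverflowError
}
```

On the receiving side, the detached parent chain would then have to be delivered separately and re-attached, which is what "re-assembling them on the executor end" refers to.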



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
