[ 
https://issues.apache.org/jira/browse/SPARK-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085862#comment-15085862
 ] 

Alger Remirata commented on SPARK-1006:
---------------------------------------

First of all, I would like to thank you all for developing Spark and making it 
open source so that we can use it. I'm new to Spark and Scala, and I'm working 
on a project involving matrix factorization in Spark. I have a problem running 
ALS in Spark: it throws a StackOverflowError due to the long lineage chain, 
according to comments on the internet. One suggestion is to use 
setCheckpointInterval so that the RDDs are checkpointed every 10-20 iterations, 
which prevents the error. I just want to ask for details on how to do 
checkpointing with ALS. I am using the spark-kernel developed by IBM 
(https://github.com/ibm-et/spark-kernel) instead of spark-shell.

Here are some of my specific questions regarding details on checkpoint:

1. The checkpoint directory set through SparkContext.setCheckpointDir() needs 
to be a Hadoop-compatible directory. Can we use any available HDFS-compatible 
directory?
2. What is meant by this comment in the ALS checkpointing code: "If the 
checkpoint directory is not set in [[org.apache.spark.SparkContext]], this 
setting is ignored."?
3. Is calling setCheckpointInterval the only code I need to add to make 
checkpointing work for ALS?
4. I am getting this error: Name: java.lang.IllegalArgumentException, Message: 
Wrong FS: expected file:///. How can I solve this? What is the proper way of 
using checkpointing?
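For reference, here is a minimal sketch of how I currently understand the 
checkpointing setup with the MLlib ALS builder API. The paths and parameter 
values here are hypothetical placeholders, not a tested configuration:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.recommendation.{ALS, Rating}

object AlsCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ALSCheckpointSketch")
    val sc = new SparkContext(conf)

    // Must be set before training, or ALS's checkpoint interval is ignored.
    // On a cluster this should be an HDFS (or other Hadoop-compatible) path,
    // not a local file:/// path. The path below is a placeholder.
    sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")

    // Hypothetical input: CSV lines of "user,item,rating".
    val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
      val Array(user, item, rating) = line.split(",")
      Rating(user.toInt, item.toInt, rating.toDouble)
    }

    val model = new ALS()
      .setRank(10)
      .setIterations(50)
      // Checkpoint every 10 iterations to truncate the lineage chain.
      .setCheckpointInterval(10)
      .run(ratings)

    sc.stop()
  }
}
```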

Thanks a lot!


> MLlib ALS gets stack overflow with too many iterations
> ------------------------------------------------------
>
>                 Key: SPARK-1006
>                 URL: https://issues.apache.org/jira/browse/SPARK-1006
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib
>            Reporter: Matei Zaharia
>
> The tipping point seems to be around 50. We should fix this by checkpointing 
> the RDDs every 10-20 iterations to break the lineage chain, but checkpointing 
> currently requires HDFS installed, which not all users will have.
> We might also be able to fix DAGScheduler to not be recursive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
