[ https://issues.apache.org/jira/browse/SPARK-10319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717134#comment-14717134 ]

Sean Owen commented on SPARK-10319:
-----------------------------------

This definitely sounds like https://issues.apache.org/jira/browse/SPARK-5955, so 
either the checkpoint interval somehow isn't taking effect, or this is actually 
a slightly different issue. If you scroll all the way back, what's at the top of 
the stack, or is it truncated? Does it work with some numbers of iterations but 
not others? Do you see evidence of checkpointing in the logs?
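For what it's worth, the failure mode SPARK-5955 describes can be reproduced in miniature with plain Python (a sketch only, no Spark involved): serializing a deeply nested object chain, analogous to a long RDD lineage, overflows the serializer's recursion stack, just as java.io.ObjectInputStream overflows the JVM stack in the trace below.

```python
# Illustration only: plain Python, not Spark. Each loop iteration nests the
# previous object, mimicking how each ALS iteration extends the RDD lineage.
import pickle

node = None
for i in range(100_000):
    node = (i, node)  # one extra level of nesting per "iteration"

try:
    pickle.dumps(node)
    overflowed = False
except RecursionError:  # Python's analogue of java.lang.StackOverflowError
    overflowed = True

print(overflowed)  # True: deep nesting exhausts the recursion stack
```

Checkpointing truncates the lineage periodically, which is exactly what keeps the serialization depth bounded — hence the question above about whether the checkpoint interval is actually taking effect.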

> ALS training using PySpark throws a StackOverflowError
> ------------------------------------------------------
>
>                 Key: SPARK-10319
>                 URL: https://issues.apache.org/jira/browse/SPARK-10319
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.4.1
>         Environment: Windows 10, spark - 1.4.1,
>            Reporter: Velu nambi
>
> When attempting to train a machine learning model using ALS in Spark's MLlib 
> (1.4) on Windows, PySpark always terminates with a StackOverflowError. I 
> tried adding a checkpoint as described in 
> http://stackoverflow.com/a/31484461/36130 -- it doesn't seem to help.
> Here's the training code and stack trace:
> {code:python}
> import itertools
> from pyspark.mllib.recommendation import ALS
>
> ranks = [8, 12]
> lambdas = [0.1, 10.0]
> numIters = [10, 20]
> bestModel = None
> bestValidationRmse = float("inf")
> bestRank = 0
> bestLambda = -1.0
> bestNumIter = -1
> for rank, lmbda, numIter in itertools.product(ranks, lambdas, numIters):
>     # Attempted workaround from http://stackoverflow.com/a/31484461/36130
>     ALS.checkpointInterval = 2
>     model = ALS.train(training, rank, numIter, lmbda)
>     validationRmse = computeRmse(model, validation, numValidation)
>     if validationRmse < bestValidationRmse:
>         bestModel = model
>         bestValidationRmse = validationRmse
>         bestRank = rank
>         bestLambda = lmbda
>         bestNumIter = numIter
> testRmse = computeRmse(bestModel, test, numTest)
> {code}
> Stacktrace:
> 15/08/27 02:02:58 ERROR Executor: Exception in task 3.0 in stage 56.0 (TID 127)
> java.lang.StackOverflowError
>     at java.io.ObjectInputStream$BlockDataInputStream.readInt(Unknown Source)
>     at java.io.ObjectInputStream.readHandle(Unknown Source)
>     at java.io.ObjectInputStream.readClassDesc(Unknown Source)
>     at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>     at java.io.ObjectInputStream.readObject0(Unknown Source)
>     at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
>     at java.io.ObjectInputStream.readSerialData(Unknown Source)
>     at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>     at java.io.ObjectInputStream.readObject0(Unknown Source)
>     at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
>     at java.io.ObjectInputStream.readSerialData(Unknown Source)
>     at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>     at java.io.ObjectInputStream.readObject0(Unknown Source)
>     at java.io.ObjectInputStream.defaultReadFields(Unknown Source)
>     at java.io.ObjectInputStream.readSerialData(Unknown Source)
>     at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
>     at java.io.ObjectInputStream.readObject0(Unknown Source)
>     at java.io.ObjectInputStream.readObject(Unknown Source)
>     at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
>     at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
>     at java.lang.reflect.Method.invoke(Unknown Source)
>     at java.io.ObjectStreamClass.invokeReadObject(Unknown Source)
>     at java.io.ObjectInputStream.readSerialData(Unknown Source)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
