Joseph K. Bradley created SPARK-6071:
----------------------------------------

             Summary: ALS doc example fails randomly in PythonAccumulatorParam
                 Key: SPARK-6071
                 URL: https://issues.apache.org/jira/browse/SPARK-6071
             Project: Spark
          Issue Type: Bug
          Components: MLlib, PySpark
    Affects Versions: 1.3.0
            Reporter: Joseph K. Bradley
            Priority: Minor


When running the ALS example in
[http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples]
on branch-1.3, I hit an intermittent failure that I have been unable to reproduce.

Specifically, I was running on the branch from this PR 
[https://github.com/apache/spark/pull/4811] at this commit: 
[https://github.com/mengxr/spark/commit/06140a48ec5bd55b329e9b7cf658bd3e43be4fe2]

However, that PR should not have caused the failure, so I suspect the bug is in
branch-1.3 itself.

After a clean build, I ran:
{code}
from pyspark.mllib.recommendation import ALS, Rating, MatrixFactorizationModel

# Load and parse the data
data = sc.textFile("data/mllib/als/test.data")
ratings = data.map(lambda l: l.split(',')).map(
    lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))

# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 20
model = ALS.train(ratings, rank, numIterations)
{code}

And I got this error:
{code}
>>> model = ALS.train(ratings, rank, numIterations)
15/02/27 14:41:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/27 14:41:24 WARN LoadSnappy: Snappy native library not loaded
15/02/27 14:41:26 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/02/27 14:41:26 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
15/02/27 14:41:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
15/02/27 14:41:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
15/02/27 14:41:29 ERROR DAGScheduler: Failed to update accumulators for ResultTask(279, 2)
java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
        at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
        at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/02/27 14:41:29 ERROR DAGScheduler: Failed to update accumulators for ResultTask(279, 4)
java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
        at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
        at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
        at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
        at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
        at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
        at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
        at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
        at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
        at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
        at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}

However, immediately re-running the same train() call succeeded, and I have not
yet been able to reproduce the bug.
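
One way to chase an intermittent failure like this is to repeat the call in a
loop and stop at the first failure. The following is only a minimal sketch:
run_until_failure is a hypothetical helper, not part of PySpark, and it assumes
the failure surfaces as a Python exception. The errors above were logged by the
JVM-side DAGScheduler, so they may not raise in Python at all; in that case the
driver log would need to be checked after each attempt instead.

```python
def run_until_failure(action, max_attempts=50):
    """Call `action` repeatedly.

    Returns (attempt_number, exception) for the first failing attempt,
    or (None, None) if every attempt succeeds.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            action()
        except Exception as exc:  # assumption: the failure raises in Python
            return attempt, exc
    return None, None


# Hypothetical usage in the PySpark shell, reusing the names from the
# example in this report:
# attempt, exc = run_until_failure(
#     lambda: ALS.train(ratings, rank, numIterations))
```

If the loop returns (None, None), the failure did not recur within max_attempts
runs, which neither confirms nor rules out the bug.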


