Joseph K. Bradley created SPARK-6071:
----------------------------------------
Summary: ALS doc example fails randomly in PythonAccumulatorParam
Key: SPARK-6071
URL: https://issues.apache.org/jira/browse/SPARK-6071
Project: Spark
Issue Type: Bug
Components: MLlib, PySpark
Affects Versions: 1.3.0
Reporter: Joseph K. Bradley
Priority: Minor
When running the ALS example in
[http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html#examples]
on branch-1.3, I got a seemingly random failure. Specifically, I was running on
the branch from PR [https://github.com/apache/spark/pull/4811] at this commit:
[https://github.com/mengxr/spark/commit/06140a48ec5bd55b329e9b7cf658bd3e43be4fe2]
However, that PR should be unrelated to this failure, so I suspect the bug lies
in branch-1.3 itself.
After a clean build, I ran:
{code}
from pyspark.mllib.recommendation import ALS, Rating, MatrixFactorizationModel
# Load and parse the data
data = sc.textFile("data/mllib/als/test.data")
ratings = data.map(lambda l: l.split(',')).map(lambda l: Rating(int(l[0]), int(l[1]), float(l[2])))
# Build the recommendation model using Alternating Least Squares
rank = 10
numIterations = 20
model = ALS.train(ratings, rank, numIterations)
{code}
And I got this error:
{code}
>>> model = ALS.train(ratings, rank, numIterations)
15/02/27 14:41:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/27 14:41:24 WARN LoadSnappy: Snappy native library not loaded
15/02/27 14:41:26 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
15/02/27 14:41:26 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
15/02/27 14:41:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemLAPACK
15/02/27 14:41:26 WARN LAPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefLAPACK
15/02/27 14:41:29 ERROR DAGScheduler: Failed to update accumulators for ResultTask(279, 2)
java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
	at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
	at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
	at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
15/02/27 14:41:29 ERROR DAGScheduler: Failed to update accumulators for ResultTask(279, 4)
java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
	at org.apache.spark.api.python.PythonAccumulatorParam.addInPlace(PythonRDD.scala:745)
	at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
	at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
	at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
	at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
	at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
	at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
	at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
	at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
	at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:974)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
{code}
However, re-running the same train() call immediately afterwards succeeded, and
I have not yet been able to reproduce the failure.
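For anyone triaging: judging from the stack trace, the DAGScheduler hands each
finished task's accumulator updates to PythonAccumulatorParam.addInPlace
(PythonRDD.scala:745), which casts the update to java.util.List; in this run
one update was apparently scala.None instead. Below is a minimal standalone
sketch of that failure mode. It is illustrative only, not Spark's actual code,
and the names in it are made up:
{code}
// Illustrative sketch only, NOT Spark source: casting a value that is
// actually scala.None to java.util.List produces exactly the exception
// reported above.
object CastFailureSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical accumulator update that was never populated on the worker.
    val update: Any = None
    // Throws: java.lang.ClassCastException: scala.None$ cannot be cast to java.util.List
    val asList = update.asInstanceOf[java.util.List[Array[Byte]]]
    println(asList.size())
  }
}
{code}
If that is what is happening, the problem is likely in how Python accumulator
updates get attached to task results rather than in ALS itself, which would be
consistent with the immediate retry succeeding.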