[ 
https://issues.apache.org/jira/browse/MAHOUT-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15903315#comment-15903315
 ] 

Pat Ferrel commented on MAHOUT-1951:
------------------------------------

scratch that PR. We do not have a fix for this but I have narrowed down the 
commit where it first starts to occur.

in Mahout 0.12.2 the drivers work with remote Spark

They work in all commits until 
https://github.com/apache/mahout/commit/8e0e8b5572e0d24c1930ed60fec6d02693b41575
 which would say that something in this commit broke things. This is mainly 
Flink but there is a change to how mahout jars are packaged and the error is 
shown below. The error wording is a bit mysterious, it seem to be missing 
MahoutKryoRegistrator but could also be from a class that cannot be serialized, 
really not sure.

Exception in thread "main" 17/03/06 18:15:04 INFO TaskSchedulerImpl: Removed 
TaskSet 0.0, whose tasks have all completed, from pool 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in 
stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 
6, 192.168.0.6): java.io.IOException: org.apache.spark.SparkException: Failed 
to register classes with Kryo
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1212)
    at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
    at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
    at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
    at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to register classes with Kryo
    at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:128)
    at 
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:273)
    at 
org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:258)
    at 
org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:174)
    at 
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:215)
    at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1205)
    ... 11 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
    at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
    at scala.Option.map(Option.scala:145)
    at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:123)
    ... 17 more
Driver stacktrace:
    at 
org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
    at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at 
org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at 
org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
    at scala.Option.foreach(Option.scala:236)
    at 
org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
    at 
org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:927)
    at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:926)
    at 
org.apache.mahout.drivers.TDIndexedDatasetReader$class.elementReader(TextDelimitedReaderWriter.scala:81)
    at 
org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.elementReader(TextDelimitedReaderWriter.scala:319)
    at 
org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.elementReader(TextDelimitedReaderWriter.scala:319)
    at 
org.apache.mahout.math.indexeddataset.Reader$class.readElementsFrom(ReaderWriter.scala:75)
    at 
org.apache.mahout.drivers.TextDelimitedIndexedDatasetReader.readElementsFrom(TextDelimitedReaderWriter.scala:319)
    at 
org.apache.mahout.sparkbindings.SparkEngine$.indexedDatasetDFSReadElements(SparkEngine.scala:382)
    at 
org.apache.mahout.sparkbindings.SparkEngine$.indexedDatasetDFSReadElements(SparkEngine.scala:39)
    at 
org.apache.mahout.math.indexeddataset.package$.indexedDatasetDFSReadElements(package.scala:372)
    at 
org.apache.mahout.drivers.ItemSimilarityDriver$.readIndexedDatasets(ItemSimilarityDriver.scala:152)
    at 
org.apache.mahout.drivers.ItemSimilarityDriver$.process(ItemSimilarityDriver.scala:201)
    at 
org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:112)
    at 
org.apache.mahout.drivers.ItemSimilarityDriver$$anonfun$main$1.apply(ItemSimilarityDriver.scala:110)
    at scala.Option.map(Option.scala:145)
    at 
org.apache.mahout.drivers.ItemSimilarityDriver$.main(ItemSimilarityDriver.scala:110)
    at 
org.apache.mahout.drivers.ItemSimilarityDriver.main(ItemSimilarityDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at 
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: org.apache.spark.SparkException: Failed to 
register classes with Kryo
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1212)
    at 
org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
    at 
org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
    at 
org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
    at 
org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
    at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to register classes with Kryo
    at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:128)
    at 
org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:273)
    at 
org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:258)
    at 
org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:174)
    at 
org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:215)
    at 
org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:178)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1205)
    ... 11 more
Caused by: java.lang.ClassNotFoundException: 
org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
    at 
org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:123)
    at scala.Option.map(Option.scala:145)
    at 
org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:123)
    ... 17 more


> Drivers don't run with remote Spark
> -----------------------------------
>
>                 Key: MAHOUT-1951
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1951
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification, CLI, Collaborative Filtering
>    Affects Versions: 0.13.0
>         Environment: The command line drivers spark-itemsimilarity and 
> spark-naivebayes using a remote or pseudo-clustered Spark
>            Reporter: Pat Ferrel
>            Assignee: Pat Ferrel
>            Priority: Blocker
>             Fix For: 0.13.0
>
>
> Missing classes when running these jobs because the dependencies-reduced jar, 
> passed to Spark for serialization purposes, does not contain all needed 
> classes.
> Found by a user. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to