[ 
https://issues.apache.org/jira/browse/SPARK-8904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14619433#comment-14619433
 ] 

Joseph K. Bradley commented on SPARK-8904:
------------------------------------------

That looks like you may have run out of shuffle space on an executor (since 
it's reporting missing shuffle files), so I'd recommend looking at the logs on 
the executor.  If that's the case, then this is less a bug with LDA and more an 
issue with the implementation's scalability (which is being worked on in 
another JIRA).

You might want to try the online optimizer added in Spark 1.4.  The default 
optimizer is EM, which is sometimes less scalable.
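
For reference, a minimal sketch of switching optimizers with the MLlib LDA 
API in Spark 1.4. The object name, method names, and parameters 
(OnlineLdaSketch, configure, fit, numTopics, maxIterations) are hypothetical 
and not taken from the attached ldaexample.scala. The corpus is assumed to 
already be an RDD of (document ID, term-count vector) pairs.

```scala
import org.apache.spark.mllib.clustering.{LDA, LDAModel}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// Hypothetical helper, for illustration only.
object OnlineLdaSketch {

  // Build an LDA instance that uses the online variational optimizer
  // instead of the default "em" optimizer.
  def configure(numTopics: Int, maxIterations: Int): LDA =
    new LDA()
      .setK(numTopics)
      .setMaxIterations(maxIterations)
      .setOptimizer("online")

  // Fit a model on a corpus of (unique document ID, term-count vector) pairs.
  def fit(corpus: RDD[(Long, Vector)], numTopics: Int, maxIterations: Int): LDAModel =
    configure(numTopics, maxIterations).run(corpus)
}
```

Usage would look something like OnlineLdaSketch.fit(corpus, 10, 50), where 
the values 10 and 50 are placeholders to tune for your data. This only 
changes the optimizer; if an executor really has run out of shuffle space, 
the executor logs are still the place to confirm that.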

> When using LDA DAGScheduler throws exception
> --------------------------------------------
>
>                 Key: SPARK-8904
>                 URL: https://issues.apache.org/jira/browse/SPARK-8904
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, MLlib
>    Affects Versions: 1.4.0
>         Environment: Amazon EC2 using ubuntu
>            Reporter: Ohad Zadok
>         Attachments: ldaexample.scala, screen1.png, screen2.png
>
>
> When using the LDA algorithm, DAGScheduler throws an exception. This is the 
> stack trace:
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
>         at 
> org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1266)
>         at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1257)
>         at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1256)
>         at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>         at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1256)
>         at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>         at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:730)
>         at scala.Option.foreach(Option.scala:236)
>         at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:730)
>         at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1450)
>         at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1411)
>         at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
