[
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13969722#comment-13969722
]
Pat Ferrel edited comment on MAHOUT-1464 at 4/15/14 4:51 PM:
-------------------------------------------------------------
Silence could mean several things: you don't know how to? It can't be done because jars aren't created for it yet? You'd rather launch from the Scala shell? If the latter, that's fine; I just want to get IDEA out of the equation, so instructions for running in the Scala shell would be helpful.
I plan to move on to using HDFS for storage but still have a local storage
failure below.
Concentrating on local storage for now I get the following from my dev machine
launching in IDEA:
input      | output     | mahoutSparkContext(masterUrl =  | Success?
local path | local path | "local"                         | yes
local path | local path | "spark://Maclaurin:7077"        | yes
local path | local path | "spark://occam4:7077"           | no

In the failing case the computation finishes correctly, but the last stage, which dumps/writes the DRM, fails. The Spark master is a remote machine that is also the HDFS master and manages three Spark slaves; everything looks OK in the Web UI and there are no errors in the Spark logs.
For this last case I have tried various forms of the "local path" for the output and suspect that getting the correct form of the URI may be the problem, so if someone sees the mistake please let me know:
1) "tmp/co-occurrence-on-epinions/indicators-item-item/": a path relative to the IDEA working directory, which works for input.
2) "/Users/pat/hdfs-mirror/tmp/co-occurrence-on-epinions/indicators-item-item/": an absolute path, so the IDEA working directory is not involved.
3) "file:///Users/pat/hdfs-mirror/tmp/co-occurrence-on-epinions/indicators-item-item/": the URI form of the full local path.
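To see how these three forms differ before Hadoop ever touches them, here is a small standalone sketch (plain java.net.URI, no Spark or Hadoop involved; the object name PathForms is just for illustration). Forms 1 and 2 carry no scheme, so whichever default filesystem is configured would apply; form 3 explicitly names the local file: scheme.

```scala
// Illustration only: how each candidate output path parses as a URI.
// Forms 1 and 2 have no scheme (the configured default filesystem would
// be used); form 3 explicitly names file:.
import java.net.URI

object PathForms {
  def schemeOf(path: String): String =
    Option(new URI(path).getScheme).getOrElse("(none: default fs applies)")

  def main(args: Array[String]): Unit = {
    val forms = Seq(
      "tmp/co-occurrence-on-epinions/indicators-item-item/",
      "/Users/pat/hdfs-mirror/tmp/co-occurrence-on-epinions/indicators-item-item/",
      "file:///Users/pat/hdfs-mirror/tmp/co-occurrence-on-epinions/indicators-item-item/"
    )
    forms.foreach(f => println(s"${schemeOf(f)}  <-  $f"))
  }
}
```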
Code for #3 is:
RecommendationExamplesHelper.saveIndicatorMatrix(indicatorMatrices(0),
"file:///Users/pat/hdfs-mirror/tmp/co-occurrence-on-epinions/indicators-item-item/")
For #3 I get the following exception message. The _temporary dir does exist; there is just nothing in it:
14/04/15 09:07:03 INFO scheduler.DAGScheduler: Failed to run saveAsTextFile at Recommendations.scala:178
Exception in thread "main" org.apache.spark.SparkException: Job aborted: Task 8.0:0 failed 4 times (most recent failure: Exception failure: java.io.IOException: The temporary job-output directory file:/Users/pat/hdfs-mirror/tmp/co-occurrence-on-epinions/indicators-item-item/_temporary doesn't exist!)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1028)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1026)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1026)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$processEvent$10.apply(DAGScheduler.scala:619)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:619)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:207)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
	at akka.actor.ActorCell.invoke(ActorCell.scala:456)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
	at akka.dispatch.Mailbox.run(Mailbox.scala:219)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Disconnected from the target VM, address: '127.0.0.1:58830', transport: 'socket'
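One possible explanation, offered as an assumption rather than a confirmed diagnosis: with a remote master, each executor resolves a file:/ path against its own local disk, so the _temporary directory created on the driver's machine does not exist on the worker nodes. Writing to the shared HDFS filesystem would sidestep that; a hypothetical variant of the call from #3 (the namenode host occam4 comes from the setup above, but the port 9000 is an assumed default, not confirmed here):

```scala
// Hypothetical sketch: target the shared HDFS filesystem instead of a
// node-local path (namenode port 9000 is an assumption).
RecommendationExamplesHelper.saveIndicatorMatrix(indicatorMatrices(0),
  "hdfs://occam4:9000/tmp/co-occurrence-on-epinions/indicators-item-item/")
```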
> Cooccurrence Analysis on Spark
> ------------------------------
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Environment: hadoop, spark
> Reporter: Pat Ferrel
> Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM
> can be used as input.
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has
> several applications including cross-action recommendations.
--
This message was sent by Atlassian JIRA
(v6.2#6252)