[
https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968537#comment-13968537
]
Pat Ferrel edited comment on MAHOUT-1464 at 4/14/14 5:18 PM:
-------------------------------------------------------------
OK, I have a cluster set up but first tried locally on my laptop. I installed
the latest Spark, 0.9.1 (not the 0.9.0 called for in the pom; assuming this is
OK), which uses Scala 2.10. BTW the object RunCrossCooccurrenceAnalysisOnEpinions
has an incorrect usage println that names the wrong object:
println("Usage: RunCooccurrenceAnalysisOnMovielens1M <path-to-dataset-folder>")
I never get the printlns, I assume because I'm not launching from the Spark
shell?
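Presumably that line should name the Epinions object instead, something like:
println("Usage: RunCrossCooccurrenceAnalysisOnEpinions <path-to-dataset-folder>")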
This leads me to believe you're expected to launch from the Spark Scala shell?
Anyway, I tried the method the Spark docs call out for CLI execution (shown
below) and executed RunCrossCooccurrenceAnalysisOnEpinions via a bash script.
I'm not sure where to look for output. The code says:
RecommendationExamplesHelper.saveIndicatorMatrix(indicatorMatrices(0), "/tmp/co-occurrence-on-epinions/indicators-item-item/")
RecommendationExamplesHelper.saveIndicatorMatrix(indicatorMatrices(1), "/tmp/co-occurrence-on-epinions/indicators-trust-item/")
I assume this writes to the local filesystem, since the data came from there?
I see the Spark pids in /tmp but no output data.
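For reference, here is a minimal sketch of how I understand the output paths
to be resolved (assuming saveIndicatorMatrix ultimately calls
RDD.saveAsTextFile; the data and method names below are made up, but the path
behavior is standard Spark/Hadoop): a scheme-less path like "/tmp/..." goes to
whatever Hadoop's fs.default.name points at, which is the local fs unless an
HDFS is configured.

import org.apache.spark.SparkContext

// Hypothetical sketch, not the Mahout example code: where "/tmp/..." lands
// depends on the URI scheme of the output path.
def saveIndicators(sc: SparkContext): Unit = {
  val indicators = sc.parallelize(Seq("itemA\titemB\t0.9"))

  // Scheme-less path: resolved against Hadoop's default filesystem
  // (fs.default.name), so it goes to hdfs://... on a configured cluster
  // and to the local filesystem otherwise.
  indicators.saveAsTextFile("/tmp/co-occurrence-on-epinions/indicators-item-item/")

  // Explicit scheme: unambiguously the local filesystem of the node
  // doing the write.
  indicators.saveAsTextFile("file:///tmp/co-occurrence-on-epinions/indicators-item-item-local/")
}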
Here's how I ran it.
Put data in localfs:
Maclaurin:mahout pat$ ls -al ~/hdfs-mirror/xrsj/
total 29320
drwxr-xr-x 4 pat staff 136 Apr 14 09:01 .
drwxr-xr-x 10 pat staff 340 Apr 14 09:00 ..
-rw-r--r-- 1 pat staff 8650128 Apr 14 09:01 ratings_data.txt
-rw-r--r-- 1 pat staff 6357397 Apr 14 09:01 trust_data.txt
Start up Spark on localhost; the web UI says all is well.
Run the xrsj on local data via the attached shell script (run-spark-xrsj.sh):
#!/usr/bin/env bash
# ./bin/spark-class org.apache.spark.deploy.Client launch \
#   [client-options] \
#   <cluster-url> <application-jar-url> <main-class> \
#   [application-options]
# cluster-url: The URL of the master node.
# application-jar-url: Path to a bundled jar including your application and
#   all dependencies. Currently, the URL must be globally visible inside of
#   your cluster, for instance, an `hdfs://` path or a `file://` path that is
#   present on all nodes.
# main-class: The entry point for your application.
# Client Options:
#   --memory <count> (amount of memory, in MB, allocated for your driver program)
#   --cores <count> (number of cores allocated for your driver program)
#   --supervise (whether to automatically restart your driver on application
#     or node failure)
#   --verbose (prints increased logging output)
# Usage: RunCrossCooccurrenceAnalysisOnEpinions <path-to-dataset-folder>
# Mahout Spark jar from 'mvn package'
/Users/pat/spark-0.9.1-bin-hadoop1/bin/spark-class org.apache.spark.deploy.Client launch \
  spark://Maclaurin.local:7077 \
  file:///Users/pat/mahout/spark/target/mahout-spark-1.0-SNAPSHOT.jar \
  RunCrossCooccurrenceAnalysisOnEpinions \
  file://Users/pat/hdfs-mirror/xrsj
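One thing I notice in my own script: the jar URL has three slashes
(file:///Users/...) while the dataset path has only two
(file://Users/pat/hdfs-mirror/xrsj). In a file: URI the component right after
// is the host, so the two-slash form makes "Users" the host rather than part
of the path; flagging it in case it's related.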
The worker launches the driver, which runs for quite a while, but the log
says there was an ERROR.
Maclaurin:mahout pat$ cat
/Users/pat/spark-0.9.1-bin-hadoop1/sbin/../logs/spark-pat-org.apache.spark.deploy.worker.Worker-1-
spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out
spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out.2
spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out.1
spark-pat-org.apache.spark.deploy.worker.Worker-1-occam4.out
Maclaurin:mahout pat$ cat
/Users/pat/spark-0.9.1-bin-hadoop1/sbin/../logs/spark-pat-org.apache.spark.deploy.worker.Worker-1-Maclaurin.local.out
Spark Command:
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java -cp
:/Users/pat/spark-0.9.1-bin-hadoop1/conf:/Users/pat/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar
-Dspark.akka.logLifecycleEvents=true -Djava.library.path= -Xms512m -Xmx512m
org.apache.spark.deploy.worker.Worker spark://Maclaurin.local:7077
========================================
log4j:WARN No appenders could be found for logger
(akka.event.slf4j.Slf4jLogger).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more
info.
14/04/14 09:26:00 INFO Worker: Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
14/04/14 09:26:00 INFO Worker: Starting Spark worker 192.168.0.2:52068 with 8
cores, 15.0 GB RAM
14/04/14 09:26:00 INFO Worker: Spark home: /Users/pat/spark-0.9.1-bin-hadoop1
14/04/14 09:26:00 INFO WorkerWebUI: Started Worker web UI at
http://192.168.0.2:8081
14/04/14 09:26:00 INFO Worker: Connecting to master
spark://Maclaurin.local:7077...
14/04/14 09:26:00 INFO Worker: Successfully registered with master
spark://Maclaurin.local:7077
14/04/14 09:26:19 INFO Worker: Asked to launch driver driver-20140414092619-0000
2014-04-14 09:26:19.947 java[53509:9407] Unable to load realm info from
SCDynamicStore
14/04/14 09:26:20 INFO DriverRunner: Copying user jar
file:/Users/pat/mahout/spark/target/mahout-spark-1.0-SNAPSHOT.jar to
/Users/pat/spark-0.9.1-bin-hadoop1/work/driver-20140414092619-0000/mahout-spark-1.0-SNAPSHOT.jar
14/04/14 09:26:20 INFO DriverRunner: Launch Command:
"/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java"
"-cp"
":/Users/pat/spark-0.9.1-bin-hadoop1/work/driver-20140414092619-0000/mahout-spark-1.0-SNAPSHOT.jar:/Users/pat/spark-0.9.1-bin-hadoop1/conf:/Users/pat/spark-0.9.1-bin-hadoop1/assembly/target/scala-2.10/spark-assembly_2.10-0.9.1-hadoop1.0.4.jar:/usr/local/hadoop/conf"
"-Xms512M" "-Xmx512M" "org.apache.spark.deploy.worker.DriverWrapper"
"akka.tcp://[email protected]:52068/user/Worker"
"RunCrossCooccurrenceAnalysisOnEpinions" "file://Users/pat/hdfs-mirror/xrsj"
14/04/14 09:26:21 ERROR OneForOneStrategy: FAILED (of class
scala.Enumeration$Val)
scala.MatchError: FAILED (of class scala.Enumeration$Val)
at
org.apache.spark.deploy.worker.Worker$$anonfun$receive$1.applyOrElse(Worker.scala:277)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
14/04/14 09:26:21 INFO Worker: Starting Spark worker 192.168.0.2:52068 with 8
cores, 15.0 GB RAM
14/04/14 09:26:21 INFO Worker: Spark home: /Users/pat/spark-0.9.1-bin-hadoop1
14/04/14 09:26:21 INFO WorkerWebUI: Started Worker web UI at
http://192.168.0.2:8081
14/04/14 09:26:21 INFO Worker: Connecting to master
spark://Maclaurin.local:7077...
14/04/14 09:26:21 INFO Worker: Successfully registered with master
spark://Maclaurin.local:7077
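For what it's worth, my reading of the stack trace: the pattern match in
Worker.receive (Worker.scala:277) got a driver-state value it has no case for,
FAILED, so the driver itself must have ended in a failed state and the worker
actor then crashed on the unhandled value. A minimal Scala illustration of
this error class (a hypothetical enumeration, not Spark's actual code):

object DriverState extends Enumeration {
  val RUNNING, FINISHED, FAILED = Value
}

// Enumeration members are plain runtime values, so the compiler cannot
// exhaustiveness-check this match; calling describe(DriverState.FAILED)
// throws: scala.MatchError: FAILED (of class scala.Enumeration$Val)
def describe(state: DriverState.Value): String = state match {
  case DriverState.RUNNING  => "driver is running"
  case DriverState.FINISHED => "driver finished cleanly"
}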
> Cooccurrence Analysis on Spark
> ------------------------------
>
> Key: MAHOUT-1464
> URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> Project: Mahout
> Issue Type: Improvement
> Components: Collaborative Filtering
> Environment: hadoop, spark
> Reporter: Pat Ferrel
> Assignee: Sebastian Schelter
> Fix For: 1.0
>
> Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that
> runs on Spark. This should be compatible with the Mahout Spark DRM DSL so a
> DRM can be used as input.
> Ideally this would extend to cover MAHOUT-1422. Cross-cooccurrence has
> several applications, including cross-action recommendations.