[ https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973012#comment-13973012 ]

Pat Ferrel commented on MAHOUT-1464:
------------------------------------

Getting the cooccurrence code to read from or write to HDFS is still not 
working. The cooccurrence code does not seem to use the context that is 
created, though the computation does execute on the cluster and seems to 
complete properly. I wonder if the context is needed to do the read/write, as 
it is in the above spark-shell example. So the following val is not used, as 
far as I can tell.

    implicit val sc = mahoutSparkContext(masterUrl = "spark://occam4:7077",
      appName = "MahoutClusterContext",
      customJars = Traversable.empty[String])
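For reference, here is how the spark-shell example threads the context through 
a read/write: the DSL's read and write entry points take the Spark context as 
an implicit parameter, so the val above only matters if one of them is called 
with it in scope. A minimal sketch, assuming the drmFromHDFS/writeDRM entry 
points from the bindings (paths below are made up for illustration):

    import org.apache.mahout.sparkbindings._

    // drmFromHDFS picks up the implicit SparkContext, so this is where the
    // implicit val actually gets "used". Paths are hypothetical.
    implicit val sc = mahoutSparkContext(masterUrl = "spark://occam4:7077",
      appName = "MahoutClusterContext",
      customJars = Traversable.empty[String])

    val drmA = drmFromHDFS("hdfs://occam4:9000/user/pat/epinions/ratings")
    drmA.writeDRM("hdfs://occam4:9000/tmp/epinions-copy")

If the cooccurrence driver never goes through entry points like these, that 
would explain why the val appears unused.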

I can't even get this job to complete using the local file system; strange 
"_temporary" paths are created depending on who knows what. One even looked 
like it comes from some version of Linux I don't own:

    Exception in thread "main" org.apache.spark.SparkException: Job aborted:
    Task 8.0:0 failed 4 times (most recent failure:
    Exception failure: java.io.IOException: The temporary job-output directory
    file:/private/tmp/tmp/co-occurrence-on-epinions/indicators-item-item/_temporary
    doesn't exist!)

/private/tmp??? What is that, CentOS? I'm using Ubuntu 12.04.
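As a side note, the file: scheme in that trace shows the output path is 
resolving against the local filesystem rather than HDFS. A quick way to check 
what an unqualified path actually resolves to (a sketch using the path from 
the trace):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    // If core-site.xml isn't on the classpath, the default filesystem falls
    // back to file:///, which would produce a stray local _temporary
    // directory like the one in the exception above.
    val conf = new Configuration()
    val out = new Path("/tmp/co-occurrence-on-epinions/indicators-item-item")
    val fs = out.getFileSystem(conf)
    println(fs.getUri)             // file:/// vs. hdfs://host:port
    println(fs.makeQualified(out)) // the fully qualified path the job uses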

Onwards to looking at the Spark config. 

Can you answer the question of why we don't use the context 'sc' to read and 
write, as in the spark-shell example?
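For reference, the computation itself is just a couple of products in the DRM 
DSL, so my guess is the context should only matter at the I/O boundary. A 
sketch of the shape (input paths are hypothetical, and the RLikeDrmOps 
implicits for .t and %*% need to be in scope):

    // Cooccurrence / cross-cooccurrence shape in the DRM DSL. LLR
    // downsampling of each indicator follows these products in the actual
    // job. Assumes the implicit sc from above; inputs are hypothetical.
    val drmP = drmFromHDFS("hdfs://occam4:9000/user/pat/epinions/purchases")
    val drmV = drmFromHDFS("hdfs://occam4:9000/user/pat/epinions/views")

    val indicatorPP = drmP.t %*% drmP   // item-item cooccurrence
    val indicatorPV = drmP.t %*% drmV   // cross-action cooccurrence (MAHOUT-1422)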

> Cooccurrence Analysis on Spark
> ------------------------------
>
>                 Key: MAHOUT-1464
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Collaborative Filtering
>         Environment: hadoop, spark
>            Reporter: Pat Ferrel
>            Assignee: Sebastian Schelter
>             Fix For: 1.0
>
>         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, 
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, run-spark-xrsj.sh
>
>
> Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR) that 
> runs on Spark. This should be compatible with Mahout Spark DRM DSL so a DRM 
> can be used as input. 
> Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence has 
> several applications including cross-action recommendations. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)
