inline

On Mon, Apr 14, 2014 at 11:21 AM, Pat Ferrel (JIRA) <[email protected]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/MAHOUT-1464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13968613#comment-13968613]
>
> Pat Ferrel commented on MAHOUT-1464:
> ------------------------------------
>
> @Dmitriy, no clue what email you are talking about, you have written a lot
> lately. Where is it, on a Jira?
>
No, on @dev... Basically you want to run it as a standalone application
(just like the SparkPi example). The easiest way to do that is to import
the whole Mahout tree into IDEA and launch Sebastian's driver program
directly; that much should work, especially since you only care about
local mode anyway (just to be clear, the "local" master means same JVM,
single thread, really useful for debugging only).
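
For illustration, such a SparkPi-style entry point could look roughly
like the sketch below. The object name, app name, and the body of main
are all hypothetical; the real logic is whatever Sebastian's driver does.

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical standalone entry point, launchable straight from IDEA.
    object LocalDriver {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("local")              // same JVM, single thread -- debugging only
          .setAppName("CooccurrenceDebug") // hypothetical app name
        val sc = new SparkContext(conf)
        try {
          // ... run the cooccurrence driver logic against sc here ...
        } finally {
          sc.stop()
        }
      }
    }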

>
> I did my setup and tried launching with Hadoop and Mahout running locally
> (MAHOUT_LOCAL=true),
>
This environment variable has no bearing on a Spark program. The only
thing that matters is the master URL, per the above.
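
That is, the master is chosen in code when the SparkContext is created,
not via Mahout environment variables. A sketch of the two choices (the
host and port of the standalone master are assumptions about a default
setup):

    // same JVM, one thread -- what you want for step-through debugging
    val local = new SparkContext("local", "debug-run")

    // a real standalone master, even when it is just a single node
    val standalone = new SparkContext("spark://localhost:7077", "single-node-run")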


> One localhost instance of Spark, passing in the 'mvn package' mahout spark
> jar from the localfs and pointing at data on the localfs.  This is per
> instructions of the Spark site. There is no firewall issue since it is
> always localhost talking to localhost.
>

You need to be a bit more specific here.

Yes, you can run Spark as a single-node cluster (just like a Hadoop
single-node cluster), but that would still be a "standalone" master, not
"local". "local" is, as I indicated, totally the same JVM, single thread;
it does not require starting any additional Spark processes.

As long as you want "standalone" (i.e. the real thing, albeit
single-node), you should not use Client. It won't work. Launch the
program directly, just like they do with examples such as SparkPi. This
Client thing will not work for our Mahout programs without additional
considerations.
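
Concretely, launching directly against a single-node standalone master
would mean something along these lines (a sketch only: the jar path is
hypothetical and the default master port is assumed):

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setMaster("spark://localhost:7077") // assumes the default standalone master port
      .setAppName("CooccurrenceRun")
      // ship the mahout spark jar to the executors; the path is hypothetical
      .setJars(Seq("/path/to/mahout-spark_2.10.jar"))
    val sc = new SparkContext(conf)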


>
> Anyway if I could find your "running mahout on spark" email it would
> probably explain what I'm doing wrong.
>
> You did see I was using Spark 0.9.1?
>
In all likelihood this should be fine if you also change the dependency
in the root pom.xml and recompile with it. Otherwise there's no way of
reliably telling whether different versions on the client and on the
backend may trigger incompatibilities, other than trying (e.g. if they
changed the akka or netty version between 0.9.0 and 0.9.1).
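
For example, assuming the Spark dependency is declared in the root
pom.xml roughly like this (a sketch; the exact declaration in the Mahout
pom may differ), the change would be:

    <!-- bump the Spark dependency to match the cluster version -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>0.9.1</version>
    </dependency>

followed by a 'mvn package' rebuild, so the client and the backend agree
on transitive dependencies such as akka and netty.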



>
> > Cooccurrence Analysis on Spark
> > ------------------------------
> >
> >                 Key: MAHOUT-1464
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1464
> >             Project: Mahout
> >          Issue Type: Improvement
> >          Components: Collaborative Filtering
> >         Environment: hadoop, spark
> >            Reporter: Pat Ferrel
> >            Assignee: Sebastian Schelter
> >             Fix For: 1.0
> >
> >         Attachments: MAHOUT-1464.patch, MAHOUT-1464.patch,
> MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch, MAHOUT-1464.patch,
> run-spark-xrsj.sh
> >
> >
> > Create a version of Cooccurrence Analysis (RowSimilarityJob with LLR)
> that runs on Spark. This should be compatible with Mahout Spark DRM DSL so
> a DRM can be used as input.
> > Ideally this would extend to cover MAHOUT-1422. This cross-cooccurrence
> has several applications including cross-action recommendations.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.2#6252)
>
