This makes reasonable sense. The CF stuff does *use* math a fair bit but could be said not to *be* math in itself.
On the other hand, the core/math split in Mahout itself was motivated by the need to isolate the Hadoop dependencies. I am not clear that the same is true here. Is there an inherent need to separate these?

On Wed, Jun 25, 2014 at 8:52 AM, Pat Ferrel <[email protected]> wrote:

> Seems like the cf stuff, as well as other algos that are consumers of
> “math-scala” but are not really math, should go in a new “core” project,
> perhaps. If so, the pom should probably be pretty similar to math-scala so
> that any Spark dependencies are noticed. Keeping them in a Scala-only
> sub-project might allow for some future use of the Scala builder, sbt, but
> that’s for another discussion.
>
> Using the old naming conventions and adding the -scala suffix would suggest
> a “core-scala” sub-project with a pom similar to math-scala.
>
> If there are no objections, I’ll do that. I’m not a maven expert, though,
> so someone may want to look at that when the PR comes in.
>
>
> On Jun 19, 2014, at 6:49 PM, Pat Ferrel <[email protected]> wrote:
>
> What sub-project and package?
>
> In general, how do we want to handle new Scala code?
>
> I’m putting Spark-specific stuff like I/O and drivers in spark and using a
> new “drivers” package. There was one called “driver” in mrlegacy. Do we
> want to follow the old Java packaging as much as possible? This may cause
> naming conflicts, right?
>
> The only non-Spark-specific Scala sub-project is math-scala. Is this where
> we want cf/cooccurrence?
>
> Also, how do we want to handle CLI drivers? Seems like we might have
> something like “mahout-spark itemsimilarity -i hdfs://...”
>
>
> On Jun 19, 2014, at 1:02 PM, Pat Ferrel <[email protected]> wrote:
>
> Not sure if the previous mail got through.
>
> I'm in a car.
>
> No Spark deps in cf/cooccurrence; it can be moved.
>
> The deps are in I/O code in ItemSimilarityJob, the subject of the PR just
> before your first email.
>
> Sorry for the confusion
>
> Sent from my iPhone
>
>
> On Jun 19, 2014, at 12:06 PM, Anand Avati <[email protected]> wrote:
> >
> > Pat,
> > I don't seem to find such Spark-specific code in cf. The cf code itself
> > is engine-agnostic, but of course you need some engine to use it.
> > Similar to the distributed decomposition stuff in math-scala: they need
> > some engine to run them, but the code itself is engine-agnostic and in
> > math-scala. Am I missing something basic here?
> >
> >
> >> On Thu, Jun 19, 2014 at 11:47 AM, Pat Ferrel <[email protected]> wrote:
> >>
> >> Actually it has several Spark deps, like having a SparkContext, a
> >> SparkConf, and an RDD for file I/O.
> >> Please look before you vote. I’ve been waving this flag for a while:
> >> I/O is not engine-neutral.
> >>
> >>
> >> On Jun 19, 2014, at 11:41 AM, Sebastian Schelter <[email protected]> wrote:
> >>
> >> Hi Anand,
> >>
> >> Yes, this should not contain anything Spark-specific. +1 for moving it.
> >>
> >> --sebastian
> >>
> >>
> >>
> >>> On 06/19/2014 08:38 PM, Anand Avati wrote:
> >>> Hi Pat and others,
> >>> I see that cf/CooccurrenceAnalysis.scala is currently under spark. Is
> >>> there a specific reason? I see that the code itself is completely
> >>> Spark-agnostic.
> >>> I tried moving the code under
> >>> math-scala/src/main/scala/org/apache/mahout/math/cf/ with the
> >>> following trivial patch:
> >>>
> >>> diff --git a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
> >>> index ee44f90..bd20956 100644
> >>> --- a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
> >>> +++ b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
> >>> @@ -22,7 +22,6 @@ import scalabindings._
> >>>  import RLikeOps._
> >>>  import drm._
> >>>  import RLikeDrmOps._
> >>> -import org.apache.mahout.sparkbindings._
> >>>  import scala.collection.JavaConversions._
> >>>  import org.apache.mahout.math.stats.LogLikelihood
> >>>
> >>> and it seems to work just fine. From what I see, this should work just
> >>> fine on H2O as well with no changes. Why give up generality and make
> >>> it Spark-specific?
> >>>
> >>> Thanks
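To make the split under discussion concrete, here is a minimal sketch. The object names (CooccurrenceSketch, SparkDriverSketch), the one-argument cooccurrences signature, and the exact package locations are illustrative assumptions, not Mahout's actual code; the tree was being reorganized at the time, and the real CooccurrenceAnalysis adds downsampling and LLR filtering (via LogLikelihood) on top of the A'A product.

// Engine-agnostic half: written only against the DRM DSL, so it can
// live in math-scala. Nothing here names Spark.
// (Sketch only; simplified from the real CooccurrenceAnalysis.)
import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

object CooccurrenceSketch {
  // Item-item cooccurrence as A'A over a user-by-item interaction DRM.
  def cooccurrences(drmA: DrmLike[Int]): DrmLike[Int] =
    (drmA.t %*% drmA).checkpoint()
}

// Engine-specific half: only the driver pulls in sparkbindings, for
// context creation and data loading.
import org.apache.mahout.sparkbindings._

object SparkDriverSketch extends App {
  implicit val ctx = mahoutSparkContext(masterUrl = "local", appName = "cooc-sketch")
  // Placeholder in-memory input; a real driver would read a DRM from HDFS.
  val drmA = drmParallelize(dense((1, 0, 1), (0, 1, 1)), numPartitions = 2)
  val result = CooccurrenceSketch.cooccurrences(drmA)
}

Nothing in the algorithm half references Spark, which is why Anand's patch above only has to drop the one sparkbindings import; everything Spark-bound, the SparkContext/SparkConf and RDD-based I/O Pat points to, stays in the driver half.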
