Seems like the cf stuff, as well as other algos that are consumers of math-scala but are not really math, should perhaps go in a new "core" project. If so, the pom should probably be pretty similar to math-scala's, so that any Spark dependencies are noticed. Keeping them in a Scala-only sub-project might also allow for some future use of the Scala build tool, sbt, but that's for another discussion.
Using the old naming conventions and adding the -scala suffix would suggest a "core-scala" sub-project with a pom similar to math-scala. If there are no objections, I'll do that. I'm not a Maven expert though, so someone may want to look at that when the PR comes in.

On Jun 19, 2014, at 6:49 PM, Pat Ferrel <[email protected]> wrote:

What sub-project and package? In general, how do we want to handle new Scala code?

I'm putting Spark-specific stuff like I/O and drivers in the spark module and using a new "drivers" package. There was one called "driver" in mrlegacy. Do we want to follow the old Java packaging as much as possible? This may cause naming conflicts, right?

The only non-Spark-specific Scala sub-project is math-scala. Is this where we want cf/cooccurrence?

Also, how do we want to handle CLI drivers? Seems like we might have something like "mahout-spark itemsimilarity -i hdfs://..."

On Jun 19, 2014, at 1:02 PM, Pat Ferrel <[email protected]> wrote:

Not sure if the previous mail got through, I'm in a car.

No Spark deps in cf/cooccurrence, it can be moved. The deps are in I/O code in ItemSimilarityJob, the subject of the PR just before your first email.

Sorry for the confusion

Sent from my iPhone

> On Jun 19, 2014, at 12:06 PM, Anand Avati <[email protected]> wrote:
>
> Pat,
> I don't seem to find such Spark-specific code in cf.. cf code itself is
> engine agnostic. But of course you need some engine to use it. Similar to
> the distributed decomposition stuff in math-scala. They need some engine to
> run them, but the code itself is engine agnostic and in math-scala. Am I
> missing something basic here?
>
>
>> On Thu, Jun 19, 2014 at 11:47 AM, Pat Ferrel <[email protected]> wrote:
>>
>> Actually it has several Spark deps, like having a SparkContext, SparkConf,
>> and an RDD for file I/O.
>> Please look before you vote. I've been waving this flag for a while: I/O is
>> not engine neutral.
>>
>>
>> On Jun 19, 2014, at 11:41 AM, Sebastian Schelter <[email protected]> wrote:
>>
>> Hi Anand,
>>
>> Yes, this should not contain anything Spark-specific. +1 for moving it.
>>
>> --sebastian
>>
>>
>>> On 06/19/2014 08:38 PM, Anand Avati wrote:
>>> Hi Pat and others,
>>> I see that cf/CooccurrenceAnalysis.scala is currently under spark. Is there
>>> a specific reason? I see that the code itself is completely Spark agnostic.
>>> I tried moving the code under
>>> math-scala/src/main/scala/org/apache/mahout/math/cf/ with the following
>>> trivial patch:
>>>
>>> diff --git a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> index ee44f90..bd20956 100644
>>> --- a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> +++ b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> @@ -22,7 +22,6 @@ import scalabindings._
>>>  import RLikeOps._
>>>  import drm._
>>>  import RLikeDrmOps._
>>> -import org.apache.mahout.sparkbindings._
>>>  import scala.collection.JavaConversions._
>>>  import org.apache.mahout.math.stats.LogLikelihood
>>>
>>>
>>> and it seems to work just fine. From what I see, this should work just fine
>>> on H2O as well with no changes. Why give up generality and make it Spark
>>> specific?
>>>
>>> Thanks
>>
>>
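To make the "engine agnostic" point concrete, here is a minimal sketch written against only the math-scala DRM DSL (DrmLike and the RLikeDrmOps operators). The object and method names are hypothetical illustrations, not the actual CooccurrenceAnalysis API:

package org.apache.mahout.math.cf

import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

// Hypothetical example of the kind of code under discussion: pure DRM algebra
// that compiles against math-scala alone and never touches SparkContext,
// SparkConf, or RDDs.
object CooccurrenceSketch {

  // Computes the item co-occurrence matrix A'A from a user-by-item DRM.
  // Everything here goes through the engine-neutral DrmLike operators,
  // so any backend that implements the DRM API (Spark, H2O, ...) can run it.
  def cooccurrence(drmA: DrmLike[Int]): DrmLike[Int] =
    drmA.t %*% drmA
}

The Spark-specific pieces Pat mentions (SparkContext, SparkConf, RDD-based reading and writing) would then stay in the spark module, in a driver such as ItemSimilarityJob that loads a DRM from HDFS, calls a routine like the one above, and writes the result back out.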
