Not sure if the previous mail got through; I'm in a car.

No Spark deps in cf/cooccurrence, so it can be moved. The deps are in the I/O code in ItemSimilarityJob, the subject of the PR just before your first email. Sorry for the confusion.

Sent from my iPhone

> On Jun 19, 2014, at 12:06 PM, Anand Avati <[email protected]> wrote:
>
> Pat,
> I don't seem to find such Spark-specific code in cf; the cf code itself is
> engine agnostic. But of course you need some engine to use it, similar to
> the distributed decomposition stuff in math-scala: that needs some engine to
> run, but the code itself is engine agnostic and lives in math-scala. Am I
> missing something basic here?
>
>
>> On Thu, Jun 19, 2014 at 11:47 AM, Pat Ferrel <[email protected]> wrote:
>>
>> Actually it has several Spark deps, like having a SparkContext, a SparkConf,
>> and an RDD for file I/O.
>> Please look before you vote. I've been waving this flag for a while: I/O is
>> not engine neutral.
>>
>>
>> On Jun 19, 2014, at 11:41 AM, Sebastian Schelter <[email protected]> wrote:
>>
>> Hi Anand,
>>
>> Yes, this should not contain anything Spark-specific. +1 for moving it.
>>
>> --sebastian
>>
>>
>>
>>> On 06/19/2014 08:38 PM, Anand Avati wrote:
>>> Hi Pat and others,
>>> I see that cf/CooccurrenceAnalysis.scala is currently under spark. Is there
>>> a specific reason? I see that the code itself is completely Spark agnostic.
>>> I tried moving the code under
>>> math-scala/src/main/scala/org/apache/mahout/math/cf/ with the following
>>> trivial patch:
>>>
>>> diff --git a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> index ee44f90..bd20956 100644
>>> --- a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> +++ b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> @@ -22,7 +22,6 @@ import scalabindings._
>>>  import RLikeOps._
>>>  import drm._
>>>  import RLikeDrmOps._
>>> -import org.apache.mahout.sparkbindings._
>>>  import scala.collection.JavaConversions._
>>>  import org.apache.mahout.math.stats.LogLikelihood
>>>
>>> and it seems to work just fine. From what I see, this should work just fine
>>> on H2O as well with no changes. Why give up generality and make it Spark
>>> specific?
>>>
>>> Thanks
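For anyone following along, the split under discussion looks roughly like the sketch below. This is a minimal illustration, not the actual Mahout code: the cooccurrences function and the drmDfsRead call are assumptions made up for this example (mahoutSparkContext is a real sparkbindings helper, but check its signature against your Mahout version).

// Engine-agnostic part: can live in math-scala. It is written purely
// against the DRM algebra, so any backend (Spark, H2O, ...) can run it.
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

// Hypothetical signature: A'A gives raw co-occurrence counts.
// Note that no Spark types appear anywhere in this half.
def cooccurrences(drmA: DrmLike[Int]): DrmLike[Int] = drmA.t %*% drmA

// Engine-specific part: stays in the spark module (this is Pat's point
// about ItemSimilarityJob). Building a context and reading files into a
// DRM needs Spark underneath.
import org.apache.mahout.sparkbindings._

implicit val ctx = mahoutSparkContext(masterUrl = "local", appName = "cooccurrence")
val drmA = drmDfsRead("hdfs:///path/to/interactions")  // assumed I/O helper; backed by an RDD
val counts = cooccurrences(drmA.asInstanceOf[DrmLike[Int]])

The trivial patch above works precisely because CooccurrenceAnalysis only ever touches the first half of this split; the SparkContext/SparkConf/RDD dependencies all sit in the I/O half.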
