Sorry, in the car. Cf/cooccurrence is not Spark dependent; it can be moved.
I thought the reference was to the PR I just did for ItemSimilarityJob, which has the deps I mentioned.

Sent from my iPhone

> On Jun 19, 2014, at 12:01 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> Pat,
> it is -- or it is simply missing.
>
> If you are trying to load a matrix from a text file, there's simply no mapping
> to a text file format -- but it could be created, I suppose.
>
> If you are trying to load something other than a matrix, then it is not an
> issue of I/O but simply the fact that you are not doing algebra. Yes, doing
> non-algebraic stuff in Spark is not abstracted.
>
> Not sure which is the problem in this case.
>
>
>> On Thu, Jun 19, 2014 at 11:47 AM, Pat Ferrel <[email protected]> wrote:
>>
>> Actually it has several Spark deps, like having a SparkContext, a SparkConf,
>> and an RDD for file I/O.
>> Please look before you vote. I've been waving this flag for a while -- I/O is
>> not engine neutral.
>>
>>
>> On Jun 19, 2014, at 11:41 AM, Sebastian Schelter <[email protected]> wrote:
>>
>> Hi Anand,
>>
>> Yes, this should not contain anything Spark-specific. +1 for moving it.
>>
>> --sebastian
>>
>>
>>
>>> On 06/19/2014 08:38 PM, Anand Avati wrote:
>>> Hi Pat and others,
>>> I see that cf/CooccurrenceAnalysis.scala is currently under spark. Is there
>>> a specific reason? I see that the code itself is completely Spark agnostic.
>>> I tried moving the code under
>>> math-scala/src/main/scala/org/apache/mahout/math/cf/ with the following
>>> trivial patch:
>>>
>>> diff --git a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> index ee44f90..bd20956 100644
>>> --- a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> +++ b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> @@ -22,7 +22,6 @@ import scalabindings._
>>>  import RLikeOps._
>>>  import drm._
>>>  import RLikeDrmOps._
>>> -import org.apache.mahout.sparkbindings._
>>>  import scala.collection.JavaConversions._
>>>  import org.apache.mahout.math.stats.LogLikelihood
>>>
>>>
>>> and it seems to work just fine. From what I see, this should work just fine
>>> on H2O as well with no changes. Why give up generality and make it Spark
>>> specific?
>>>
>>> Thanks
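To make the distinction in this thread concrete, here is a minimal sketch of both sides. The object and method names below (CooccurrenceSketch, SparkIoSketch, readLines) are hypothetical, not the actual CooccurrenceAnalysis or ItemSimilarityJob code. The algebra side touches only the math-scala DRM DSL, so it compiles without any sparkbindings import and should run unchanged on other backends such as H2O:

import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

// Hypothetical sketch: cooccurrence as pure distributed algebra (A'A),
// written only against DrmLike, so no engine-specific types leak in.
object CooccurrenceSketch {
  def cooccurrences(drmA: DrmLike[Int]): DrmLike[Int] = drmA.t %*% drmA
}

The I/O side Pat is flagging is different: getting a text file into something a DRM can be built from goes through a SparkContext and an RDD, so that path stays Spark-specific even if the algebra moves. Again only a sketch under those assumptions:

import org.apache.spark.SparkContext

// Hypothetical sketch: file I/O needs the engine's own context and RDDs,
// which is why the driver/I/O code cannot move to math-scala as-is.
object SparkIoSketch {
  def readLines(path: String)(implicit sc: SparkContext) =
    sc.textFile(path) // returns an RDD[String]; parsing it into a DRM is Spark-bound
}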
