Sorry, in the car. Cf/cooccurrence is not Spark dependent; it can be moved.
I thought the reference was to the PR I just did for ItemSimilarityJob, which has the deps I mentioned.

Sent from my iPhone

> On Jun 19, 2014, at 12:01 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> Pat,
> it is -- or it is simply missing.
>
> If you are trying to load a matrix from a text file, there's simply no mapping
> to a text file format -- but it could be created, I suppose.
>
> If you are trying to load something other than a matrix, then it is not an
> issue of I/O but simply the fact that you are not doing algebra. Yes, doing
> non-algebraic stuff in Spark is not abstracted.
>
> Not sure which is the problem in this case.
>
>
>> On Thu, Jun 19, 2014 at 11:47 AM, Pat Ferrel <[email protected]> wrote:
>>
>> Actually it has several Spark deps, like having a SparkContext, a SparkConf,
>> and an RDD for file I/O.
>> Please look before you vote. I've been waving this flag for a while -- I/O is
>> not engine neutral.
>>
>>
>> On Jun 19, 2014, at 11:41 AM, Sebastian Schelter <[email protected]> wrote:
>>
>> Hi Anand,
>>
>> Yes, this should not contain anything Spark-specific. +1 for moving it.
>>
>> --sebastian
>>
>>
>>
>>> On 06/19/2014 08:38 PM, Anand Avati wrote:
>>> Hi Pat and others,
>>> I see that cf/CooccurrenceAnalysis.scala is currently under spark. Is there
>>> a specific reason? I see that the code itself is completely Spark agnostic.
>>> I tried moving the code under
>>> math-scala/src/main/scala/org/apache/mahout/math/cf/ with the following
>>> trivial patch:
>>>
>>> diff --git a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> index ee44f90..bd20956 100644
>>> --- a/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> +++ b/spark/src/main/scala/org/apache/mahout/cf/CooccurrenceAnalysis.scala
>>> @@ -22,7 +22,6 @@ import scalabindings._
>>>  import RLikeOps._
>>>  import drm._
>>>  import RLikeDrmOps._
>>> -import org.apache.mahout.sparkbindings._
>>>  import scala.collection.JavaConversions._
>>>  import org.apache.mahout.math.stats.LogLikelihood
>>>
>>>
>>> and it seems to work just fine. From what I see, this should work just fine
>>> on H2O as well with no changes. Why give up generality and make it Spark
>>> specific?
>>>
>>> Thanks
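To make the distinction in this thread concrete, here is a minimal sketch of both sides. The object and method names below (CooccurrenceSketch, SparkIoSketch, readLines) are hypothetical, not the actual CooccurrenceAnalysis or ItemSimilarityJob code. The algebra side touches only the math-scala DRM DSL, so it compiles without any sparkbindings import and should run unchanged on other backends such as H2O:

import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

// Hypothetical sketch: cooccurrence as pure distributed algebra (A'A),
// written only against DrmLike, so no engine-specific types leak in.
object CooccurrenceSketch {
  def cooccurrences(drmA: DrmLike[Int]): DrmLike[Int] = drmA.t %*% drmA
}

The I/O side Pat is flagging is different: getting a text file into something a DRM can be built from goes through a SparkContext and an RDD, so that path stays Spark-specific even if the algebra moves. Again only a sketch under those assumptions:

import org.apache.spark.SparkContext

// Hypothetical sketch: file I/O needs the engine's own context and RDDs,
// which is why the driver/I/O code cannot move to math-scala as-is.
object SparkIoSketch {
  def readLines(path: String)(implicit sc: SparkContext) =
    sc.textFile(path) // returns an RDD[String]; parsing it into a DRM is Spark-bound
}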
