Regarding recommenders, drivers, and import/export: I’ve got Sebastian’s cooccurrence code wrapped with a driver that reads text-delimited files into a DRM for use with cooccurrence. It then writes the indicator matrix(es) as text-delimited files with user-specified IDs. It also has a proposed Driver base class, a Scala-based option parser, and ReadStore/WriteStore traits. The CLI will be mostly a superset of itemsimilarity in the legacy MR code. The read/write stuff is meant to be pretty generic, so I was planning to do a DB and maybe a JSON example (some day). There is still a bit of functional programming refactoring to do, and the docs are not up to date.
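To make the reader/writer/driver split concrete, here is a minimal Scala sketch of how such traits could fit together. It is illustration only and not the code described above: `Cell`, `TextDelimitedStore`, `parseLine`, `formatCell`, and all signatures are hypothetical, and a plain `Seq[Cell]` stands in for the DRM.

```scala
// Hypothetical sketch: none of these names or signatures come from the actual code.
import java.io.PrintWriter
import java.util.regex.Pattern
import scala.io.Source

// One matrix element keyed by external (user-specified) string IDs.
// Stand-in for a DRM, which is assumed here to be out of scope.
final case class Cell(row: String, col: String, value: Double)

// Reads some external source of type S into cells.
trait ReadStore[S] {
  def read(source: S): Seq[Cell]
}

// Writes cells to some external destination of type D.
trait WriteStore[D] {
  def write(cells: Seq[Cell], dest: D): Unit
}

// Text-delimited implementation of both traits; source and dest are file paths.
class TextDelimitedStore(delim: String = "\t")
    extends ReadStore[String] with WriteStore[String] {

  // Parses "row<delim>col<delim>value"; rejects lines with any other field count.
  def parseLine(line: String): Cell =
    line.split(Pattern.quote(delim), -1) match {
      case Array(r, c, v) => Cell(r, c, v.toDouble)
      case _ => throw new IllegalArgumentException(s"expected 3 fields: $line")
    }

  def formatCell(cell: Cell): String =
    Seq(cell.row, cell.col, cell.value.toString).mkString(delim)

  def read(path: String): Seq[Cell] = {
    val src = Source.fromFile(path)
    // toList forces the lazy iterator before the file is closed.
    try src.getLines().filter(_.trim.nonEmpty).map(parseLine).toList
    finally src.close()
  }

  def write(cells: Seq[Cell], path: String): Unit = {
    val out = new PrintWriter(path)
    try cells.foreach(c => out.println(formatCell(c)))
    finally out.close()
  }
}

// A driver only wires a reader, a computation, and a writer together.
abstract class Driver {
  def process(in: Seq[Cell]): Seq[Cell]

  def run(reader: ReadStore[String], writer: WriteStore[String],
          inPath: String, outPath: String): Unit =
    writer.write(process(reader.read(inPath)), outPath)
}
```

With this shape, a DB or JSON variant would only need another ReadStore/WriteStore implementation; the driver would not change.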
With cooccurrence working we could replace all the cooccurrence recommenders (in-memory and MR) with one codebase. Add Solr and you have a single-machine, server-based recommender that we can supply with an API similar to the legacy in-memory recommender. The cool thing is that it will scale out to a cluster with Solr and HDFS, requiring only config changes. The downside is that it requires at least a standalone local version of Spark to do the cooccurrence. BTW this would give us something people have been asking for: a recommender service.

Is anyone else interested in CLI, drivers, read/write in the import/export sense? Or a new architecture for the recommenders? If so, maybe a separate thread?

On May 29, 2014, at 7:03 AM, Ted Dunning <[email protected]> wrote:

Andrew,

Sebastian and I were talking yesterday and guessing that you would be interested in this soon. Glad to know the world is as expected.

Yes. This needs to happen at least at a very conceptual level.

For instance, for classifiers, I think that we need to have something like:

- progressively train against a batch of data
  questions: should this do multiple epochs? Throw an exception if on-line training not supported? Throw an exception if too little data provided?
- classify a batch of data
- serialize a model
- de-serialize a model

Note that a batch listed above should be either a bunch of observations or just one.

Question: does this handle the following cases:

- naive bayes
- SGD trained on continuous data
- batch trained <mumble> classifiers
- downpour type classifier training

?

On Wed, May 28, 2014 at 6:25 PM, Andrew Palumbo <[email protected]> wrote:

> This may be somewhat tangential to this thread, but would now be a good time to start laying out some scala traits for Classifiers/Clusterers/Recommenders? I am totally scala-naive, but have been trying to keep up with the discussions.
>
> I don't know if this is premature but it seems that now that the DSL data structures have been at least sketched out if not fully implemented, it would be useful to have these in place before people start porting too much over. It might be helpful in bringing in new contributions as well.
>
> It could also help regarding people's questions of integrating a future wrapper layer.
>
>> From: [email protected]
>> Date: Wed, 28 May 2014 17:10:43 -0700
>> Subject: Re: do we really need scala still
>> To: [email protected]
>>
>> +1
>>
>> Let's use a successful scala model as a suggestion about where to go. It seems plausible that Java could emulate the building of a lazy DSL logical plan and then poke it in plausible ways with the addition of a wrapper layer. But that only helps if the Scala layer succeeds.
>>
>> On Tue, May 27, 2014 at 10:56 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> Also, i think that this is leaning towards false dilemma fallacy. Scala and java models could happily exist at the same time and, hopefully, with minimal fragmentation of the project if done with precision and care.
>>>
>>> On Tue, May 27, 2014 at 10:46 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>
>>>> not sure there's much sense in taking user survey if we can't act on this. In our situation, unfortunately, we don't have that many ideas to choose from, so there's not much wiggle room imo. It is more like reinforcement learning -- stuff that doesn't get used or supported, just dies. That's it. Scala bindings, though thumb up'd internally, are yet to earn this status externally. In that sense we always have been watching for use/support, that's why we culled out tons of stuff. Nothing changes going forward (at least at this point). If we have tons of new ideas/contributions, then it may be different. What is weak, dies on its own pretty evidently without much extra effort.
>>>>
>>>> On Tue, May 27, 2014 at 10:32 AM, Pat Ferrel <[email protected]> wrote:
>>>>
>>>>> We are asking anyone using Mahout as a lib or in the DSL-shell to learn Scala. While I still think it’s the right idea, users may disagree. We should probably either solicit comments or at least keep an eye on reactions to this. Spark took this route when the question was even more in doubt and so is at least partially supporting multiple bindings.
>>>>>
>>>>> Not sure how far we want to carry this but we could supply Java bindings to the CLI-type things pretty easily.
>>>>>
>>>>> On May 26, 2014, at 2:43 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>
>>>>> Well, first, functional programming in java8 is about 2-3 years late to the scene. So the reasoning along the lines, hey, we already are using tool A, and now tool B is available which is almost as good as A, so let's migrate to B, is fallible. Tool B must demonstrate not just matching capabilities, but far superior ones, to justify cost of such migration.
>>>>>
>>>>> Second, as others pointed out, java 8 doesn't really match scala, not yet anyway. One important feature of scala bindings work is proper operator overload (R-like DSL). That would not be possible to do in java 8, as it stands. Yes, as others pointed out, it makes things concise, but most importantly, it also makes things operation-centric and eliminates nested calls pile-up.
>>>>>
>>>>> Third, as it stands today, it would also present a problem from the Spark integration point of view. Spark does have java bindings, but first, they are underdefined (you can check spark list for tons of postings about missing equivalent capability), and they are certainly not java-8-vetted. So java api in Spark for java 8 purposes, as it stands, is a moot point.
>>>>>
>>>>> There are also a number of other goodies and clashes that exist -- use of scala collections vs. Java collections, clean functional type syntax, magic methods, partially defined functions, case class matchers, implicits, view and context bounds etc. Etc., all that sh$tload of acrobatics that comes actually very handy in existing implementations and has no substitute in Java 8.
>>>>>
>>>>> On May 25, 2014 12:48 PM, "bandi shankar" <[email protected]> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I was just thinking, do we still need scala? In java 8 we have (probably) all the kinds of features provided by scala. I am new to the group, so I am just wondering why not move mahout away from scala. Is there any specific reason to adopt scala?
>>>>>>
>>>>>> Bandi
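Ted's list of classifier operations in the thread above (progressively train on a batch, classify a batch, serialize, de-serialize) could be sketched as a Scala trait roughly like this. It is a rough sketch under stated assumptions, not an agreed API: the trait, `Observation`, `CentroidModel`, and `NearestCentroid` are all hypothetical names, the nearest-centroid model is just a toy that happens to support on-line updates, and the open questions (multiple epochs, when to throw) are left unanswered.

```scala
// Hypothetical sketch: illustrative names only, not an agreed Mahout API.

// One labelled observation. A batch is any Seq of these, including a Seq of one.
final case class Observation(features: Vector[Double], label: Int)

trait Classifier[M] {
  // Progressive training: fold a batch into the current model, return the updated model.
  def train(model: M, batch: Seq[Observation]): M
  def classify(model: M, batch: Seq[Vector[Double]]): Seq[Int]
  def serialize(model: M): Array[Byte]
  def deserialize(bytes: Array[Byte]): M
}

// Toy model: per-label running feature sums and counts (centroid = sum / count).
final case class CentroidModel(sums: Map[Int, Vector[Double]], counts: Map[Int, Long])

object NearestCentroid extends Classifier[CentroidModel] {
  val empty: CentroidModel = CentroidModel(Map.empty, Map.empty)

  def train(model: CentroidModel, batch: Seq[Observation]): CentroidModel =
    batch.foldLeft(model) { (m, obs) =>
      val prev = m.sums.getOrElse(obs.label, Vector.fill(obs.features.length)(0.0))
      val sum = prev.zip(obs.features).map { case (a, b) => a + b }
      val n = m.counts.getOrElse(obs.label, 0L) + 1L
      CentroidModel(m.sums.updated(obs.label, sum), m.counts.updated(obs.label, n))
    }

  def classify(model: CentroidModel, batch: Seq[Vector[Double]]): Seq[Int] = {
    require(model.sums.nonEmpty, "model has not been trained")
    batch.map { x =>
      // Pick the label whose centroid has the smallest squared distance to x.
      model.sums.minBy { case (label, sum) =>
        val n = model.counts(label).toDouble
        sum.zip(x).map { case (s, xi) => val d = s / n - xi; d * d }.sum
      }._1
    }
  }

  // Plain-text format, one line per label: "<label> <count> <sum1>,<sum2>,..."
  // Assumes non-empty feature vectors.
  def serialize(model: CentroidModel): Array[Byte] =
    model.sums.toSeq.sortBy(_._1).map { case (label, sum) =>
      s"$label ${model.counts(label)} ${sum.mkString(",")}"
    }.mkString("\n").getBytes("UTF-8")

  def deserialize(bytes: Array[Byte]): CentroidModel = {
    val rows = new String(bytes, "UTF-8").split("\n").filter(_.nonEmpty).map { line =>
      val parts = line.split(" ")
      (parts(0).toInt, parts(1).toLong, parts(2).split(",").map(_.toDouble).toVector)
    }
    CentroidModel(rows.map(r => r._1 -> r._3).toMap, rows.map(r => r._1 -> r._2).toMap)
  }
}
```

Passing the model in and out of `train` keeps the trait neutral about whether training is on-line or batch; a batch-only classifier could retrain from scratch or reject small batches, which is exactly the behavior the questions in the thread would need to settle.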
