Pat, yes let's start a new thread for this. I'll start a thread in a bit and forward over these last few messages for a point of reference.
Ted, glad to deliver on your world view! I have to admit that I've not put too much thought into this yet, and just wanted to get the conversation started at this point. I'll take some time to look at what you've proposed for classifiers and how it will handle NB, SGD, etc., and get back to you with what I see on the new thread. I think that it will be good to at least sketch these things out early.

> Subject: Re: do we really need scala still
> From: [email protected]
> Date: Thu, 29 May 2014 08:58:04 -0700
> To: [email protected]
>
> Regarding recommenders, drivers, and import/export:
>
> I've got Sebastian's cooccurrence code wrapped with a driver that reads
> text-delimited files into a DRM for use with cooccurrence. Then it writes
> the indicator matrix (or matrices) as text-delimited files with
> user-specified IDs. It also has a proposed Driver base class, a
> Scala-based option parser, and ReadStore/WriteStore traits. The CLI will
> be mostly a superset of itemsimilarity in legacy MR. The read/write stuff
> is meant to be pretty generic, so I was planning to do a DB and maybe a
> JSON example (some day). There is still a bit of functional-programming
> refactoring left, and the docs are not up to date.
>
> With cooccurrence working, we could do something that replaces all the
> cooccurrence recommenders (in-memory and MR) with one codebase. Add Solr
> and you have a single-machine, server-based recommender that we can
> supply with an API similar to the legacy in-memory recommender. The cool
> thing is that it will scale out to a cluster with Solr and HDFS,
> requiring only config changes. The downside is that it requires at least
> a standalone local version of Spark to do the cooccurrence. BTW, this
> would give us something people have been asking for: a recommender
> service.
>
> Is anyone else interested in CLI, drivers, and read/write in the
> import/export sense? Or a new architecture for the recommenders? If so,
> maybe a separate thread?
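[Editor's note: to make the shape of Pat's proposal concrete, here is a minimal sketch of how a Driver base class with ReadStore/WriteStore traits might look. Every name and signature below is a hypothetical illustration, not the actual code in the patch.]

```scala
// Hypothetical sketch only -- none of these names come from the actual
// patch; they just illustrate the separation of concerns described above.

// A store knows how to turn an external representation (text-delimited
// files today; a DB or JSON some day) into rows of (id, items) and back.
trait ReadStore {
  def read(location: String): Iterator[(String, Seq[String])]
}

trait WriteStore {
  def write(location: String, rows: Iterator[(String, Seq[String])]): Unit
}

// The driver base class owns option parsing and wiring; a concrete driver
// (e.g. an itemsimilarity-style CLI) supplies the actual processing.
abstract class Driver {
  // toy option parser: pairs of "--key value" arguments
  def parseArgs(args: Array[String]): Map[String, String] =
    args.sliding(2, 2).collect {
      case Array(k, v) if k.startsWith("--") => (k.stripPrefix("--"), v)
    }.toMap

  def process(opts: Map[String, String]): Unit

  def main(args: Array[String]): Unit = process(parseArgs(args))
}
```

A concrete cooccurrence driver would then just compose a ReadStore, the cooccurrence computation, and a WriteStore inside its `process` method.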
>
> On May 29, 2014, at 7:03 AM, Ted Dunning <[email protected]> wrote:
>
> Andrew,
>
> Sebastian and I were talking yesterday and guessing that you would be
> interested in this soon. Glad to know the world is as expected.
>
> Yes. This needs to happen, at least at a very conceptual level. For
> instance, for classifiers, I think that we need to have something like:
>
> - progressively train against a batch of data
>   Questions: should this do multiple epochs? Throw an exception if
>   on-line training is not supported? Throw an exception if too little
>   data is provided?
>
> - classify a batch of data
>
> - serialize a model
>
> - de-serialize a model
>
> Note that a batch listed above should be either a bunch of observations
> or just one.
>
> Question: does this handle the following cases?
>
> - naive bayes
> - SGD trained on continuous data
> - batch trained <mumble> classifiers
> - downpour type classifier training
>
>
> On Wed, May 28, 2014 at 6:25 PM, Andrew Palumbo <[email protected]> wrote:
>
> > This may be somewhat tangential to this thread, but would now be a good
> > time to start laying out some Scala traits for
> > Classifiers/Clusterers/Recommenders? I am totally Scala-naive, but have
> > been trying to keep up with the discussions.
> >
> > I don't know if this is premature, but it seems that now that the DSL
> > data structures have been at least sketched out, if not fully
> > implemented, it would be useful to have these in place before people
> > start porting too much over. It might be helpful in bringing in new
> > contributions as well.
> >
> > It could also help regarding people's questions about integrating a
> > future wrapper layer.
> >
> >
> >> From: [email protected]
> >> Date: Wed, 28 May 2014 17:10:43 -0700
> >> Subject: Re: do we really need scala still
> >> To: [email protected]
> >>
> >> +1
> >>
> >> Let's use a successful scala model as a suggestion about where to go.
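[Editor's note: Ted's four classifier operations above could be sketched as Scala traits along the following lines. This is a strawman for discussion only -- every name here is invented, and a batch is just a Seq, so a single observation is a batch of one.]

```scala
// Strawman only: invented names, not a proposed API.

case class Observation(features: Vector[Double], label: Option[Int])

trait ClassifierModel extends Serializable {
  // classify a batch of data (a batch may be a single observation)
  def classify(batch: Seq[Observation]): Seq[Int]
}

trait ClassifierLearner {
  // progressively train against a batch; per the open question above, a
  // learner that cannot train on-line could throw
  // UnsupportedOperationException here
  def trainIncremental(batch: Seq[Observation]): Unit

  // batch training, possibly over multiple epochs
  def train(data: Seq[Observation], epochs: Int = 1): Unit

  def model: ClassifierModel
}

// serialize / de-serialize a model, kept separate so model classes stay
// independent of any particular wire format
trait ModelIO {
  def serialize(m: ClassifierModel): Array[Byte]
  def deserialize(bytes: Array[Byte]): ClassifierModel
}
```

Whether NB, SGD, batch-trained, and downpour-style learners all fit behind `trainIncremental`/`train` is exactly the open question; the exception-throwing escape hatch is one possible answer.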
> >> It seems plausible that Java could emulate the building of a lazy DSL
> >> logical plan and then poke at it in plausible ways with the addition of
> >> a wrapper layer. But that only helps if the Scala layer succeeds.
> >>
> >>
> >> On Tue, May 27, 2014 at 10:56 AM, Dmitriy Lyubimov <[email protected]>
> >> wrote:
> >>
> >>> Also, I think that this is leaning towards a false dilemma fallacy.
> >>> Scala and Java models could happily exist at the same time, with
> >>> hopefully minimal fragmentation of the project, if done with precision
> >>> and care.
> >>>
> >>>
> >>> On Tue, May 27, 2014 at 10:46 AM, Dmitriy Lyubimov <[email protected]>
> >>>> wrote:
> >>>
> >>>> Not sure there's much sense in taking a user survey if we can't act
> >>>> on this. In our situation, unfortunately, we don't have that many
> >>>> ideas to choose from, so there's not much wiggle room, imo. It is
> >>>> more like reinforcement learning -- stuff that doesn't get used or
> >>>> supported just dies. That's it. Scala bindings, though thumbed up
> >>>> internally, are yet to earn this status externally. In that sense we
> >>>> have always been watching for use/support; that's why we culled out
> >>>> tons of stuff. Nothing changes going forward (at least at this
> >>>> point). If we have tons of new ideas/contributions, then it may be
> >>>> different. What is weak dies on its own pretty evidently, without
> >>>> much extra effort.
> >>>>
> >>>>
> >>>> On Tue, May 27, 2014 at 10:32 AM, Pat Ferrel <[email protected]>
> >>>> wrote:
> >>>>
> >>>>> We are asking anyone using Mahout as a lib or in the DSL shell to
> >>>>> learn Scala. While I still think it's the right idea, users may
> >>>>> disagree. We should probably either solicit comments or at least
> >>>>> keep an eye on reactions to this. Spark took this route when the
> >>>>> question was even more in doubt and so is at least partially
> >>>>> supporting multiple bindings.
> >>>>>
> >>>>> Not sure how far we want to carry this, but we could supply Java
> >>>>> bindings to the CLI-type things pretty easily.
> >>>>>
> >>>>>
> >>>>> On May 26, 2014, at 2:43 PM, Dmitriy Lyubimov <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>> Well, first, functional programming in Java 8 is about 2-3 years
> >>>>> late to the scene. So reasoning along the lines of "hey, we are
> >>>>> already using tool A, and now tool B is available which is almost as
> >>>>> good as A, so let's migrate to B" is fallible. Tool B must
> >>>>> demonstrate not just matching capabilities, but far superior ones,
> >>>>> to justify the cost of such a migration.
> >>>>>
> >>>>> Second, as others have pointed out, Java 8 doesn't really match
> >>>>> Scala, not yet anyway. One important feature of the Scala bindings
> >>>>> work is proper operator overloading (an R-like DSL). That would not
> >>>>> be possible in Java 8 as it stands. Yes, as others have pointed out,
> >>>>> it makes things concise, but most importantly, it also makes things
> >>>>> operation-centric and eliminates nested-call pile-up.
> >>>>>
> >>>>> Third, as it stands today, it would also present a problem from the
> >>>>> Spark integration point of view. Spark does have Java bindings, but
> >>>>> first, they are underdefined (you can check the Spark list for tons
> >>>>> of postings about missing equivalent capability), and they are
> >>>>> certainly not Java-8-vetted. So the Java API in Spark, for Java 8
> >>>>> purposes, as it stands, is a moot point.
> >>>>>
> >>>>> There are also a number of other goodies and clashes that exist --
> >>>>> use of Scala collections vs. Java collections, clean functional type
> >>>>> syntax, magic methods, partially defined functions, case class
> >>>>> matchers, implicits, view and context bounds, etc. Etc. -- all that
> >>>>> sh$tload of acrobatics that comes in very handy in existing
> >>>>> implementations and has no substitute in Java 8.
> >>>>>
> >>>>> On May 25, 2014 12:48 PM, "bandi shankar" <[email protected]>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> I was just thinking: do we still need Scala? Java 8 now has
> >>>>>> (probably) all of the features provided by Scala. Since I am new
> >>>>>> to the group, I'm just wondering why not move Mahout away from
> >>>>>> Scala. Is there any specific reason to adopt Scala?
> >>>>>>
> >>>>>> Bandi
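
[Editor's note: for readers new to the operator-overloading point Dmitriy raises, Scala allows symbolic method names, which is what lets a matrix DSL read like R (e.g. `drmA.t %*% drmA`) instead of Java-style nested calls. Below is a toy in-memory illustration of the mechanism only -- it is not Mahout's actual DSL.]

```scala
// Toy illustration of why operator overloading matters for an R-like DSL.
// Not Mahout's DSL -- just the mechanism: Scala permits symbolic methods
// like %*%, keeping expressions operation-centric instead of piling up
// nested calls (a.transpose().times(a).plus(b) in Java).

case class Mat(rows: Vector[Vector[Double]]) {
  // transpose
  def t: Mat = Mat(rows.head.indices.map(j => rows.map(_(j))).toVector)

  // matrix multiply, R's %*%
  def %*%(other: Mat): Mat = {
    val cols = other.t.rows
    Mat(rows.map(r => cols.map(c => r.zip(c).map { case (x, y) => x * y }.sum)))
  }

  // element-wise addition
  def +(other: Mat): Mat =
    Mat(rows.zip(other.rows).map { case (r1, r2) =>
      r1.zip(r2).map { case (x, y) => x + y }
    })
}
```

With this, `a.t %*% a + b` reads like the R expression `t(A) %*% A + B`, whereas the Java 8 equivalent would be something like `a.transpose().times(a).plus(b)`.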
