I don’t think they are wedged. The core cooccurrence and the itemsimilarity CLI are already separate in the two JIRAs mentioned.
Still need to do ones for I/O/CLI probably, now that there is a proposal, and RSJ.

On May 29, 2014, at 3:40 PM, Ted Dunning <[email protected]> wrote:

Can we separate file I/O JIRA's? That will let the core library components to be unwedged separately from getting I/O standardized.

On Thu, May 29, 2014 at 3:35 PM, Pat Ferrel <[email protected]> wrote:

> Agreed and in process. Sebastian’s Cooccurrence code optionally takes two drms.
>
> The current CLI for itemsimilarity filters one stream for input, optionally creating two DRMs and so does support cross-similarity. The CLI will soon allow two input streams. The CLI for RSJ will (if I do it) take one or two DRMs.
>
> Please feel free to comment on the Jiras MAHOUT-1464 (cooccurrence) and MAHOUT-1541 (itemsimilarity CLI)
>
> They are maybe 80% ready, which is why a dialog over file reader/writers, drivers, and CLI might be good. If we can move on those there are a bunch of other jobs that can be packaged up pretty quickly from Dmitriy’s SSVD PCA, Transpose, multiply, etc.
>
> On May 29, 2014, at 2:32 PM, Ted Dunning <[email protected]> wrote:
>
> Pat
>
> I would like to see the co and cross occurrence code separated out a bit so that they take drm args.
>
> Sent from my iPhone
>
>> On May 29, 2014, at 17:58, Pat Ferrel <[email protected]> wrote:
>>
>> Regarding recommenders, drivers, and import/export:
>>
>> I’ve got Sebastian’s cooccurrence code wrapped with a driver that reads text delimited files into a drm for use with cooccurrence. Then it writes the indicator matrix(es) as text delimited files with user specified IDs. It also has a proposed Driver base class, Scala based option parser and ReadStore/WriteStore traits. The CLI will be mostly a superset of the itemsimilarity in legacy mr. The read/write stuff is meant to be pretty generic so I was planning to do a DB and maybe JSON example (some day).
>> There is still a bit of functional programming refactoring and the docs are not up to date.
>>
>> With cooccurrence working we could do something that replaces all the cooccurrence recommenders (in-memory and MR) with one codebase. Add Solr and you have a single machine server based recommender that we can supply with an API similar to the legacy in-memory recommender. The cool thing is that it will scale out to a cluster with Solr and HDFS, requiring only config changes. The downside is that it requires at least a standalone local version of Spark to do the cooccurrence. BTW this would give us something people have been asking for: a recommender service.
>>
>> Is anyone else interested in CLI, drivers, read/write in the import/export sense? Or a new architecture for the recommenders? If so, maybe a separate thread?
>>
>> On May 29, 2014, at 7:03 AM, Ted Dunning <[email protected]> wrote:
>>
>> Andrew,
>>
>> Sebastian and I were talking yesterday and guessing that you would be interested in this soon. Glad to know the world is as expected.
>>
>> Yes. This needs to happen at least at a very conceptual level. For instance, for classifiers, I think that we need to have something like:
>>
>> - progressively train against a batch of data
>>   questions: should this do multiple epochs? Throw an exception if on-line training not supported? throw an exception if too little data provided?
>>
>> - classify a batch of data
>>
>> - serialize a model
>>
>> - de-serialize a model
>>
>> Note that a batch listed above should be either a bunch of observations or just one.
>>
>> Question: does this handle the following cases:
>>
>> - naive bayes
>> - SGD trained on continuous data
>> - batch trained <mumble> classifiers
>> - downpour type classifier training
>>
>> ?
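As an illustration of the four operations Ted lists above, here is a minimal Scala sketch. Every name in it (`Classifier`, `MajorityClassifier`, `train`, `classify`, `write`, `read`) is hypothetical and nothing here is Mahout API. It settles two of the open questions by assumption only: an empty batch throws, and each `train` call is a single pass that adds to the existing state. The majority-label model is just a stand-in that is small enough to train online and round-trip through a byte stream.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

// Hypothetical trait, not part of Mahout: O is the observation type, L the label type.
trait Classifier[O, L] {
  // Progressively train on a batch; a batch may hold one observation or many.
  def train(batch: Seq[(O, L)]): Unit
  // Classify a batch of observations, returning one label per observation.
  def classify(batch: Seq[O]): Seq[L]
  // Serialize the model state.
  def write(out: DataOutputStream): Unit
}

// Toy stand-in model: always predicts the most frequent label seen so far.
final class MajorityClassifier(private var counts: Map[String, Long] = Map.empty)
    extends Classifier[Vector[Double], String] {

  def train(batch: Seq[(Vector[Double], String)]): Unit = {
    // Assumption: too little data is reported with an exception.
    if (batch.isEmpty) throw new IllegalArgumentException("empty batch")
    batch.foreach { case (_, label) =>
      counts = counts.updated(label, counts.getOrElse(label, 0L) + 1L)
    }
  }

  def classify(batch: Seq[Vector[Double]]): Seq[String] = {
    if (counts.isEmpty) throw new IllegalStateException("model is untrained")
    val best = counts.maxBy(_._2)._1
    batch.map(_ => best)
  }

  def write(out: DataOutputStream): Unit = {
    out.writeInt(counts.size)
    counts.foreach { case (label, n) =>
      out.writeUTF(label)
      out.writeLong(n)
    }
  }
}

object MajorityClassifier {
  // De-serialize a model previously produced by write.
  def read(in: DataInputStream): MajorityClassifier = {
    val n = in.readInt()
    val counts = (0 until n).map(_ => (in.readUTF(), in.readLong())).toMap
    new MajorityClassifier(counts)
  }
}

object ClassifierDemo {
  def main(args: Array[String]): Unit = {
    val model = new MajorityClassifier()
    model.train(Seq((Vector(1.0), "spam")))                               // batch of one
    model.train(Seq((Vector(0.2), "ham"), (Vector(0.9), "spam")))         // batch of many
    val bytes = new ByteArrayOutputStream()
    model.write(new DataOutputStream(bytes))
    val restored = MajorityClassifier.read(
      new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
    println(restored.classify(Seq(Vector(0.5), Vector(0.1))))
  }
}
```

A batch of one and a batch of many go through the same `train` call, as the note about batches asks. Whether naive Bayes, SGD, or downpour-style training fit this shape is left open, as in the original question.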
>>
>>> On Wed, May 28, 2014 at 6:25 PM, Andrew Palumbo <[email protected]> wrote:
>>>
>>> This may be somewhat tangential to this thread, but would now be a good time to start laying out some scala traits for Classifiers/Clusterers/Recommenders? I am totally scala-naive, but have been trying to keep up with the discussions.
>>>
>>> I don't know if this is premature but it seems that now that the DSL data structures have been at least sketched out if not fully implemented, it would be useful to have these in place before people start porting too much over. It might be helpful in bringing in new contributions as well.
>>>
>>> It could also help regarding people's questions of integrating a future wrapper layer.
>>>
>>>> From: [email protected]
>>>> Date: Wed, 28 May 2014 17:10:43 -0700
>>>> Subject: Re: do we really need scala still
>>>> To: [email protected]
>>>>
>>>> +1
>>>>
>>>> Let's use a successful scala model as a suggestion about where to go. It seems plausible that Java could emulate the building of a lazy DSL logical plan and then poke it in plausible ways with the addition of a wrapper layer. But that only helps if the Scala layer succeeds.
>>>>
>>>> On Tue, May 27, 2014 at 10:56 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>
>>>>> Also, i think that this is leaning towards false dilemma fallacy. Scala and java models could happily exist at the same time and hopefully, minimal fragmentation of the project if done with precision and care.
>>>>>
>>>>> On Tue, May 27, 2014 at 10:46 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>
>>>>>> not sure there's much sense in taking user survey if we can't act on this. In our situation, unfortunately, we don't have that many ideas to choose from, so there's not much wiggle room imo.
>>>>>> It is more like reinforcement learning -- stuff that doesn't get used or supported, just dies. That's it. Scala bindings, though thumb up'd internally, are yet to earn this status externally. In that sense we always have been watching for use/support, that's why we culled out tons of stuff. Nothing changes going forward (at least at this point). If we have tons of new ideas/contributions, then it may be different. What is weak, dies on its own pretty evidently without much extra effort.
>>>>>>
>>>>>>> On Tue, May 27, 2014 at 10:32 AM, Pat Ferrel <[email protected]> wrote:
>>>>>>>
>>>>>>> We are asking anyone using Mahout as a lib or in the DSL-shell to learn Scala. While I still think it’s the right idea, users may disagree. We should probably either solicit comments or at least keep an eye on reactions to this. Spark took this route when the question was even more in doubt and so is at least partially supporting multiple bindings.
>>>>>>>
>>>>>>> Not sure how far we want to carry this but we could supply Java bindings to the CLI-type things pretty easily.
>>>>>>>
>>>>>>> On May 26, 2014, at 2:43 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>>>
>>>>>>> Well, first, functional programming in java8 is about 2-3 years late to the scene. So the reasoning along the lines, hey, we already are using tool A, and now tool B is available which is almost as good as A, so let's migrate to B, is fallible. Tool B must demonstrate not just matching capabilities, but far superior ones, to justify cost of such migration.
>>>>>>>
>>>>>>> Second, as others pointed out, java 8 doesn't really match scala, not yet anyway. One important feature of scala bindings work is proper operator overload (R-like DSL).
>>>>>>> That would not be possible to do in java 8, as it stands. Yes, as others pointed out, it makes things concise, but most importantly, it also makes things operation-centric and eliminates nested calls pile-up.
>>>>>>>
>>>>>>> Third, as it stands today, it would also present a problem from the Spark integration point of view. Spark does have java bindings, but first, they are underdefined (you can check spark list for tons of postings about missing equivalent capability), and they are certainly not java-8-vetted. So java api in Spark for java 8 purposes, as it stands, is a moot point.
>>>>>>>
>>>>>>> There are also a number of other goodies and clashes that exist -- use of scala collections vs. Java collections, clean functional type syntax, magic methods, partially defined functions, case class matchers, implicits, view and context bounds, etc., all that sh$tload of acrobatics that comes actually very handy in existing implementations and has no substitute in Java 8.
>>>>>>>
>>>>>>> On May 25, 2014 12:48 PM, "bandi shankar" <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I was just thinking, do we still need scala. Since in java 8 we have all (probably) kinds of features provided by scala.
>>>>>>>> Since I am new to the group, I am just thinking, why not move mahout away from scala. Is there any specific reason to adopt scala?
>>>>>>>>
>>>>>>>> Bandi
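To make Dmitriy's second point about operator overloading and the "nested calls pile-up" concrete, here is a small Scala sketch. `Vec`, `NestedStyle`, and `VecDemo` are hypothetical toy names written for this illustration; they are not Mahout's matrix DSL. The same expression is written once with symbolic methods and once with plain named functions, the only form available in a language without operator overloading.

```scala
// Hypothetical toy vector type, not Mahout API.
final case class Vec(values: Vector[Double]) {
  private def zipWith(that: Vec)(f: (Double, Double) => Double): Vec = {
    require(values.length == that.values.length, "dimension mismatch")
    Vec(values.zip(that.values).map { case (a, b) => f(a, b) })
  }

  // In Scala these are ordinary methods that happen to have symbolic names.
  def +(that: Vec): Vec = zipWith(that)(_ + _)
  def -(that: Vec): Vec = zipWith(that)(_ - _)
  def *(k: Double): Vec = Vec(values.map(_ * k))
}

// The same operations as named functions, as they would look without operators.
object NestedStyle {
  def plus(a: Vec, b: Vec): Vec = a + b
  def minus(a: Vec, b: Vec): Vec = a - b
  def times(a: Vec, k: Double): Vec = a * k
}

object VecDemo {
  import NestedStyle._

  val x = Vec(Vector(1.0, 2.0))
  val y = Vec(Vector(3.0, 5.0))

  // Operation-centric: reads like the formula (x + y) / 2 - x.
  val operatorStyle: Vec = (x + y) * 0.5 - x

  // Call-centric: the formula has to be read from the inside out.
  val nestedStyle: Vec = minus(times(plus(x, y), 0.5), x)
}
```

Both expressions compute the same vector; the difference is only in how quickly the nesting grows as the formula gets longer.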
