Can we separate the file I/O JIRAs? That will let the core library components be unwedged separately from getting I/O standardized.
On Thu, May 29, 2014 at 3:35 PM, Pat Ferrel <[email protected]> wrote:

> Agreed and in process. Sebastian's cooccurrence code optionally takes two DRMs.
>
> The current CLI for itemsimilarity filters one stream for input, optionally creating two DRMs, and so does support cross-similarity. The CLI will soon allow two input streams. The CLI for RSJ will (if I do it) take one or two DRMs.
>
> Please feel free to comment on the JIRAs MAHOUT-1464 (cooccurrence) and MAHOUT-1541 (itemsimilarity CLI).
>
> They are maybe 80% ready, which is why a dialog over file readers/writers, drivers, and the CLI might be good. If we can move on those, there are a bunch of other jobs that can be packaged up pretty quickly from Dmitriy's SSVD: PCA, transpose, multiply, etc.
>
> On May 29, 2014, at 2:32 PM, Ted Dunning <[email protected]> wrote:
>
> Pat,
>
> I would like to see the co- and cross-occurrence code separated out a bit so that they take DRM args.
>
> Sent from my iPhone
>
> > On May 29, 2014, at 17:58, Pat Ferrel <[email protected]> wrote:
> >
> > Regarding recommenders, drivers, and import/export:
> >
> > I've got Sebastian's cooccurrence code wrapped with a driver that reads text-delimited files into a DRM for use with cooccurrence. It then writes the indicator matrix(es) as text-delimited files with user-specified IDs. It also has a proposed Driver base class, a Scala-based option parser, and ReadStore/WriteStore traits. The CLI will be mostly a superset of the itemsimilarity in legacy MR. The read/write stuff is meant to be pretty generic, so I was planning to do a DB and maybe a JSON example (some day). There is still a bit of functional-programming refactoring to do, and the docs are not up to date.
> >
> > With cooccurrence working we could do something that replaces all the cooccurrence recommenders (in-memory and MR) with one codebase. Add Solr and you have a single-machine, server-based recommender that we can supply with an API similar to the legacy in-memory recommender. The cool thing is that it will scale out to a cluster with Solr and HDFS, requiring only config changes. The downside is that it requires at least a standalone local version of Spark to do the cooccurrence. BTW, this would give us something people have been asking for: a recommender service.
> >
> > Is anyone else interested in the CLI, drivers, and read/write in the import/export sense? Or a new architecture for the recommenders? If so, maybe a separate thread?
> >
> > On May 29, 2014, at 7:03 AM, Ted Dunning <[email protected]> wrote:
> >
> > Andrew,
> >
> > Sebastian and I were talking yesterday and guessing that you would be interested in this soon. Glad to know the world is as expected.
> >
> > Yes. This needs to happen at least at a very conceptual level. For instance, for classifiers, I think that we need to have something like:
> >
> > - progressively train against a batch of data
> >   Questions: should this do multiple epochs? Throw an exception if on-line training is not supported? Throw an exception if too little data is provided?
> >
> > - classify a batch of data
> >
> > - serialize a model
> >
> > - de-serialize a model
> >
> > Note that a batch listed above should be either a bunch of observations or just one.
> >
> > Question: does this handle the following cases?
> >
> > - naive bayes
> > - SGD trained on continuous data
> > - batch-trained <mumble> classifiers
> > - downpour-type classifier training
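To make Ted's sketch above concrete, here is a rough Scala strawman of those four operations. Every name in it (ClassifierTrainer, ClassifierModel, trainBatch, and so on) is hypothetical rather than existing Mahout API; it only suggests the shape such traits might take, built on the Matrix and Vector types from mahout-math:

import java.io.{DataInput, DataOutput}
import org.apache.mahout.math.{Matrix, Vector}

// Strawman contract for a trained model: classify a batch, serialize it.
trait ClassifierModel {
  // Classify a batch of observations (one row per observation);
  // per the note above, a "batch" may be many rows or just one.
  def classify(batch: Matrix): Matrix

  // Serialize the model.
  def write(out: DataOutput): Unit
}

// Strawman contract for training.
trait ClassifierTrainer {
  // Progressively train against a batch of data. Implementations that
  // do not support on-line training could throw
  // UnsupportedOperationException; whether this should run multiple
  // epochs internally is one of the open questions above.
  def trainBatch(features: Matrix, targets: Vector): Unit

  // Produce the trained model.
  def model(): ClassifierModel
}

// De-serialization as a companion-style factory.
trait ClassifierModelReader {
  def read(in: DataInput): ClassifierModel
}

Under this shape, naive Bayes would presumably accumulate counts in trainBatch, an SGD learner would update weights per observation, and a downpour-style trainer would need trainBatch to tolerate concurrent callers; whether all of that fits is exactly the open question list above.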
>> On Wed, May 28, 2014 at 6:25 PM, Andrew Palumbo <[email protected]> wrote:
>>
>> This may be somewhat tangential to this thread, but would now be a good time to start laying out some Scala traits for Classifiers/Clusterers/Recommenders? I am totally Scala-naive, but have been trying to keep up with the discussions.
>>
>> I don't know if this is premature, but it seems that now that the DSL data structures have been at least sketched out, if not fully implemented, it would be useful to have these in place before people start porting too much over. It might be helpful in bringing in new contributions as well.
>>
>> It could also help regarding people's questions about integrating a future wrapper layer.
>>
>>> From: [email protected]
>>> Date: Wed, 28 May 2014 17:10:43 -0700
>>> Subject: Re: do we really need scala still
>>> To: [email protected]
>>>
>>> +1
>>>
>>> Let's use a successful Scala model as a suggestion about where to go. It seems plausible that Java could emulate the building of a lazy DSL logical plan and then poke it in plausible ways with the addition of a wrapper layer. But that only helps if the Scala layer succeeds.
>>>
>>> On Tue, May 27, 2014 at 10:56 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>
>>>> Also, I think this is leaning towards a false-dilemma fallacy: Scala and Java models could happily coexist, hopefully with minimal fragmentation of the project, if done with precision and care.
>>>>
>>>> On Tue, May 27, 2014 at 10:46 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>
>>>>> Not sure there's much sense in taking a user survey if we can't act on this. In our situation, unfortunately, we don't have that many ideas to choose from, so there's not much wiggle room IMO. It is more like reinforcement learning -- stuff that doesn't get used or supported just dies, that's it. The Scala bindings, though thumbed up internally, are yet to earn this status externally. In that sense we have always been watching for use/support; that's why we culled out tons of stuff. Nothing changes going forward (at least at this point). If we have tons of new ideas/contributions, then it may be different. What is weak dies on its own pretty evidently, without much extra effort.
>>>>>
>>>>>> On Tue, May 27, 2014 at 10:32 AM, Pat Ferrel <[email protected]> wrote:
>>>>>>
>>>>>> We are asking anyone using Mahout as a lib or in the DSL shell to learn Scala. While I still think it's the right idea, users may disagree. We should probably either solicit comments or at least keep an eye on reactions to this. Spark took this route when the question was even more in doubt, and so is at least partially supporting multiple bindings.
>>>>>>
>>>>>> Not sure how far we want to carry this, but we could supply Java bindings to the CLI-type things pretty easily.
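An aside on Ted's lazy-DSL point above: in the Scala bindings, a DRM expression only builds a logical operator plan, which is optimized and executed when an action forces it. A minimal sketch, assuming the R-like DRM ops from the Scala bindings and an implicit distributed context (as in the Mahout shell); package paths may differ by version:

import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.drm._
import org.apache.mahout.math.drm.RLikeDrmOps._

// A small distributed matrix (assumes an implicit distributed
// context is in scope, e.g. inside the Mahout spark shell).
val drmA = drmParallelize(dense((1, 2), (3, 4), (5, 6)))

// This only builds a logical plan (roughly OpAtA); nothing runs yet.
val drmAtA = drmA.t %*% drmA

// The optimizer rewrites the plan and executes it only when an
// action such as collect (or a write) forces materialization.
val inCore = drmAtA.collect

A Java wrapper layer, as Ted suggests, would presumably construct the same logical plan objects through ordinary method calls and hand them to the same optimizer.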
>>>>>> On May 26, 2014, at 2:43 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>>>>>
>>>>>> Well, first, functional programming in Java 8 is about 2-3 years late to the scene. So reasoning along the lines of "hey, we are already using tool A, and now tool B is available which is almost as good as A, so let's migrate to B" is fallible. Tool B must demonstrate not just matching capabilities but far superior ones to justify the cost of such a migration.
>>>>>>
>>>>>> Second, as others pointed out, Java 8 doesn't really match Scala, not yet anyway. One important feature of the Scala bindings work is proper operator overloading (the R-like DSL). That would not be possible to do in Java 8 as it stands. Yes, as others pointed out, it makes things concise, but most importantly it also makes things operation-centric and eliminates the pile-up of nested calls.
>>>>>>
>>>>>> Third, as it stands today, it would also present a problem from the Spark integration point of view. Spark does have Java bindings, but first, they are underdefined (you can check the Spark list for tons of postings about missing equivalent capability), and they are certainly not Java-8-vetted. So the Java API in Spark, for Java 8 purposes, is a moot point as it stands.
>>>>>>
>>>>>> There are also a number of other goodies and clashes that exist -- use of Scala collections vs. Java collections, clean functional type syntax, magic methods, partially defined functions, case class matchers, implicits, view and context bounds, etc. -- all that sh$tload of acrobatics that comes in very handy in the existing implementations and has no substitute in Java 8.
>>>>>>
>>>>>>> On May 25, 2014, at 12:48 PM, "bandi shankar" <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was just thinking: do we still need Scala? Java 8 now has (probably) all the kinds of features provided by Scala. Since I am new to the group, I am just wondering why not move Mahout away from Scala. Is there any specific reason to adopt Scala?
>>>>>>>
>>>>>>> Bandi
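To make Dmitriy's operator-overloading point concrete, here is a small sketch using the in-core R-like ops from the Scala bindings; the nested form in the final comment is hypothetical Java-style pseudocode, shown for contrast only:

import org.apache.mahout.math.scalabindings._
import org.apache.mahout.math.scalabindings.RLikeOps._

val a = dense((1.0, 2.0), (3.0, 4.0))
val b = dense((0.5, 0.5), (0.5, 0.5))

// Operation-centric, reads like R:
val c = (a.t %*% a) * 2.0 - b

// Without operator overloading, the same expression piles up nested
// calls, roughly: minus(times(mmul(transpose(a), a), 2.0), b)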
